EDNA BioSAS pipeline using the SCARF web portal


This pipeline speeds up the analysis of biological small-angle scattering (SAS) data, particularly "dummy atom modelling". It was originally developed on the EDNA platform by Diamond Light Source Ltd., EMBL Hamburg and the ESRF, and it uses modules from the ATSAS suite of programs (GNOM, DAMMIF and DAMAVER).

1) Obtain a SCARF account and register for the EDNA BioSAS pipeline. If you don't already have an account, you can apply here: http://www.scarf.rl.ac.uk/registration

If you only want access to the EDNA BioSAS pipeline, you can put the following in the account usage case:

"I would like access to the RAL SCARF high performance computing cluster to run the EDNA BioSAS pipeline on my biological SANS data. Access to SCARF would significantly accelerate my data processing."

If you are not an STFC employee, your application to SCARF must be supported by a member of STFC staff. Please contact them and ask whether they agree to this, then put the following in the usage case:

"My application is supported by <an STFC employee>"

2) Access SCARF on the web through the portal http://portal.scarf.rl.ac.uk/ and log in with either your STFC ID or an x509 certificate.

3) After logging in you will be in your account directory. From there you can see "Jobs" on the left-hand side, with the Submission Forms and Job Status entries.

4) Click Submission Forms, then EDNA BioSAS Pipeline, to open the pipeline's submission template.

5) The following options are available:

i) Job Name - as a folder identifier.

ii) You then need to upload the data. Two main formats are supported:

I) Multi-column ASCII text of reduced data, such as ATSAS *.dat files. Experimental SAS data usually has 3 columns (Q, I and Error), although 2-column (Q, I) model data will also run.

AVOID SPACES IN YOUR EXPERIMENTAL FILE, AS THEY GENERATE AN ERROR AND STOP THE PIPELINE.

II) Data in the NeXus file format are also accepted; in this case, specify the data file and the scattering vector and intensity values within the NeXus data file.

Move the data file to be processed to your home directory on the cluster.

iii) Threads (default = 10): because 10 DAMMIF models are generated, setting more than 10 threads will not significantly increase speed.

iv) There are a number of optional values:

I) Rmin and Rmax to define the range of the Gnom search, plus Rmax intervals to set the number of subdivisions and stopping criteria of the Gnom search.

II) You can also define the number of columns, qmin and qmax.

III) You can define the particle symmetry and the anneal mode of the DAMMIF model building. You can also choose whether to run only a GNOM search of the data and whether to plot every GNOM fit.

v) Further Advanced options control the HPC set-up; these should not normally need to be altered.
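Before uploading, you may find it useful to check a reduced *.dat file locally. The Python sketch below is not part of the pipeline. `load_sas_dat` is a hypothetical helper, and its checks are illustrative:

- It rejects file names containing spaces, assuming that this is what the warning under ii) refers to.
- It skips header and blank lines.
- It keeps rows with at least two numeric columns (Q, I and optionally Error).
- It applies optional qmin/qmax limits like those in iv).

```python
from pathlib import Path


def load_sas_dat(path, qmin=None, qmax=None):
    """Read a whitespace-separated SAS data file into a list of tuples.

    Hypothetical helper for checking data before upload; it is not the
    pipeline's own reader. Assumes column 1 is Q, column 2 is I and an
    optional column 3 is the error.
    """
    p = Path(path)
    # Assumption: the "avoid spaces" warning refers to the file name.
    if " " in p.name:
        raise ValueError(f"file name contains spaces: {p.name!r}")

    rows = []
    for line in p.read_text().splitlines():
        try:
            values = tuple(float(field) for field in line.split())
        except ValueError:
            continue  # header or comment line with non-numeric fields
        if len(values) < 2:
            continue  # blank line or a single stray number
        q = values[0]
        if qmin is not None and q < qmin:
            continue  # below the requested Q range
        if qmax is not None and q > qmax:
            continue  # above the requested Q range
        rows.append(values)
    return rows
```

For example, `load_sas_dat("lyzexp.dat", qmin=0.02, qmax=0.25)` returns only the rows whose Q lies between the two (hypothetical) limits, in whatever Q units the file uses.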
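The advice on Threads in iii) follows from simple arithmetic. Picture an idealised case in which each thread builds one DAMMIF model at a time and all models take roughly equally long (an assumption; real run times vary). The number of sequential rounds is then the number of models divided by the number of threads, rounded up:

```python
import math

N_MODELS = 10  # the pipeline generates 10 DAMMIF models


def rounds_needed(threads: int, n_models: int = N_MODELS) -> int:
    """Sequential rounds of model building, assuming one model per thread at a time."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return math.ceil(n_models / threads)


for threads in (1, 2, 5, 10, 20):
    print(f"{threads:>2} threads -> {rounds_needed(threads)} round(s)")
```

Once the thread count reaches 10, a single round suffices, so any extra threads simply sit idle.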
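To picture how Rmin, Rmax and Rmax intervals could interact, here is a coarse-to-fine grid search. This is an illustrative assumption, not the pipeline's actual GNOM search:

- `score` is a hypothetical stand-in for whatever fit-quality measure is maximised.
- `intervals` is the number of subdivisions per pass.
- The search stops after `max_passes` passes or once the grid step falls below `tol`.

```python
from typing import Callable


def rmax_search(score: Callable[[float], float], rmin: float, rmax: float,
                intervals: int = 10, max_passes: int = 3,
                tol: float = 0.1) -> float:
    """Coarse-to-fine search for the Rmax that maximises score(rmax).

    Hypothetical sketch: each pass evaluates intervals + 1 evenly spaced
    candidates, then narrows the range to one step either side of the best.
    """
    lo, hi = rmin, rmax
    best = lo
    for _ in range(max_passes):
        step = (hi - lo) / intervals
        candidates = [lo + i * step for i in range(intervals + 1)]
        best = max(candidates, key=score)
        if step < tol:
            break  # grid is already fine enough
        # Narrow the range around the current best, staying within [rmin, rmax].
        lo = max(rmin, best - step)
        hi = min(rmax, best + step)
    return best


print(rmax_search(lambda r: -(r - 45.0) ** 2, rmin=10.0, rmax=100.0))
```

With this quadratic stand-in score, three passes over the range 10 to 100 land within 0.1 of the optimum at 45.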

6) Press submit to initiate the job.

7) You can monitor your job's status (pending, running or done). When the job has finished, you should see a folder containing a number of files, including:

A log file of the process:- EDApplicationSASPipeline_<numbers>-<numbers>.log

A folder containing your data:- EDApplicationSASPipeline_<numbers>-<numbers>

8) You can download the results to your home machine. A summary of the results is given in the pipelineResults.html file. An example is shown below:

Summary of Solution Scattering Pipeline Execution

Data file : /home/isisg/scarf266/lyzexp.dat

GNOM

Optimized value of RMax : 44.96

Estimated Rg : 15.01

Output fit quality : 0.971

DAMMIF

RFactor : 0.0011

Chi(Sqrt) : 1.268

DAMAVER

Mean NSD : 0.810

Variation of NSD : 0.019

(Figures: Rmax search results; Experimental data fitting)


Results of GNOM run


Optimized value of RMax : 44.96

Estimated Rg : 15.01

Output fit quality : 0.971

GNOM output file : Gnomv0_1-optimal/gnom.out

(Figure: Distribution function)


Number of GNOM iterations performed before converging : 3

Results of the best DAMMIF run


RFactor : 0.0011 Chi(Sqrt) : 1.268

DAMMIF particle model : Dammifv0_1-00000004/dammif-1.pdb

DAMMIF solvent model : Dammifv0_1-00000004/dammif-0.pdb

DAMMIF fit file : Dammifv0_1-00000004/dammif.fit

DAMMIF log file : Dammifv0_1-00000004/dammif.log


Number of DAMMIF jobs run : 10

RFactor  Chi(Sqrt)  Link                 Status
0.0006   1.270      Dammifv0_1           Accepted
0.0006   1.269      Dammifv0_1-00000003  Accepted
0.0011   1.268      Dammifv0_1-00000004  Accepted
0.0009   1.271      Dammifv0_1-00000005  Accepted
0.0007   1.268      Dammifv0_1-00000006  Accepted
0.0009   1.268      Dammifv0_1-00000007  Accepted
0.0010   1.276      Dammifv0_1-00000008  Accepted
0.0010   1.271      Dammifv0_1-00000009  Accepted
0.0009   1.268      Dammifv0_1-00000010  Accepted
0.0009   1.268      Dammifv0_1-00000011  Accepted


Results of model averaging using DAMAVER pipeline

(Figures: Damaver average NSD; Damaver results)


DAMAVER output pdb model : Damaverv0_1/damaver_valid.pdb

DAMFILT output pdb model : Damfiltv0_1/damfilt_valid.pdb

DAMSTART output pdb model : Damstartv0_1/damstart_valid.pdb
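The "Estimated Rg" in the GNOM results is related to the distance distribution function P(r) through the standard real-space relation Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr), integrated from 0 to Rmax. The Python sketch below evaluates that relation with the trapezoidal rule for P(r) sampled on a grid.

It does not parse gnom.out, whose layout is not described here, so `r` and `p` are assumed to have been extracted already. As a check, it uses a uniform sphere of radius R, whose Rg is √(3/5)·R ≈ 0.775 R.

```python
import math
from typing import Sequence


def rg_from_pr(r: Sequence[float], p: Sequence[float]) -> float:
    """Radius of gyration from a sampled P(r).

    Rg^2 = integral(r^2 P(r) dr) / (2 * integral(P(r) dr)),
    with both integrals evaluated by the trapezoidal rule.
    """
    if len(r) != len(p) or len(r) < 2:
        raise ValueError("r and p must have the same length (at least 2)")
    num = den = 0.0
    for i in range(1, len(r)):
        dr = r[i] - r[i - 1]
        num += 0.5 * dr * (r[i] ** 2 * p[i] + r[i - 1] ** 2 * p[i - 1])
        den += 0.5 * dr * (p[i] + p[i - 1])
    return math.sqrt(num / (2.0 * den))


def sphere_pr(r: float, radius: float) -> float:
    """Unnormalised P(r) of a uniform sphere (zero beyond Dmax = 2 * radius)."""
    x = r / radius
    if x > 2.0:
        return 0.0
    return r ** 2 * (1.0 - 0.75 * x + x ** 3 / 16.0)


radius = 20.0
rs = [2.0 * radius * i / 2000 for i in range(2001)]
ps = [sphere_pr(r, radius) for r in rs]
print(rg_from_pr(rs, ps), math.sqrt(3.0 / 5.0) * radius)
```

Both printed values are close to 15.49, showing that the numerical integral reproduces the analytical sphere result.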
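Mean NSD and Variation of NSD summarise how similar the individual DAMMIF models are to one another. To our understanding of the DAMAVER documentation, models whose NSD exceeds the mean plus twice the variation are treated as outliers and excluded from the average. With the example summary values (Mean NSD 0.810, Variation of NSD 0.019), the cut-off would be 0.810 + 2 × 0.019 = 0.848.

The sketch below applies that rule. The per-model NSD values in the usage line are hypothetical, since the report does not list them.

```python
from typing import List, Sequence


def nsd_outliers(nsd: Sequence[float], mean_nsd: float, var_nsd: float,
                 k: float = 2.0) -> List[int]:
    """Indices of models whose NSD exceeds mean_nsd + k * var_nsd.

    Assumes a DAMAVER-style rejection rule with k = 2.
    """
    cutoff = mean_nsd + k * var_nsd
    return [i for i, value in enumerate(nsd) if value > cutoff]


# Hypothetical per-model NSD values, with the summary values from the example report.
print(nsd_outliers([0.80, 0.85, 0.83], mean_nsd=0.810, var_nsd=0.019))  # [1]
```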

 


References

ATSAS: a program suite for small-angle scattering data analysis from biological macromolecules. http://www.embl-hamburg.de/biosaxs/software.html

GNOM: Svergun, D. I. (1992). Determination of the regularization parameter in indirect-transform methods using perceptual criteria. J. Appl. Cryst. 25, 495-503.

DAMMIF: Franke, D. and Svergun, D. I. (2009). DAMMIF, a program for rapid ab-initio shape determination in small-angle scattering. J. Appl. Cryst. 42, 342-346.

SUPCOMB: Kozin, M. and Svergun, D. I. (2001). Automated matching of high- and low-resolution structural models. J. Appl. Cryst. 34, 33-41.

DAMAVER: Volkov, V. V. and Svergun, D. I. (2003). Uniqueness of ab-initio shape determination in small-angle scattering. J. Appl. Cryst. 36, 860-864.

EDNA: Incardona, M.-F., Bourenkov, G. P., Levik, K., Pieritz, R. A., Popov, A. N. and Svensson, O. (2009). EDNA: a framework for plugin-based applications applied to X-ray experiment online data analysis. J. Synchrotron Rad. 16, 872-879. http://www.edna-site.org/

Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

NumPy: the package for scientific computing in Python. http://numpy.scipy.org/

Matplotlib: the package for 2D plotting in Python. http://matplotlib.sourceforge.net/

NeXus: a common data format for neutron, x-ray, and muon science. http://www.nexusformat.org/

h5py: a simple Python interface to HDF5. http://h5py.alfven.org/