Shape and volume matching software in structural biology

In February-March 2013, the structural biology work package of BioMedBridges (WP9) conducted a survey on shape/volume matching software used in structural biology. The aim was to discover what kinds of structural comparisons are of interest to structural biologists and to assess the software tools currently in use. The results of the survey will be used in the development of a new service, one of the project deliverables (see WP9 deliverables).

In total, 43 people completed the survey. Most responses were received via the 3DEM mailing list, which serves the electron microscopy community, but other channels were also used. For several of the questions, many respondents gave more than one answer (typically 2-2.5 responses per respondent), indicating that this is a heterogeneous field in which no single solution works in all cases. Apart from questions 2 (5 respondents) and 10 (31 respondents), all questions were answered by at least 35 people (80%). Overall, the number of answers received was good and the level of detail provided was highly useful. The full survey results can be downloaded here.

 

Detailed responses 

1. Do you currently do any shape fitting?

34 of 43 respondents answered yes, and 6 of 43 answered no but would do shape fitting if a web service were available. This confirms that most respondents have an existing interest in the subject of the survey.

 

2. If you answered "No" above, please describe what you would be looking for/need in terms of tools, data, web services or other resources.

5 people answered this question.  Two asked for shape matching for individual proteins.   

Comment: Existing servers, such as SSM (http://www.ebi.ac.uk/msd-srv/ssm/) and DaLi (http://ekhidna.biocenter.helsinki.fi/dali_server), can handle shape matching between proteins where atomic coordinates are available.

 

3. What type(s) of shapes do you use?

Volumes and atomistic models both scored 80% or more, followed by sub-volumes (50%), coarse-grained models (25%), and geometric models, including point clouds and spherical harmonics (20%).

Figure: Types of shapes

As expected, most shapes are described as volumes/sub-volumes or via atomic coordinates, but there is a significant minority of other representations.

 

4. What type(s) of volume data do you fit into?

Single-particle EM (75%) was the most commonly used type of volume data, as expected given that most responses came from the 3DEM mailing list. Respondents who use either electron tomography (ET) or sub-ET all use single-particle EM as well, and often both types of ET. 30% used SAXS, and in over half of these cases SAXS was the only type of volume data. Other replies included X-ray, CT, MRI and ultrasound.

Figure: Types of volume data

5. Which software are you using for shape matching (select all that you use)? 

17 of the 19 packages listed received at least one vote. Chimera was by far the most commonly used software (63%), and “Other” (48%) was the second most common answer. Situs and MDFF also received over 20% each. Among those who are developing their own software, Matlab and EMAN were the most common frameworks.

Figure: Software

 

6. Do you use a GUI or a command line script? Do you follow a documented or published protocol, or do you use an in-house protocol?

GUI and command line each received about 2/3 of the vote, again showing clear overlap between the answers. Protocols were split almost equally three ways: 1/3 follow a published or documented protocol, 1/3 use an in-house protocol, and 1/3 treat each case differently.

 

7. What, if any, metadata does the software need?

All the options (resolution, the two density levels combined, symmetry, and grid sampling) scored in the range 55-65%. Of the two density-level options, contour level received more responses than background scattering, by 14 to 11.
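As an illustration only, the metadata options above could be gathered into a single record that a fitting tool checks before it runs. This is a minimal sketch: the class name `FitMetadata`, its field names and the `problems` method are hypothetical and do not come from any of the surveyed packages. The one physical check it makes is the Nyquist limit, that is, a map cannot carry detail finer than twice its voxel size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FitMetadata:
    """Hypothetical container for the metadata options listed in the survey."""
    resolution_angstrom: float                 # nominal resolution of the map
    voxel_size_angstrom: float                 # grid sampling
    symmetry: str = "C1"                       # point-group label, e.g. "C1", "D7", "I"
    contour_level: Optional[float] = None      # density threshold for the surface
    background_level: Optional[float] = None   # background scattering density

    def problems(self) -> list:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        if self.resolution_angstrom <= 0:
            found.append("resolution must be positive")
        if self.voxel_size_angstrom <= 0:
            found.append("voxel size must be positive")
        # Nyquist limit: detail finer than two voxels cannot be represented.
        elif self.resolution_angstrom < 2 * self.voxel_size_angstrom:
            found.append("resolution is finer than the Nyquist limit of the grid")
        if self.contour_level is None and self.background_level is None:
            found.append("no density level given")
        return found
```

For example, `FitMetadata(resolution_angstrom=8.0, voxel_size_angstrom=2.0, symmetry="D7", contour_level=0.5).problems()` returns an empty list, whereas a claimed resolution of 3.0 on the same 2.0 grid is flagged.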

 

8. Any options/feature that is particularly useful with the software that you use? 

“Availability” and “Ease of use” were the most common answers, both scoring 63%.  Other answers were “Scriptability” 43%, “Available features” 31%, and “Developed in-house” 23%.  In particular, a significant minority use in-house software, which suggests that available software does not provide the features or ease of use required.

Two of the eight comments highlighted that they use the software that gives the best result; this contrasts with the relatively large number of respondents who cited availability and ease of use. Of course, these issues overlap. User-friendly software may improve end results by making it easy to explore different settings and optimise parameters.  

 

9. What type of matching do you do?

90% answered rigid-body fitting, while manual fitting and flexible fitting each scored 60%. This indicates that, at least in some cases and with some tools, it is necessary or easier to start with a manual fit. Flexible fitting is relatively popular, and presumably the methods used include safeguards against over-fitting.
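To make the idea of rigid-body fitting concrete, the sketch below runs an exhaustive search over rotations and translations of a small 2D point model against a toy density grid. It is a deliberately simplified illustration, not the algorithm of any surveyed package: real tools work in 3D, sample rotations far more finely and use more refined scores. The function names `transform`, `density_sum` and `rigid_body_search` are hypothetical.

```python
import math


def transform(points, angle_deg, shift):
    """Rotate 2D points about the origin, then translate them (a rigid-body move)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


def density_sum(grid, points):
    """Sum the grid values under each point; points outside the grid add nothing."""
    total = 0.0
    for x, y in points:
        ix, iy = int(round(x)), int(round(y))
        if 0 <= ix < len(grid) and 0 <= iy < len(grid[0]):
            total += grid[ix][iy]
    return total


def rigid_body_search(grid, points, angles, shifts):
    """Try every (angle, shift) pair and return (score, angle, shift) of the best."""
    best = None
    for angle in angles:
        for shift in shifts:
            score = density_sum(grid, transform(points, angle, shift))
            if best is None or score > best[0]:
                best = (score, angle, shift)
    return best


# Toy data: an L-shaped blob of density on a 5 x 5 grid, and an L-shaped model.
grid = [[0.0] * 5 for _ in range(5)]
for ix, iy in [(3, 1), (3, 2), (3, 3), (2, 3)]:
    grid[ix][iy] = 1.0
model = [(0, 0), (1, 0), (2, 0), (2, 1)]
all_shifts = [(i, j) for i in range(5) for j in range(5)]
best_fit = rigid_body_search(grid, model, [0, 90, 180, 270], all_shifts)
```

Because the model shape is preserved by every move, only the pose changes. The search space grows quickly with finer angular sampling, which is one reason a manual starting position can help.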

 

10. If your volume data have symmetry elements, such as in viruses, chaperones, helices etc., how do you handle the symmetry?

About half fit into one position and half into all positions, while a quarter ignore symmetry. There is clearly overlap between the first two, and the strategy may be case-dependent. Frequently, fitting is first done into one position and then into all positions. Urox/Veda were mentioned as having good support for symmetry.
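The "one position, then all positions" strategy can be sketched by fitting a single subunit and generating the remaining copies with the symmetry operators. The example below is a minimal sketch that assumes n-fold cyclic symmetry about the z axis through the origin; other point groups and helical symmetry need different operators. The function name `cyclic_copies` is hypothetical.

```python
import math


def cyclic_copies(coords, n_fold):
    """Return n_fold copies of coords, copy k rotated by k * 360 / n_fold degrees about z."""
    copies = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        c, s = math.cos(angle), math.sin(angle)
        # Rotation about z leaves the z coordinate unchanged.
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in coords])
    return copies


# Two atoms of a fitted subunit, expanded to a four-fold ring.
subunit = [(10.0, 0.0, 0.0), (12.0, 0.0, 3.0)]
ring = cyclic_copies(subunit, 4)
```

The first copy is the fitted subunit itself, and the second places the first atom at approximately (0, 10, 0). The expanded model can then be scored against the whole map rather than a single asymmetric unit.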

 

11. How do you evaluate the results?

80% use visual inspection, 72% a cross-correlation score, and 25% each use atom inclusion and other scoring functions. Thus, while cross-correlation is the most popular numerical score, there is still a heavy reliance on visual inspection. In fact, of the 32 respondents who picked visual inspection and the 29 who picked cross-correlation, 25 picked both. It may be that visual inspection is used to distinguish manually between some of the top-scoring fits, or that different evaluation methods are used at different points during a fitting procedure.
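For readers unfamiliar with the two numerical scores named above, the sketch below shows one common way each can be defined. Packages differ in the details (for example, whether the mean is subtracted, or how atoms are mapped to voxels), so these definitions are assumptions made for illustration. The function names and the nested-list grid layout are hypothetical.

```python
import math


def cross_correlation(map_a, map_b):
    """Mean-subtracted, normalised cross-correlation of two equal-length voxel lists."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    mean_a = sum(map_a) / len(map_a)
    mean_b = sum(map_b) / len(map_b)
    da = [a - mean_a for a in map_a]
    db = [b - mean_b for b in map_b]
    norm = math.sqrt(sum(a * a for a in da) * sum(b * b for b in db))
    if norm == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return sum(a * b for a, b in zip(da, db)) / norm


def atom_inclusion(density, atoms, voxel_size, contour_level):
    """Fraction of atoms sitting in voxels at or above contour_level.

    density is indexed as density[ix][iy][iz]; atoms are (x, y, z) tuples in the
    same length unit as voxel_size, with the grid origin at (0, 0, 0).
    """
    if not atoms:
        raise ValueError("no atoms given")
    inside = 0
    for atom in atoms:
        ix, iy, iz = (int(math.floor(c / voxel_size)) for c in atom)
        in_grid = (0 <= ix < len(density) and 0 <= iy < len(density[0])
                   and 0 <= iz < len(density[0][0]))
        if in_grid and density[ix][iy][iz] >= contour_level:
            inside += 1
    return inside / len(atoms)
```

Here the cross-correlation compares a map simulated from the model with the experimental map, voxel by voxel, whereas atom inclusion only asks whether each atom lies inside the contoured surface. The two scores can therefore disagree, which may be one reason respondents combine them with visual inspection.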

 

12. What is your role?

54% of respondents were PIs, followed by staff scientists, graduate students and postdocs, each in the 10-20% range, plus a single government employee and a single undergraduate student.

 

Conclusions

The wide range of responses shows that there is diversity in the methods and programs used, probably due to the diversity in the problems being addressed. A significant proportion of respondents use manual fitting and/or visual inspection. Such an approach may be appropriate for individual cases, but is not feasible for the BioMedBridges web service, which must be fully automated. Conversely, an automated service can explore the sensitivity to the metadata provided (e.g. grid sampling), and employ several different fitting scores to increase confidence. It is clear that ease of use will be important for a new service, and that it must cover a variety of protocols.