Tools and resources

Simple and versatile biobank sample information federation: MIABIS Connect

MIABIS Connect

Finding biosamples with specific characteristics across different biobanks is challenging because each biobank manages sample information differently. Combined with the sensitive nature of large volumes of biosample information, this makes it difficult to bring the information together, for example in a central place where a researcher can browse and discover groups of samples that might be suitable for a given research project.

Determine the coordination geometry of metals in biological macromolecules

Metals are essential for the structure and function of many proteins and nucleic acids. The geometric arrangement of the atoms that coordinate a metal in a biological macromolecule is an important determinant of the specificity and role of that metal. FindGeo quickly finds and determines the coordination geometry of selected, or all, metals in a given structure.
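To make the idea of "determining a coordination geometry" concrete, the following is a minimal Python sketch. It is not FindGeo's method: it assumes the metal and ligand-atom coordinates are already known, and it simply compares the observed ligand-metal-ligand angles with the ideal angles of a few common geometries. The function names, the small table of ideal angles, and the scoring by root-mean-square angle deviation are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

# Ideal ligand-metal-ligand angles in degrees, sorted ascending.
# Illustrative table only; it covers just a handful of geometries.
IDEAL_ANGLES = {
    "linear": [180.0],
    "trigonal planar": [120.0] * 3,
    "tetrahedral": [109.47] * 6,
    "square planar": [90.0] * 4 + [180.0] * 2,
    "octahedral": [90.0] * 12 + [180.0] * 3,
}


def angle_deg(metal, a, b):
    """Angle a-metal-b in degrees, for 3D points given as (x, y, z)."""
    va = [x - m for x, m in zip(a, metal)]
    vb = [x - m for x, m in zip(b, metal)]
    dot = sum(p * q for p, q in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def classify_geometry(metal, ligands):
    """Return (geometry name, RMS angle deviation) for the best match.

    Only geometries with the same number of ligand pairs are compared.
    Returns None if no geometry in the table fits the ligand count.
    """
    observed = sorted(
        angle_deg(metal, a, b) for a, b in combinations(ligands, 2)
    )
    best = None
    for name, ideal in IDEAL_ANGLES.items():
        if len(ideal) != len(observed):
            continue
        rms = math.sqrt(
            sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(ideal)
        )
        if best is None or rms < best[1]:
            best = (name, rms)
    return best


# Hypothetical site: a metal at (1, 2, 3) with four ligand atoms
# placed at the corners of a tetrahedron around it.
metal = (1.0, 2.0, 3.0)
ligands = [(2, 3, 4), (2, 1, 2), (0, 3, 2), (0, 1, 4)]
name, rms = classify_geometry(metal, ligands)
print(name)
```

Comparing sorted angle lists is a deliberately crude score. It ignores metal-ligand distances and cannot tell apart geometries that share the same set of angles, so it should be read as an illustration of the problem rather than a usable classifier.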

Supporting researchers sharing sensitive data: identifying requirements

Legal and Ethical Requirements Assessment Tool - LAT

Uncertainty about ethical and legal requirements with respect to sensitive data – such as personally identifiable data – is one of the most important barriers to data sharing. Designed for researchers with little or no background in these formal requirements, this tool clarifies if and how sensitive data can be shared, or when additional actions or expert advice are needed.

Unlocking biomolecule volume data

Information on the structure of macromolecules is important for biology and medicine because many important biological processes can be directly linked to macromolecular structure or changes therein. For example, structural information on how a drug is bound to a target may yield vital clues on how to optimize its structure to promote more efficient binding. The BioMedBridges shape-matching software, which will be incorporated into a web-based shape-matching service, will enable the wealth of low-resolution volume data in the EMDB and PDB resources to be searched by shape, providing entirely new perspectives on the relationships between structures.

Sharing protein engineering knowledge

PiMS

PiMS is a laboratory information management system for recombinant protein production laboratories. It manages the stages from selection of the target protein to the production of soluble protein. PiMS development is part of a larger vision to provide a unified and extensible set of software tools for structural biology, offering seamless data transfer and a consistent user experience, from target selection to the interpretation of the structure.

Unlocking small-molecule resources

The UniChem connectivity search function not only finds exact matches of a user's chemical structure across 60 million related molecules from 21 data sources worldwide, but also identifies 'equivalent' structures that have the same atom connectivity while differing in stereochemistry or isotopic composition, or which exist in a different salt form. This functionality is particularly important in the development of new pharmaceuticals, where early candidate triage can save significant amounts of time and money.
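The notion of structures that share atom connectivity but differ in stereochemistry or isotopic composition can be illustrated with standard InChIKeys. In a standard InChIKey, the first 14-character block is a hash of the connectivity (molecular skeleton), while the second block covers the remaining layers, including stereochemistry and isotopes. The Python sketch below groups records by that first block. It is an assumption-laden illustration, not a description of how UniChem implements its search: the function names are hypothetical, the source names and keys are made-up placeholders rather than real compounds, and salt forms are not handled because they add components to the structure and would need extra treatment.

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first block of a standard InChIKey (14 characters)."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return first


def group_by_connectivity(records):
    """Group (source, inchikey) pairs by their connectivity block."""
    groups = defaultdict(list)
    for source, key in records:
        groups[connectivity_block(key)].append((source, key))
    return dict(groups)


# Placeholder records: the keys below are invented strings in the
# InChIKey layout (14-10-1 characters), not keys of real molecules.
records = [
    ("source_a", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("source_b", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # same skeleton, other stereo/isotope layer
    ("source_c", "DDDDDDDDDDDDDD-BBBBBBBBSA-N"),  # different skeleton
]

groups = group_by_connectivity(records)
for block, members in groups.items():
    print(block, [source for source, _ in members])
```

Grouping on the first block is cheap because it needs only string comparison, which is one reason it is a common first pass when reconciling identifiers from several sources.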

Imaging data management and interoperability

Bigr XNAT adds a custom REST interface to the open-source platform XNAT, enabling data from medical imaging databases to be embedded into XNAT and vice versa. It creates a user-friendly data bridge for medical imaging trials that supports sharing and analysis of imaging and image-derived data, as well as centralised correlative analysis between image-derived data and other types of data such as clinical data (e.g. disease status, age) and genetic data.

Final version of the web-based shape-matching service

A software pipeline named SMaSB was previously developed to perform the volume/shape matching that underpins the set of tools being developed in WP9. These tools are novel in providing access to a growing class of structural biology data, viz. volume data. The SMaSB software is primarily a set of Python scripts which organise the metadata, control the data flow during volume/shape matching, and record the results. Third-party software is called to perform the compute-intensive steps in the pipeline. The first full version of SMaSB was released in December 2014, and reported in Deliverable 9.1. We now report on the associated web service (PDBeShape), which provides a user-friendly front-end to SMaSB results. It provides web access to a curated database of structural volumes, derived from deposited structures in the Electron Microscopy Data Bank (EMDB) and Protein Data Bank (PDB), together with pre-processed alignments and scores. We have made the first version of the PDBeShape service publicly available at http://wwwdev.ebi.ac.uk/pdbe/emdb/shape/welcome/. Here we give details on the current functionality of PDBeShape and the technical underpinning. More extensive documentation is provided on the web page.

Development of co-annotated mouse-human datasets

In order to develop a comprehensive set of terms to describe Type 2 diabetes and obesity phenotypes in mouse and human, Type 2 diabetes-related phenotypes were mined from the literature for use as new phenotype terms. The mined terms were curated and temporally categorised by expert clinicians/diabetologists. The terms were represented as an ontology in OWL format, and the utility of the ontology in the annotation of data resources and partner data sets was evaluated. The ontology developed here enabled the annotation of mouse and human datasets with specific terminology representing Type 2 diabetes progression, which will ultimately support translational research.

ESFRI BMS Meta Service Registry (eSR)

This prototype service registry was developed with contributions from BioMedBridges partners and in collaboration with ELIXIR. The registry is designed to make it easy for researchers to find, compare, and use biomedical software to address a scientific question or research support task such as “What are all of the Gene Ontology tools? Which of these is most highly cited?”. By returning relevant, structured results, the registry complements search engines like Google: the user can specify exactly what they need, using various search and filter options, and get a tailored list of suitable resources. From sequencing to structures, imaging to indexing, the registry’s domain scope is very broad; it also encompasses web services, web GUIs, desktop GUIs, and command-line tools. This broad scope ensures coverage of a substantial portion of the tools and data services of use to Research Infrastructures represented in BioMedBridges. Information about tools includes crucial provenance details, links to relevant publications and grants, and key contact information. To achieve these objectives, it was necessary to develop a sustainable, scalable minimum metadata model and formal schema to describe software. The model was purpose-built to be lightweight and flat in order to facilitate adoption by other software registries that may be looking to provide or aggregate software metadata in the future. The registry data model, software, and content (metadata describing the tools) are openly available in order to further encourage re-use and community participation.
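As a rough illustration of what a lightweight, flat metadata record and a filter-style query might look like, here is a minimal Python sketch. It does not reproduce the registry's actual schema or software: the field names, the `find_tools` helper, and the example entries (including their citation counts) are all hypothetical and chosen only to mirror the kind of question quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRecord:
    """A flat record: every field is a simple value, with no nesting.

    Field names are assumptions for illustration, not the registry schema.
    """
    name: str
    topic: str
    interface: str  # e.g. "web service", "web GUI", "command-line"
    citations: int
    contact: str


def find_tools(records, topic=None, interface=None):
    """Filter by optional topic and interface, most cited first."""
    hits = [
        r for r in records
        if (topic is None or r.topic == topic)
        and (interface is None or r.interface == interface)
    ]
    return sorted(hits, key=lambda r: r.citations, reverse=True)


# Invented example entries; names and numbers are placeholders.
registry = [
    ToolRecord("ExampleTool A", "Gene Ontology", "web GUI", 120, "a@example.org"),
    ToolRecord("ExampleTool B", "Gene Ontology", "command-line", 340, "b@example.org"),
    ToolRecord("ExampleTool C", "Imaging", "web service", 75, "c@example.org"),
]

go_tools = find_tools(registry, topic="Gene Ontology")
print([t.name for t in go_tools])  # all Gene Ontology tools, most cited first
print(go_tools[0].name)            # the most highly cited of them
```

Keeping each record flat means it maps directly onto a table row or a simple key-value document, which is one way a model like this could be easy for other registries to adopt or aggregate.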