Data Shared is Scientific Research Halved

5 October 2020

Collaboration is a touchstone of modern scientific research. STFC uses open data projects that allow researchers to work together to increase the circulation and exploitation of knowledge.

What is open data?

Open data is the concept that output from research should be readily available for other scientists, industry workers and the general public to use. It falls within the Open Science initiative, the umbrella movement making scientific research accessible to all. The idea behind open data is to reduce the time spent on redundant research projects and increase the time spent on new projects, meaning data shared is scientific research halved.

According to the European Open Science Cloud (EOSC), open data could enable collaboration among 1.7 million European researchers and 70 million professionals in science, technology, the humanities and social sciences. Horizon 2020, an EU Research and Innovation programme, is funding initiatives with the Scientific Computing Department (SCD) and ISIS Muon and Neutron Source at STFC, as well as Diamond Light Source and other organisations, to develop open data systems.


How many people do you think there are between you and Kevin Bacon?

You might have heard of the Six Degrees of Kevin Bacon game, which holds that any two people are at most six social connections away from each other. What does this have to do with open data? The FREYA project, named in acknowledgment of the Norse origin of preceding projects and coordinated by STFC, is developing the concept of a PID Graph that uses a data network model similar to this social network idea.

PIDs, or persistent identifiers, are ways of linking digital objects important in research, such as datasets, publications and even the people carrying out the research. They are like URLs on the World Wide Web but should never have broken links - that is why they are called "persistent".

The PID Graph uses the metadata (the descriptive part of the data, such as the author) of each PID to provide connections with other PIDs that are up to two “hops” away. So, if this were a social network, there would be only two degrees of separation, or two Kevin Bacons, between two “people” in an open data system. By making connections like this, the whole research process becomes more transparent and easier to understand.
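To make the two-hop idea concrete, here is a minimal sketch in Python. It is not the FREYA implementation: the identifiers, the links between them and the function name `within_hops` are all invented for illustration. It simply walks outwards from one PID and stops after two hops.

```python
from collections import deque

# Hypothetical PID graph: each key is a PID and each value lists the PIDs
# that its metadata links to. All identifiers are invented placeholders.
PID_LINKS = {
    "dataset:A": ["person:1", "paper:X"],
    "person:1": ["dataset:A", "paper:Y"],
    "paper:X": ["dataset:A", "org:Z"],
    "paper:Y": ["person:1"],
    "org:Z": ["paper:X", "dataset:B"],
    "dataset:B": ["org:Z"],
}


def within_hops(links, start, max_hops=2):
    """Return {pid: hop_count} for every PID reachable from start in at most max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in links.get(current, []):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # the starting PID is not its own neighbour
    return distances


reachable = within_hops(PID_LINKS, "dataset:A")
for pid, hops in sorted(reachable.items()):
    print(pid, hops)
```

In this toy graph, the person and the paper linked directly to the dataset are one hop away, and the things they link to are two hops away. The second dataset is three hops away, so it is left out.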

Did you know? The technology used for PID Graph is also used for the social graphs behind Facebook.


What makes data fair?

Data from science should be Findable, Accessible, Interoperable and Re-usable (FAIR). This means that:

  • Data is findable if it can be located using metadata, such as the title of a research paper.
  • Data is accessible if it is always available and can be obtained, again using metadata, from a trusted source.
  • The language used in the metadata is of an interoperable standard if it allows data exchange and reuse between researchers, institutions and countries.
  • Data is re-usable if it has clear usage licenses and accurate information on its origin.
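As a rough illustration, the four criteria above can be pictured as a checklist run against a dataset's metadata record. The sketch below is only that: the field names (`identifier`, `access_url`, `metadata_standard`, `licence`, `provenance`) and the function `fair_checklist` are assumptions made for this example, not part of any FAIR specification or STFC tool, and a real FAIR assessment looks at far more than whether a field is filled in.

```python
def fair_checklist(record):
    """Return a rough True/False answer for each FAIR criterion.

    The field names are hypothetical; each check only tests that the
    relevant metadata fields are present and non-empty.
    """
    return {
        "findable": bool(record.get("identifier")) and bool(record.get("title")),
        "accessible": bool(record.get("access_url")),
        "interoperable": bool(record.get("metadata_standard")),
        "reusable": bool(record.get("licence")) and bool(record.get("provenance")),
    }


# Invented example record with two gaps: no metadata standard, no provenance.
record = {
    "identifier": "example-pid-0001",
    "title": "Example neutron scattering dataset",
    "access_url": "https://example.org/data/0001",
    "metadata_standard": "",
    "licence": "CC-BY-4.0",
    "provenance": "",
}

result = fair_checklist(record)
for criterion, passed in result.items():
    print(criterion, "ok" if passed else "missing information")
```

Here the record can be found and accessed, but it would fall short on interoperability and re-use until the missing details are supplied.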

STFC is participating in a project called FAIRsFAIR that aims to provide training and guidelines to scientists on how to ensure FAIR data practices. These principles ensure that all users of FAIR data can agree: fair's fair.


Did you know? A single experiment at ISIS can produce the equivalent of 14 hours of TV streaming content.

At STFC's Rutherford Appleton Laboratory (RAL), facilities like ISIS and nearby Diamond Light Source produce petabytes of data from experiments carried out on their beamlines. The EOSC Photon and Neutron Data Service (ExPaNDS) project aims to help manage the huge amount of data produced and provide analysis services. At RAL, SCD provides support for the data produced throughout the experimental lifecycle at ISIS and Diamond.

ExPaNDS also draws on FAIR data principles and aims to embed them in the community, improving access to the data generated by national photon and neutron facilities in the UK and across Europe.

The aim is to build a reliable, open innovation environment for scientific research where data from publicly funded research is 'as open as possible, as closed as necessary.' The end goal is to open up all parts of the research process, even the algorithms and code.


Further information on the projects is available from the project websites.

