UK dataset expertise informs Google's new dataset search

6 September 2018

False colour image of Europe captured by Sentinel 3.
(Credit: contains modified Copernicus Sentinel data (2018))

Experts from UK Research and Innovation have contributed to a search tool newly launched by Google that aims to help scientists, policy makers and other user groups more easily find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets, and local and national governments around the world publish their data as well. As part of UK Research and Innovation's commitment to easy access to data, its experts worked with Google to help develop Dataset Search, launched today.

Similar to how Google Scholar works, Dataset Search lets users find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.

Google approached UK Research and Innovation’s Natural Environment Research Council (NERC) and Science and Technology Facilities Council (STFC) to help ensure their world-leading environmental datasets were included. These organisations have a heritage of managing huge, complex datasets on the atmosphere, oceans, climate change and even the solar system, including data managed by Dr Sarah Callaghan, the Data and Programme Manager at UKRI’s national space laboratory, STFC RAL Space. This heritage led to them working with Google on the project.

Dr Sarah Callaghan said: “In RAL Space we manage, archive and distribute thousands of terabytes of data to make it available to scientific researchers and other interested parties. My experience making datasets findable, usable and interoperable enabled me to advise Google on their Dataset Search and how to best display their search results.”

“I was able to draw on my work with NERC and STFC datasets, not only in just archiving and managing data for the long term and the scientific record, but also helping users to understand if a dataset is the right one for their purposes.”

Temperature of Europe during the April 2018 heatwave.
(Credit: contains modified Copernicus Sentinel data (2018))

To create Dataset Search, Google developed guidelines for dataset providers to describe their data so that search engines can better understand the content of their pages. These guidelines cover salient information about each dataset: who created it, when it was published, how the data was collected, and what the terms are for using it. This enables search engines to collect and link this information, work out where different versions of the same dataset might be, and find publications that describe or discuss the dataset. The approach is based on an open standard for describing this information (schema.org). Many STFC and NERC environmental datasets are already described in this way and are particularly good examples of findable, user-friendly datasets.
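As a rough illustration of what such a description can look like, the sketch below builds a schema.org Dataset record as JSON-LD, one of the formats commonly used to embed this kind of metadata in a web page. The function name, the dataset details and the licence URL are hypothetical and are not taken from any NERC or STFC catalogue; only the property names (name, description, creator, datePublished, measurementTechnique, license) come from the schema.org vocabulary.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         measurement_technique, license_url):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                                   # what the dataset is called
        "description": description,                     # short human-readable summary
        "creator": {"@type": "Organization", "name": creator},  # who created it
        "datePublished": date_published,                # when it was published
        "measurementTechnique": measurement_technique,  # how the data was collected
        "license": license_url,                         # terms for using the data
    }
    return json.dumps(record, indent=2)


# All values below are invented placeholders.
example = build_dataset_jsonld(
    name="Example rainfall observations",
    description="Daily rainfall totals from a hypothetical gauge network.",
    creator="Example Data Centre",
    date_published="2018-09-06",
    measurement_technique="Tipping-bucket rain gauge",
    license_url="https://example.org/licence",
)
print(example)
```

A page would typically carry this output inside a script element of type application/ld+json, where a crawler can read it alongside the human-readable description of the dataset.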

“Standardised ways of describing data allows us to help researchers by building tools and services to make it easier to find and use data,” said Dr Callaghan. “If people don’t know what datasets exist, they won’t know how to look for what they need to solve their environmental problems. For example, an ecologist might not know where to go to find, or how to access, the rainfall data needed to understand a changing habitat. Making data easier to find will help introduce researchers from a variety of disciplines to the vast amount of data I and my colleagues manage for NERC and STFC.”

The new Google Dataset Search offers references to most datasets in environmental and social sciences, as well as data from other disciplines including government data and data provided by news organisations.

Professor Tim Wheeler, Director of Research and Innovation at NERC, said: “NERC is constantly working to raise awareness of the wealth of environmental information held within its Data Centres, and to improve access to it. This new tool will make it easier than ever for the public, business and science professionals to find and access the data that they’re looking for. We want to get as many people as possible interested in and able to benefit from data collected by the environmental science that we fund.”

NERC/STFC JASMIN computer.
(Credit: STFC)

Dr Chris Mutlow, Director of STFC RAL Space, said: “This work builds on RAL Space experience in data management and commitment to making it easily accessible. The expertise that Sarah and our other data scientists have in this area is becoming an ever more important global resource to call upon. The data centres we manage for NERC and STFC play an important role in scientific research and are a facility available to all.”

Notes

NERC and STFC are non-departmental public bodies and are part of UK Research and Innovation. UKRI brings together nine Councils into a single organisation that aims to ensure the UK maintains its world-leading position in research and innovation. For more information visit the UKRI website.

STFC RAL Space - RAL Space is an integral part of the Science and Technology Facilities Council's (STFC) Rutherford Appleton Laboratory (RAL). RAL Space carries out world-class space research and technology development with involvement in over 210 space missions.

The Natural Environment Research Council (NERC) is the UK's main agency for funding and managing research, training and knowledge exchange in the environmental sciences. Our work covers the full range of atmospheric, Earth, biological, terrestrial and aquatic science, from the deep oceans to the upper atmosphere and from the poles to the equator. We coordinate some of the world's most exciting research projects, tackling major issues such as climate change, environmental influences on human health, the genetic make-up of life on Earth, and much more.
