Databases Demystified

Ever wondered how many open access and deposition repositories there are out there? Let FAIRsharing help you find out! This post includes various examples of how to use the FAIRsharing Advanced and Saved Searches to your advantage when exploring the landscape of research databases across all subject areas. The content will also be posted on our Mastodon and Twitter accounts.

Just before FAIRsharing’s summer break, we announced the completion of a MASSIVE curation push from our FAIRsharing Community Champions. This curation, together with our new Advanced and Saved Searches, can help you discover nuanced information regarding databases.

We’re now taking a minute to show you how this fantastic curation can be used to discover interesting facts about the research landscape around databases. Take a look at our examples below and use them to help you create searches specifically for your community. Got questions? Get in touch with us.

Community Collaboration

FAIRsharing collaborates with many different communities, and the RDA is one of our core collaborators. More information about our alignment with various database-related communities is available on our gitbook pages. We have been directly involved in the following repository-related working groups:

  1. Co-chairing the recently-completed RDA Data Repository Attributes WG. FAIRsharing implements the resulting RDA recommendations, currently RDA DRA WG’s RDA Common Descriptive Attributes of Research Data Repositories version 1.0. We have 100% alignment with these common attributes, details of which are listed below.
  2. Co-chairing the currently-active Community-based catalogue of requirements for trustworthy Technical Repository Service Providers Working Group. The expected outcomes of this working group include determining, through stakeholder consultation, priorities for criteria implementation, and identifying metadata that can be associated with the implementation of each criterion to facilitate modular, decentralized certification.
  3. The FAIRsharing WG, whose current role as a maintenance working group is to encourage collaboration with other RDA groups as relates to its goals.

The searches you see in the rest of this blog post highlight a number of the different common attributes identified by the RDA as described above.

By registering your resource with FAIRsharing, you are exposing community-endorsed, FAIR-enabling attributes of your repository to humans and machines as part of the larger graph of FAIRsharing resource descriptions.

Saving Searches

Our documentation provides clear instructions on how to use our Advanced Search. If you are logged in, there is now a button on your search results page that allows you to save the search.

From this, you can choose to save your search and view or run it at any time via your user profile.

You can create saved searches for whatever topic interests you, and run those searches at regular intervals to get a view of that part of the research landscape and how it changes over time.

Example 1: Open access and deposition repositories

In this first example, we’ll show you how to build a search that will find all repositories registered with FAIRsharing that have open deposition conditions AND open or partially open access conditions.

You can click here to see the search that I ran, or take a look at my user profile and find the appropriate search. Once you have your own search results, you can build on this Advanced Search, modifying it as required and saving it to your own profile.

In this example, I’ve found 467 repositories that fulfil these conditions out of the 1304 repositories or knowledgebase/repositories within our 2232-strong database registry. Does that seem small to you? Would you think that more than 35% of these resources should be open for both access and deposition? What about your favourite repository? Have a look in FAIRsharing and see what it says. Let us know what you find, or if you think we need to update a record.
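If you want to check that percentage, or work it out for your own saved search, the arithmetic is easy to reproduce. Here is a minimal Python sketch using the counts quoted above. The variable names are ours, purely for illustration, and the counts are a snapshot that will change as records are added and updated.

```python
# Counts quoted in this post (a snapshot; the live registry will differ).
open_access_and_deposition = 467   # repositories matching the saved search
repositories = 1304                # repositories or knowledgebase/repositories
database_registry = 2232           # all records in the database registry

# Share of repositories that are open for both access and deposition.
share_of_repositories = open_access_and_deposition / repositories * 100

# The same matches as a share of the whole database registry.
share_of_registry = open_access_and_deposition / database_registry * 100

print(f"{share_of_repositories:.1f}% of repositories")
print(f"{share_of_registry:.1f}% of the database registry")
```

Swap in the counts from your own search results to see how your community compares.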

Example 2: More stringent requirements

Let’s be a bit more selective in our database record search now, and look at records that contain clear information about a number of database best practices. We’ll take a mixture of fields that touch upon FAIR and the recommendations of the RDA Data Repository Attributes WG, with which FAIRsharing metadata is aligned. We query FAIRsharing to discover database records with a ‘Ready’ status that have open or partially open data access; they also need to have preservation and sustainability statements (whatever the levels of preservation and sustainability that might apply); finally, they should implement a persistent identifier. While it may seem like a lot, it is really just part of the minimal information that should be provided by database owners.
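To make the logic of this search concrete, here is a rough Python sketch of how the criteria combine: every condition must hold at once (AND), while data access may take either of two values (OR). This is only an illustration. The field names, the status and access values, and the three sample records are all invented for the example; they are not FAIRsharing's schema, API, or data.

```python
from dataclasses import dataclass


@dataclass
class DatabaseRecord:
    # Hypothetical field names for illustration; not FAIRsharing's schema.
    name: str
    status: str
    data_access: str
    has_preservation_statement: bool
    has_sustainability_statement: bool
    has_persistent_identifier: bool


def meets_example_2_criteria(record: DatabaseRecord) -> bool:
    """Return True only if the record satisfies every criterion."""
    return (
        record.status == "ready"
        and record.data_access in {"open", "partially open"}
        and record.has_preservation_statement
        and record.has_sustainability_statement
        and record.has_persistent_identifier
    )


# Invented sample records, not real registry entries.
records = [
    DatabaseRecord("Example DB A", "ready", "open", True, True, True),
    DatabaseRecord("Example DB B", "ready", "partially open", True, False, True),
    DatabaseRecord("Example DB C", "deprecated", "open", True, True, True),
]

matches = [r.name for r in records if meets_example_2_criteria(r)]
print(matches)
```

Only the first sample record passes: the second lacks a sustainability statement and the third does not have a ‘Ready’ status. Each extra condition can only shrink the result set, which is why a search like this one returns a short list.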

What’s interesting is that only 40 records fulfil every single one of these criteria; you can see which ones they are by clicking here to see the search that I ran, or by taking a look at my user profile and finding the appropriate search.

Is your database (or one you use regularly) on this list? If not, do you think it belongs there? Get in touch with us if you have information about a database that would get it into these search results!

Why do you think there are so few that fulfil all of these requirements? Here are some ideas we have come up with.

  • Information not publicly available. Over time, as databases work to implement guidance (such as that provided by the RDA above) around minimal information describing databases, this number will grow. As a result, everyone will be better informed regarding those databases, and ultimately able to make better decisions about which ones to use.
  • Unclear information. Our curation team and Champions can only include information that they can find, and sometimes it can be hard to find the information we need on resource websites.
  • Mistakes. If we are missing information in your database record or have gotten something wrong, then you can do two things: see if you can represent that information more clearly on your site, and please get in touch with us to let us know where we’ve missed something.

Many of these issues can be resolved by claiming your records on FAIRsharing and helping us keep them up to date. It’s good for your resource, and also good for you (find out how to claim a record and why it’s a good idea).

Example 3: Types of Curation

When considering which database is suitable for your particular needs, one interesting consideration is the type of curation that a database employs. This advanced search discovers how many databases have each of the different types of curation FAIRsharing describes: manual, automated, both manual and automated, none, or not found.

In this example, we’ve looked at only Ready or In-development records. Of these, the largest group is listed as manually curated (633), although for nearly that number (546) our curators and Champions could not determine the level of curation from the available database documentation. This is highly relevant information, and it is very interesting that it is so hard to discover. A further 380 databases were curated as having aspects of both manual and automated curation, while 125 documented only automated curation procedures. Finally, 143 databases had no additional curation.
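To put those counts in proportion, here is a small Python sketch that turns them into shares of the search results. It assumes the five curation types are mutually exclusive and together cover every Ready or In-development record in the search; that is our reading of the numbers above rather than something the search itself states. The dictionary keys are just labels for this example.

```python
# Counts quoted above for Ready or In-development database records.
curation_counts = {
    "manual": 633,
    "not found": 546,
    "manual and automated": 380,
    "none": 143,
    "automated": 125,
}

# Assumption: the categories do not overlap, so the total is a simple sum.
total_records = sum(curation_counts.values())

# Percentage of the total for each curation type.
curation_shares = {
    curation_type: count / total_records * 100
    for curation_type, count in curation_counts.items()
}

for curation_type, share in curation_shares.items():
    print(f"{curation_type}: {share:.1f}%")
```

Under that assumption, manual curation is the largest single group but accounts for less than half of the records, and the 'not found' group is not far behind.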

Example 4: Sustainability and Preservation

In our final example, we see how many database records in FAIRsharing have either Sustainability or Preservation plans. The answer is that only 8% of database records in FAIRsharing describe either one. I’m using this example as it’s a tricky area; these are vitally important parts of database documentation, but are often missing or difficult to find: our curation staff and Community Champions have commented on it. Historically, database ‘churn’ has been a real issue for research databases that are often funded purely from grants. Currently 437 out of the 2252 database records in FAIRsharing are listed as deprecated. Having these deprecated records in FAIRsharing is vital, as we aim to show the entirety of the database life cycle for our records; it’s just as important to know a database is still active as it is to know that it is no longer available.

So understanding how a database you are considering using has thought about sustainability and data preservation is an important part of the process of choosing a place to put your data. And it’s easier for databases to provide this information than you might think. These two fields in a record are deliberately not making a judgement about the level of sustainability or preservation; they simply state whether or not such information exists, and can sometimes point you to the part of the database website that describes these plans in more detail. Take a look at my sustainability and preservation searches, and feel free to adapt the searches to find out new things!

There’s one more reason I chose this as our final example: these statements are easily overlooked. It was only while running this example that I realised that FAIRsharing had a preservation plan but did not yet have a sustainability statement. You can now find both here. Remember to check the record of your favourite database, and use the information we have (or don’t have) in these records as a way to evaluate the documentation for your database. Are there ways to make it easier to find or understand? Is there anything missing? We can all make small changes to make our resource descriptions more, well, descriptive!

Thank you to our team and our FAIRsharing Community Champions for curating the database registry and to our technical team for creating the Advanced and Saved Searches to provide us with this information.