This week we announce that the FAIRsharing developer and curation teams have carried out an important cleaning and updating of organisations in FAIRsharing. Organisations are essential elements of the FAIRsharing data model representing academic or non-academic entities that support, fund or maintain one or more of our records (databases, policies and standards). Users can also be connected to the organisation to which they belong. Here are three examples of the more than 3,200 existing FAIRsharing organisations: European Commission, IBM and Oxford e-Research Centre.
The image below shows records related to National Cancer Institute (NCI), National Institutes of Health (NIH):
FAIRsharing organisations now have a field called ROR ID, which links each organisation to its entry in an external database called Research Organization Registry Community (ROR):
ROR contains more than 100,000 top-level entities and is community maintained. The curation team has mapped FAIRsharing organisation records to their ROR counterparts using a semi-automatic method:
- First, the most similar ROR organisations are retrieved for each FAIRsharing organisation by comparing names and URLs (the homepage of the organisation) as strings; edit distance is used to compare each of these fields.
- Curators then manually checked whether any of these similar organisations was a true match. Where there was no match, they also searched the whole ROR database by keyword.
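The automatic first step can be sketched roughly as follows. This is a minimal illustration, not the code the FAIRsharing team used: the record layout (`name`, `url`, `id`), the normalisation rules, the placeholder records and the choice to keep the better of the two field scores are all assumptions made for the example. It only shows how edit distance can be used to shortlist candidates for a curator to review.

```python
from urllib.parse import urlparse


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def normalise_name(name: str) -> str:
    """Lower-case and collapse whitespace (an assumed, very light clean-up)."""
    return " ".join(name.lower().split())


def normalise_url(url: str) -> str:
    """Drop the scheme, a leading 'www.' and any trailing slash."""
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path.rstrip("/")


def rank_candidates(org, ror_records, top_n=3):
    """Return the top_n (score, record) pairs for a curator to check by hand."""
    scored = []
    for record in ror_records:
        name_score = similarity(normalise_name(org["name"]),
                                normalise_name(record["name"]))
        url_score = similarity(normalise_url(org["url"]),
                               normalise_url(record["url"]))
        # Assumption: keep the better of the two field scores.
        scored.append((max(name_score, url_score), record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical records; the identifiers are placeholders, not real ROR IDs.
fairsharing_org = {"name": "Example  University", "url": "http://www.example.org/"}
ror_records = [
    {"id": "placeholder-1", "name": "Sample Institute", "url": "https://sample.test"},
    {"id": "placeholder-2", "name": "Example University", "url": "https://example.org"},
]
best_score, best_record = rank_candidates(fairsharing_org, ror_records)[0]
print(best_record["id"], round(best_score, 2))
```

A shortlist like this only narrows the search. Acronyms, reordered words or translated names can score poorly on edit distance even when the organisations are the same, which is why the manual check remains necessary.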
The task was complex because the name of the same organisation can be written in many ways, e.g. as an acronym, with a different word order, in another language, or with the location of the organisation included. In addition, URLs often change over time, and more than one valid URL may exist for the same organisation.
Once this task was complete, 55% of the FAIRsharing organisations had been mapped to ROR organisations. The main reason so many organisations remain unmatched is that the granularity of FAIRsharing differs from that of ROR, e.g. there are academic groups and departments that exist in FAIRsharing but not in ROR.
As new organisations are constantly being created in both FAIRsharing and ROR, the mapping will be updated regularly to find new matches.
During this process curators also found FAIRsharing organisations with other fields that needed updating, duplicated organisations, and organisations that could simply be removed. As a result, 265 organisations have been deleted or merged and 1,747 organisations have had one or more of their fields updated.