WP3 - EOSC data catalogue services for EU Photon and Neutron national RIs

WP3 delivers EOSC data catalogue services for RIs, providing a one-stop shop where scientific facility users, academics and the general public can find and exploit research data.

Lead partner: PSI

Budgeted participation per partner:

Partner | PSI | Diamond | HZB | MAX IV | DESY | UKRI | SOLEIL | EGI | ALBA | Elettra
Budget | 45 PMs | 36 PMs | 18 PMs | 18 PMs | 9 PMs | 9 PMs | 9 PMs | 9 PMs | 9 PMs | 7 PMs

Tasks

Progress in each of the work package's tasks is reported in this section, with regular updates indicated by their dates. To view the original description of each task, as written in the project's description of work, click on the table header cells. Only the latest update of the next steps is displayed.

T3.1 Coordinate metadata catalogue services (PSI)
Task: Perform landscape analysis on the current usage and approaches to metadata catalogues at ALBA, DESY, DLS, ELETTRA, HZB, MAX IV, PSI and SOLEIL. Based on the landscape analysis and the requirements defined in WP2, a gap analysis will provide the necessary input to develop a roadmap towards harmonised and federated metadata catalogue services. A workshop will be held at M6 with WP2 and WP4 to align requirements and specification of services. Coordinating this task and ensuring compatibility with PaNOSC and the EOSC-hub will be essential and a key success factor for the three projects (PaNOSC, EOSC-hub and ExPaNDS).
Progress: 23 Apr 2020:
A data landscaping survey was carried out in November-December 2019 to record the current usage and approaches to metadata catalogues at our 10 national RIs. Specifically, it collected which data catalogues, file formats, metadata databases and primary software languages are used, and the number of public datasets available today. The metadata standards used and how the metadata is collected at the facilities were also surveyed.
A gap analysis will follow in the coming months. It first requires the requirements to be discussed further with WP2.
15 Jul 2020:
Based on the previously reported survey (DOI: 10.5281/zenodo.3673810), the structure and contents of D3.1 have been agreed and the deliverable is now in preparation.
Collaborators working on the ICAT data catalogue (HZB and UKRI from ExPaNDS) have developed an OAI-PMH plug-in for OpenAIRE. It has enabled sites, including PaNOSC sites, to be harvested, which opens the way to publishing datasets in EOSC.
The drive by ExPaNDS facilities to publish open data related to coronavirus research has increased activity at the facilities to make relevant datasets public. This activity will ultimately support the ExPaNDS goal of making data and services FAIR and available on EOSC. For example, PSI has made a 10 TB imaging dataset of a lung sample available. This and other examples will serve as use cases and pilot studies.
Additionally, several members of WP3 were heavily involved in helping to release version 1.0 of the PaN search API that will offer a standard interface between metadata catalogues and EOSC. This contributed significantly to a PaNOSC deliverable and enables later ExPaNDS WP3 deliverables.
19 Apr 2021:
The roadmap and gap analysis for implementing a FAIR data catalogue and ensuring that data is available for data reprocessing in WP4 was published in early November 2020.
08 Aug 2022:
The same data landscaping survey was circulated again (Aug 22) and analysed to obtain a status update and to guide the support offered to facilities.
Next steps: 08 Aug 2022: By the end of the project, WP3 will circulate the survey again and collect the latest results.
T3.2 Develop an EU PaN ontology (Diamond)
Task: Develop ontologies for the main application domains of Photon and Neutron science to standardise the metadata used in metadata catalogues, based on requirements defined in WP2. This will ensure that federated EOSC metadata catalogues are based not only on a common syntax, but also on common semantics. The ontology itself will be provided as an EOSC service using existing tools to document and make ontologies accessible (e.g. NeOn, Knoodle, Protégé, Swoop). Development of the ontology will be closely linked to the existing NeXus file format and its further developments (PSI, ISIS and DLS have leading roles in the specification and implementation of NeXus). Photon and Neutron-related ontologies provided by NeXus will be used and extended. In a similar way, existing ontologies (such as those provided by NIST) should be taken into account.
Progress: 23 Apr 2020:
The activities on ontology started at M6. As a starting point, the beamline scientists at Diamond were asked in mid-March to provide their search terms, with several guiding use cases. The answers will be analysed in early April, the survey improved and then sent to users at the other partner facilities. The survey can be found here (data miners are welcome to participate). In parallel, the collaboration with the WP2 task on metadata standards and the NeXus format is well established, with meetings every 2 weeks.
15 Jul 2020:
This task has been by far the most active within WP3, largely driven by the enthusiasm and drive of Silvia da Garcia Ramos from Diamond, who deserves special recognition. There has also been extensive community engagement, which includes:
• Develop and release a first draft for the PaNKOS ontology update with work from HZB.
• Engage with the CalipsoPlus project through a presentation on Wayforlight, which already has an extensive list of photon instruments.
• Engage with EOSC. How will ExPaNDS services be deployed in EOSC? A presentation on EOSC onboarding is being organised by WP1. The contact was initially about understanding which ontology repository we should use and how the ontology would be integrated into EOSC as a service.
• Engage with WP2, in particular task 2.3, and participate in the deliverable D2.2.
• A significant stakeholder engagement was conducted through a survey on how to search for data in a data catalogue. Analysis has been performed and the results were recently presented in a joint WP3 ExPaNDS and PaNOSC presentation. A full report is also underway.
• Coordination of ontology work with other key collaborators including PaNOSC.
19 Apr 2021:
The structure of the experimental technique ontology was agreed across all ExPaNDS and PaNOSC partners. A first version of the techniques ontology will be published at the end of May 2021, in OWL format. It applies the Ten Simple Rules for making a vocabulary FAIR. Since its delivery will be staged and challenged by implementations and feedback from the community, a reproducible workflow for its deployment was set up.
The second aspect of the ontologies work is related to the NeXus format. A script was developed to convert the base and upper definition classes, defined in the NeXus Definition Language format, to an OWL format. Once approved by the NIAC, it will be a sustainable tool that benefits the NeXus ontology by providing PIDs, hence making it FAIRer.
BioPortal, the existing ontology repository for life science, will be used to publish our ontologies. Steps were taken by EGI and Diamond to have BioPortal onboarded into EOSC.
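As an illustration of what such a conversion involves, the sketch below maps one NXDL-style class definition to an OWL class written in Turtle. It is not the project's script: the base IRI, the inline sample definition and the function name `nxdl_to_turtle` are assumptions made for this example, and only the class name, its parent and its documentation string are carried over.

```python
import xml.etree.ElementTree as ET

# XML namespace assumed for NXDL definition files.
NXDL_NS = {"nx": "http://definition.nexusformat.org/nxdl/3.1"}
# Hypothetical base IRI, chosen only for this illustration.
BASE_IRI = "https://example.org/nexus#"

# Minimal inline sample, written to resemble an NXDL base class definition.
SAMPLE_NXDL = """<definition xmlns="http://definition.nexusformat.org/nxdl/3.1"
    name="NXsample" extends="NXobject" type="group" category="base">
  <doc>Any information on the sample.</doc>
</definition>"""

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
)


def nxdl_to_turtle(nxdl_text, base_iri=BASE_IRI):
    """Convert a single NXDL class definition to an OWL class in Turtle."""
    root = ET.fromstring(nxdl_text)
    name = root.get("name")
    parent = root.get("extends")
    doc = root.findtext("nx:doc", default="", namespaces=NXDL_NS)
    doc = " ".join(doc.split())  # collapse whitespace and newlines

    statements = [f"<{base_iri}{name}> a owl:Class"]
    if parent:
        # The NXDL "extends" attribute becomes a subclass relation.
        statements.append(f"    rdfs:subClassOf <{base_iri}{parent}>")
    if doc:
        escaped = doc.replace("\\", "\\\\").replace('"', '\\"')
        statements.append(f'    rdfs:comment "{escaped}"')
    return PREFIXES + " ;\n".join(statements) + " ."


if __name__ == "__main__":
    print(nxdl_to_turtle(SAMPLE_NXDL))
```

Giving every class a stable IRI is what allows it to be referenced persistently, which is the link to the PIDs mentioned above.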
04 Nov 2021:
The techniques ontology PaNET and the NeXus ontology were both published in early June 2021. They are available in BioPortal and in the NeXus GitHub, respectively.
A data manager is being hired at Diamond to replace Silvia. Their tasks will include sustaining the ontologies and defining the governance to maintain them more broadly.
08 Aug 2022:
Publishing of PaNET is still under development. There is now a core team of some five individuals and progress can be tracked in issues 13, 21, and 48.
Next steps: 08 Aug 2022: A core team will work on the publication.
T3.3 Implement ontologies in metadata catalogues (PSI)
Task: The defined ontologies will be implemented in different data catalogues (e.g. ICAT at UKRI and SciCat at PSI). To foster the federation of local services, a reference implementation will be provided on the basis of the NeXus format. These will result in a European standard with international impact, and as such provide the basis for APIs and interoperability. The latter activities will be aligned with the PaNOSC initiative.
Progress: 19 Apr 2021:
The implementation of the ontologies in the SciCat and ICAT metadata catalogues is progressing at several facilities (PSI, Diamond and ISIS). It covers discussions with instrument scientists, modification of historical data and the definition of an ingestion method for new data.
There has also been extensive collaboration with PaNOSC for the PaN search API deliverable.
04 Nov 2021:
In SciCat, PaNET is already implemented: search terms from the techniques ontology are suggested to the user. A self-contained SciCat instance was made available for facilities to test and try locally.
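The idea of suggesting ontology terms during a search can be sketched in a few lines. The example below is not SciCat code: the technique hierarchy is a small hypothetical excerpt (the labels are illustrative and are not actual PaNET identifiers), and the helper names `suggest` and `expand` are invented for this illustration.

```python
# Hypothetical excerpt of a technique hierarchy: label -> parent label.
# Illustrative only; these are not actual PaNET terms or identifiers.
TECHNIQUE_PARENT = {
    "diffraction": None,
    "x-ray diffraction": "diffraction",
    "x-ray powder diffraction": "x-ray diffraction",
    "neutron diffraction": "diffraction",
    "imaging": None,
    "x-ray tomography": "imaging",
}


def suggest(typed_text, limit=5):
    """Return technique labels containing the typed text, in alphabetical order."""
    needle = typed_text.strip().lower()
    if not needle:
        return []
    matches = sorted(label for label in TECHNIQUE_PARENT if needle in label)
    return matches[:limit]


def expand(label):
    """Return the label followed by all of its narrower (descendant) techniques."""
    result = [label]
    for child, parent in TECHNIQUE_PARENT.items():
        if parent == label:
            result.extend(expand(child))
    return result


if __name__ == "__main__":
    print(suggest("diff"))
    print(expand("diffraction"))
```

A catalogue could use `suggest` to propose terms while the user types, and `expand` so that a search for a broad technique also matches datasets tagged with a narrower one. Whether a given catalogue performs this expansion is an assumption of the sketch.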
The task continues with implementation of PaNET in ICAT. The implementation of the ontologies in both community catalogues will be described in the upcoming deliverable D3.3 planned for February 2022.
08 Aug 2022:
A site-independent implementation of PaNET retrieval has been developed and can easily be configured for use by other facilities. The documentation is available in the same repository, and integration with SciCat and the search-api is available.
Next steps: 08 Aug 2022: Coordinate with the other facilities on its use and collect an update on the status of the integration with ICAT.
T3.4 Coordinate metadata catalogues and data life cycle (HZB)
Task: Based on the roadmap defined in task 3.1, the national sites will deploy the developed standards and processes. The WP will help to coordinate these activities by providing best practices and cross-site support. Technical and user interface related information will be documented and provided as input for WP5 to facilitate training. The coordination process will be continued until the end of the project.
Progress: 04 Nov 2021:
This task was kicked off in early autumn 2021. It is currently updating the assessment of where the partners stand with their metadata catalogues, compared to the status established in task 3.1. In addition to the current status, the task aims to understand the priorities, the roadblocks and where assistance is most needed.
08 Aug 2022:
The survey has been circulated and the results helped organise a suitable workshop, which mainly discussed how to adopt one of the main data catalogues. Details can be found here.
T3.5 Integrate metadata catalogue services into EOSC (MAX IV)
Task: The different metadata catalogues will be integrated into the PaNOSC federated metadata catalogue and provided as a service on EOSC-hub. Access to metadata catalogues will be facilitated by Umbrella-ID (federated AAI services for Photon and Neutron RIs).
Progress: 19 Apr 2021:
In parallel to the development of the search aggregator service (PaN search API), the deployment of an OAI-PMH interface in our data catalogues, which makes metadata accessible to EOSC harvesters like B2FIND and OpenAIRE, is progressing at several facilities. A workshop was organised at the beginning of April 2021 to share experience and existing plug-ins for SciCat and ICAT across our facilities.
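For readers unfamiliar with OAI-PMH, the sketch below shows the shape of the exchange from a harvester's point of view: it builds a `ListRecords` request URL and extracts identifiers, titles and the resumption token from a response. It is a minimal illustration and not any facility's plug-in. The base URL, the sample response and the function names are assumptions; the verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response used in place of a live endpoint.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:catalogue.example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example imaging dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # When resuming, the token is the only argument allowed besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES),
            "title": record.findtext(".//dc:title", namespaces=NAMESPACES),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NAMESPACES)
    return records, (token or None)


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real catalogue's OAI-PMH base URL.
    print(list_records_url("https://catalogue.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester repeats the request with the returned resumption token until the response no longer contains one.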
04 Nov 2021:
In addition to PSI, HZDR now also has a functional OAI-PMH interface and is harvested by B2FIND. In addition to the PaN search API made available in EOSC by PaNOSC, it is envisaged to declare each facility's open data portal (or OAI-PMH interface?) as an individual service in the EOSC marketplace.
08 Aug 2022:
An Umbrella workshop led by experts in the field took place on 3/05/22. Details can be found here. It gave the audience the tools to get started with UmbrellaID, and an interactive session allowed the hosts to help facilities individually to set up UmbrellaID.
T3.6 Training material (SOLEIL)
Task: These will be used by WP5 to develop training material for facility staff responsible for data lifecycle management and provide training materials for scientific users.
A training plan and material will be developed to ensure training and deployment of solutions, and to ensure sustainability of support and operations in the context of EOSC.
Progress: 04 Nov 2021:
The training plan was discussed within WP5 after the project’s mid-term review and will continue to be carried out in the following months. Training material has already been made available on the project’s training platform:
- stand-alone metadata catalogue,
- wiki on OAI-PMH and metadata harvesting,
- mapping between the DCAT ontology and NeXus,
- NexusOntology,
- PaNET ontology.
08 Aug 2022:
All WP3 training material gathered during the course of ExPaNDS has been compiled in the deliverable D3.4 and is available here and on the PaN training platform. WP5 helped set up training workflows in the PaN training portal, in particular the one covering the metadata catalogue services.

Deliverables

"Accepted" deliverables are the ones approved by the European Commission and thus published in CORDIS with other EU projects' results.

"Delivered" deliverables are the ones we have submitted to the European Commission but that are not yet approved. They can only be found in our Zenodo community for now.

"Pending" deliverables are still in progress and have not yet been submitted by the project to the European Commission.

Accepted deliverables | Partner | Date
3.1 Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs (link) | PSI | 03 Nov 2020

Delivered deliverables | Partner | Date
3.2 Release V1.0 ExPaNDS ontology available as an EOSC online service (link) | PSI | 04 Jun 2021
3.3 Demonstrate ICAT and SciCat released with APIs compatible to ExPaNDS federated EOSC services (link) | PSI | 16 Mar 2022
3.4 Portfolio material to support training for target groups at all Research Infrastructures (RIs) (link) | PSI | 11 Apr 2022

Pending deliverables | Partner | Date
No pending deliverables at the moment

Milestones

Achieved milestones | Partner | Date
12 Metadata catalogue release (link) | PSI | 24 Aug 2021
13 Metadata catalogue as EOSC service | PSI | 28 Feb 2022

Pending milestones | Partner | Date
No pending milestones at the moment