
Community Engagement

NFDI4DS engages with the community through multiple efforts. We are currently focusing on:

ST SOMD

SOMD - Software Mention Detection in Scholarly Publications

2026 edition

Hosted as part of the NSLP 2026 Workshop at LREC 2026, held on 12 May 2026 in Palma de Mallorca, Spain.

The rise of climate discourse on social media offers new channels for public engagement but also amplifies mis- and disinformation. As online platforms increasingly shape public understanding of science, tools that ground claims in trustworthy, peer-reviewed evidence are necessary. The new 2026 iteration of ClimateCheck builds on the results and insights from the 2025 iteration (run at SDP 2025/ACL 2025), extending it by adding training data, a new task on classifying disinformation narratives in climate discourse, and a focus on sustainable solutions.

Find detailed information on the Workshop page: https://nfdi4ds.github.io/nslp2026/docs/climatecheck_shared_task.html

2025 edition

NFDI4DS partners organize Shared Tasks at SDP 2025, co-located with ACL 2025.

Data-driven scientific processes strongly rely on the use of software to collect and prepare data and to generate insights via automated analysis. Hence, tracking the provenance of software artifacts is becoming an essential aspect of transparency and reproducibility. Additionally, aggregated observations of software citations can help to measure their usage and impact in the long run. While the referencing of scientific articles is handled according to well-established patterns, the citation practices of code bases and software programs are less coherent.

Therefore, we invite participants of our shared task to develop robust supervised information extraction models that facilitate the disambiguation of software mentions and relevant metadata in scholarly publications. The task uses the Software Mentions in Science (SoMeSci) knowledge graph of software mentions (Schindler et al., 2022). As a novelty of this task, SoMeSci will be extended to include more publications in the fields of Artificial Intelligence (AI) and Computer Science.

Subtasks

  • Subtask 1: Software Mention Detection
  • Subtask 2: Additional Information Detection
  • Subtask 3: Relation Extraction
  • Subtask 4: Disambiguation
  • Subtask 5: End-to-End

Datasets

SoMeSci is a knowledge graph of software mentions comprising 399,942 triples to date. It describes 3,756 software mentions, including type information and extensive metadata, from 1,367 PubMed Central articles. The dataset will be expanded to include Computer Science publications following the SoMeSci schema.

Metrics

We will evaluate method performance using standard information retrieval metrics (precision, recall, and F1) on specific subtasks, such as 1) detection of software mentions and their types, 2) detection of related attributes (e.g., version and developer), and 3) disambiguation of detected mentions. While the final set of tasks is still to be announced, details on a more exhaustive set of tasks and affiliated baselines can be found here.
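To illustrate how precision, recall, and F1 apply to mention detection, here is a minimal sketch that scores predicted mentions against gold annotations. It assumes micro-averaging and exact matching, where a prediction counts as correct only if its document, character offsets, and type all match a gold mention. The `Mention` record, the `precision_recall_f1` helper, and the type labels are hypothetical illustrations, not the official SOMD scorer, which may use different matching or averaging rules.

```python
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """Hypothetical span annotation: document, character offsets, and type label."""
    doc_id: str
    start: int
    end: int
    label: str  # illustrative type label, e.g. "Application"


def precision_recall_f1(
    gold: Iterable[Mention], predicted: Iterable[Mention]
) -> tuple[float, float, float]:
    """Micro-averaged P/R/F1, assuming exact match on offsets and type.

    Duplicate mentions collapse because both inputs are converted to sets.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [
    Mention("doc1", 10, 15, "Application"),
    Mention("doc1", 40, 46, "PlugIn"),
    Mention("doc2", 5, 12, "ProgrammingEnvironment"),
]
predicted = [
    Mention("doc1", 10, 15, "Application"),             # exact match
    Mention("doc1", 40, 46, "Application"),             # right span, wrong type
    Mention("doc2", 5, 12, "ProgrammingEnvironment"),   # exact match
    Mention("doc2", 30, 35, "Application"),             # spurious prediction
]

p, r, f = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Here two of the four predictions are correct (precision 2/4) and two of the three gold mentions are found (recall 2/3). A lenient variant that ignores the type label would instead compare only `(doc_id, start, end)`, which is one way to separate detection errors from typing errors.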

Contact Persons

  • Frank Krüger (HS Wismar)
  • Stefan Dietze (GESIS)
  • Saurav Karmakar (GESIS)
  • Danilo Dessi (GESIS)
  • Jennifer D’Souza (TIB)

References

  • David Schindler, Felix Bensmann, Stefan Dietze, Frank Krüger. “The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central”. PeerJ Computer Science. 2022. https://doi.org/10.7717/peerj-cs.835
  • David Schindler, Felix Bensmann, Stefan Dietze, Frank Krüger. “SoMeSci—A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles”. Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021). 2021. https://doi.org/10.1145/3459637.3482017

Find detailed information on the Workshop page: https://sdproc.org/2025/somd25.html

2024 edition

For the 2024 edition see: https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html
