
ST SOTA?

2024-11-01
2 min read

SOTA - Tracking the State-of-the-Art in AI Scholarly Publications

NFDI4DS partners organize a Shared Task at SimpleText2024 co-located with CLEF2024.

Empirical AI research centers on automated tasks defined via task datasets, for which machine learning models are developed and evaluated against a standard set of metrics. Pushing the state-of-the-art boundaries in empirical AI research means optimizing these models for speed, accuracy, or storage. Researchers in this domain therefore often ask the central question: "What's the state-of-the-art result for task XYZ right now?"

Instead of digging the answer out of a ranked list of documents returned by a traditional search engine, researchers consult community-curated leaderboards such as https://paperswithcode.com/ or https://orkg.org/benchmarks. These websites are specifically designed to showcase the performance of all introduced machine learning models on a given task dataset. Researchers seeking the best model performance on a task dataset can read it directly off the performance trendlines, which track model results on that dataset over time.

In this Shared Task, we go beyond community curation of leaderboards and aim to realize the vision of an efficient machine learning model that automatically detects leaderboards. The submitted models will be evaluated for efficiency in terms of speed, number of model parameters, and leaderboard-detection F1 measure.
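The exact evaluation protocol is defined by the organizers on the shared-task page; purely as an illustration of the F1 measure mentioned above, an exact-match F1 over predicted versus gold TDMS tuples could be computed along these lines (the function name, tuple values, and exact-match criterion are assumptions, not the official scorer):

```python
# Illustrative exact-match F1 over (Task, Dataset, Metric, Score) tuples.
# NOTE: a sketch only, not the official SOTA? shared-task scorer.

def tdms_f1(predicted, gold):
    """Compute precision, recall, and F1 for sets of TDMS tuples."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # tuples matched exactly
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: two gold tuples, one of them predicted correctly.
gold = [("Question Answering", "SQuAD", "F1", "93.2"),
        ("Question Answering", "SQuAD", "EM", "87.4")]
pred = [("Question Answering", "SQuAD", "F1", "93.2")]
print(tdms_f1(pred, gold))  # precision 1.0, recall 0.5
```

Scoring on exact tuple matches is a deliberate simplification here; the official metric may well credit partial matches.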

The SOTA? shared task is defined on a dataset of Artificial Intelligence scholarly articles. The articles are of two kinds: those reporting (Task, Dataset, Metric, Score), or TDMS, tuples and those that do not. For the articles reporting TDMS tuples, all the reported TDMS annotations are provided in a separate file accompanying the scraped full text of the articles. The extraction task is defined as follows.

The task is to develop a machine learning model that can determine whether a scholarly article provided as input reports TDMS tuples or not and, for articles that do, extract all the relevant tuples.
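The exact input and output formats are specified on the shared-task page; as a rough sketch of the interface a solution implements, a model takes an article's full text and returns either an empty list (no TDMS reported) or a list of tuples. The keyword heuristic below is a trivial placeholder standing in for a real (e.g. LLM-based) extractor, and the `RESULT:` marker is invented for illustration:

```python
# Sketch of the expected model interface for the SOTA? task:
#   input:  the full text of a scholarly article
#   output: [] if the article reports no leaderboard, otherwise all
#           (Task, Dataset, Metric, Score) tuples it reports.
# The line-matching heuristic is a placeholder, not a real extractor.

from typing import List, Tuple

TDMS = Tuple[str, str, str, str]  # (Task, Dataset, Metric, Score)

def extract_tdms(article_text: str) -> List[TDMS]:
    """Return all TDMS tuples reported by the article, or [] if none."""
    tuples: List[TDMS] = []
    for line in article_text.splitlines():
        # Hypothetical marker: "RESULT: <task> | <dataset> | <metric> | <score>"
        if line.startswith("RESULT:"):
            fields = [f.strip() for f in line[len("RESULT:"):].split("|")]
            if len(fields) == 4:
                tuples.append((fields[0], fields[1], fields[2], fields[3]))
    return tuples

article = ("We evaluate on SQuAD.\n"
           "RESULT: Question Answering | SQuAD | F1 | 93.2\n")
print(extract_tdms(article))  # [('Question Answering', 'SQuAD', 'F1', '93.2')]
```

Returning an empty list for articles without TDMS tuples folds the binary decision (reports a leaderboard or not) and the extraction step into a single interface.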

Given the recent upsurge of developments in generative AI in the form of Large Language Models (LLMs), creative LLM-based solutions to the task are particularly invited. The task places no restrictions on the use of open-source versus closed-source LLMs; nonetheless, the development of open-source solutions is encouraged.

Find detailed information on the Shared Task page: https://sites.google.com/view/simpletext-sota
