Citations, Data

Can apply to: Research datasets and data publications

Metric definition: The number of times that a dataset or data publication has been included in the reference lists of published articles, books, conference proceedings, or other documents.

Metric calculation: Citations are typically counted when they appear in the reference lists of scholarly works that conform to particular formatting standards (which vary by data provider; see the Transparency section below).
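
As a rough illustration of the counting step, the Python sketch below counts how many citing works mention a given dataset DOI in their reference lists. It assumes that each citing work's references are available as plain strings, that the dataset is identified by its DOI, and that a work citing the dataset several times counts once. The function names, the sample DOIs, and the DOI pattern (an approximation of the regular expression Crossref has suggested for modern DOIs) are hypothetical and are not any data provider's actual method.

```python
import re

# Approximation of the DOI pattern Crossref has suggested for modern DOIs;
# it will miss some older, unusually formatted identifiers.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

# Common resolver prefixes to strip before comparing DOIs.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI (DOIs are case-insensitive), strip a prefix and trailing punctuation."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.rstrip(".,;")


def count_data_citations(dataset_doi: str, reference_lists: dict[str, list[str]]) -> int:
    """Hypothetical helper: count citing works whose references mention dataset_doi.

    reference_lists maps an identifier for each citing work to its reference strings.
    A work that cites the dataset more than once is counted once.
    """
    target = normalize_doi(dataset_doi)
    citing_works = 0
    for references in reference_lists.values():
        dois_in_work = {
            normalize_doi(match)
            for reference in references
            for match in DOI_PATTERN.findall(reference)
        }
        if target in dois_in_work:
            citing_works += 1
    return citing_works


if __name__ == "__main__":
    # Made-up reference lists for three hypothetical citing articles.
    sample = {
        "article-A": [
            "Smith J. (2020). Survey data [Data set]. https://doi.org/10.1234/example.data.001.",
            "Doe R. (2019). A related paper. doi:10.1234/article.42",
        ],
        "article-B": ["Smith J. (2020). Survey data. DOI: 10.1234/EXAMPLE.DATA.001"],
        "article-C": ["A reference with no DOI at all"],
    }
    print(count_data_citations("10.1234/example.data.001", sample))  # prints 2
```

Real providers also have to decide whether a reference without a DOI (title and repository only) should count, which is one reason their totals differ.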

Data sources: Dataset citations from DataCite, Google Scholar, and Web of Science; data publication citations from Crossref, Dimensions, Scopus, Google Scholar, and Web of Science

Appropriate use cases: Demonstrating the scholarly influence of data. Specifically, data citations should be used to understand how often research data has been reused in others’ studies, thereby indicating advancement of the field. Some fields (e.g., crystallography and genomics) cite data at higher rates than others, so data citations may be better suited to evaluating research in those fields.

Inappropriate use cases: Citation counts are a measure of impact and visibility, but they should not be interpreted as a direct measure of quality. Citation counts for datasets in different scientific disciplines should not be compared without normalization to account for disciplinary differences in citation practices.
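
One simple way to normalize, shown here only as a sketch, is to divide each dataset's citation count by the mean count for datasets in the same field, so that a score of 1.0 means "cited at the field average". The field labels, dataset identifiers, and counts below are made up, and established normalized indicators usually also condition on publication year and document type, which this sketch omits.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def field_normalized_scores(
    datasets: list[tuple[str, str, int]],
) -> dict[str, Optional[float]]:
    """Divide each dataset's citation count by the mean count for its field.

    datasets holds (dataset_id, field, citation_count) tuples. A score of 1.0
    means "cited at the field average"; None marks a field whose mean is zero.
    """
    counts_by_field: dict[str, list[int]] = defaultdict(list)
    for _, field, count in datasets:
        counts_by_field[field].append(count)
    field_means = {field: mean(counts) for field, counts in counts_by_field.items()}

    scores: dict[str, Optional[float]] = {}
    for dataset_id, field, count in datasets:
        baseline = field_means[field]
        scores[dataset_id] = count / baseline if baseline > 0 else None
    return scores


if __name__ == "__main__":
    # Made-up counts for two hypothetical field groupings.
    sample = [
        ("crystal-1", "crystallography", 40),
        ("crystal-2", "crystallography", 20),
        ("eco-1", "ecology", 6),
        ("eco-2", "ecology", 2),
    ]
    for dataset_id, score in field_normalized_scores(sample).items():
        print(dataset_id, round(score, 2))
```

With these made-up numbers, eco-1 (6 citations) scores 1.5 while crystal-2 (20 citations) scores about 0.67, the kind of reversal that a raw cross-field comparison hides.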

Available metric sources: Datasets: DataCite, Data Citation Index in Web of Science, Google Scholar; Data publications: Dimensions, Google Scholar, Scopus, and Web of Science

Transparency: DataCite offers a clear methodology for how the service tracks data citations, as well as an open Event Data API for finding data citations. The Data Citation Index (DCI) is fully transparent regarding the data repositories it indexes and has released guidelines explaining how Web of Science uses properly formed citations to datasets to calculate DCI citation counts. Google Scholar can index any content that conforms to its formatting guidelines, but it is designed primarily to index journal articles, monographs, and other “print” outputs.
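
The DataCite Event Data API mentioned above can be queried over plain HTTP. The sketch below builds a query for events whose object is a dataset's DOI and counts the distinct citing works on one page of results. The endpoint, the `obj-id` and `page[size]` query parameters, and the `subj-id` and `relation-type-id` attribute names reflect our understanding of DataCite's public documentation and should be checked against the current API reference. Treating only `references` and `cites` as citation relations is an illustrative assumption, not DataCite's full methodology.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the DataCite Event Data API; verify against current docs.
EVENTS_ENDPOINT = "https://api.datacite.org/events"

# Illustrative subset of relation types meaning "subject cites object";
# DataCite's own citation methodology may count additional types.
CITING_RELATIONS = {"references", "cites"}


def build_events_url(dataset_doi: str, page_size: int = 100) -> str:
    """Build a query for events whose object is the dataset's DOI URL."""
    params = {
        "obj-id": f"https://doi.org/{dataset_doi.lower()}",
        "page[size]": page_size,
    }
    return f"{EVENTS_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_events(url: str) -> dict:
    """Fetch one page of events as parsed JSON (network call; not run on import)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def count_citing_works(events_page: dict) -> int:
    """Count distinct citing works (subj-id) among citation-type events on one page."""
    citing = set()
    for event in events_page.get("data", []):
        attributes = event.get("attributes", {})
        if attributes.get("relation-type-id") in CITING_RELATIONS:
            citing.add(attributes.get("subj-id"))
    citing.discard(None)
    return len(citing)
```

A complete count would call `fetch_events(build_events_url("10.1234/example.data.001"))` (a hypothetical DOI), pass the result to `count_citing_works`, and repeat for each further page of results linked from the response, merging the citing-work sets rather than adding per-page totals.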

Website: n/a

Timeframe: Datasets from any year can be referenced in scholarly literature. The Data Citation Index includes citations to data from 1800 onwards; 90% of the data it indexes dates from 2007 or later, and 99% from 1995 or later.