Can apply to: Research datasets and data publications
Metric definition: The number of times that a dataset or data publication has been included in the reference lists of published articles, books, conference proceedings, or other documents.
Metric calculation: Citations are typically counted if found in reference lists of scholarly works that conform to particular formatting standards (which vary by data provider; see Transparency section below).
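As a rough illustration of counting citations found in reference lists, the sketch below scans reference strings for DOIs and counts how many distinct citing works mention each tracked dataset DOI. It is a simplified, hypothetical example. The input format (a dict mapping citing-work IDs to reference strings), the function name `count_dataset_citations`, and the simplified DOI regular expression are all assumptions. Real data providers apply their own, more elaborate matching rules.

```python
import re
from collections import defaultdict

# Simplified DOI pattern, loosely modeled on the regular expression Crossref
# suggests for modern DOIs; real reference strings need more careful cleaning.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def count_dataset_citations(reference_lists, dataset_dois):
    """Count how many distinct citing works reference each tracked dataset DOI.

    reference_lists: dict mapping a citing-work ID to its reference strings
        (hypothetical input format).
    dataset_dois: iterable of dataset DOIs to track.
    """
    tracked = [doi.lower() for doi in dataset_dois]  # DOIs are case-insensitive
    tracked_set = set(tracked)
    citing_works = defaultdict(set)
    for work_id, references in reference_lists.items():
        for ref in references:
            for match in DOI_PATTERN.findall(ref):
                # Drop punctuation that often trails a DOI in a formatted reference.
                doi = match.rstrip(".,;)").lower()
                if doi in tracked_set:
                    citing_works[doi].add(work_id)
    # A work that lists the same dataset twice still counts as one citation.
    return {doi: len(citing_works[doi]) for doi in tracked}


# Usage with made-up reference lists:
refs = {
    "article-A": [
        "Smith, J. (2020). Ocean temperatures [Data set]. https://doi.org/10.5061/dryad.abc123.",
        "Doe, R. (2019). A related article. doi:10.1000/xyz",
    ],
    "article-B": [
        "Smith, J. (2020). Ocean temperatures. 10.5061/DRYAD.ABC123",
        "Smith, J. (2020). Ocean temperatures, version 2. (10.5061/dryad.abc123)",
    ],
    "article-C": ["Lee, K. (2018). Unrelated data. 10.1000/other"],
}
print(count_dataset_citations(refs, ["10.5061/dryad.abc123", "10.5061/dryad.zzz999"]))
# → {'10.5061/dryad.abc123': 2, '10.5061/dryad.zzz999': 0}
```

Here article-B cites the dataset twice but contributes one citation, and any reference that omits the DOI would be missed entirely, which shows how much the matching rules shape the resulting count.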
Data sources: Dataset citations from DataCite, Google Scholar, and Web of Science; data publication citations from Crossref, Dimensions, Scopus, Google Scholar, and Web of Science.
Appropriate use cases: Demonstrating the scholarly influence of data. Specifically, data citations should be used to understand how often research data has been reused in other researchers’ studies, which indicates the data’s contribution to advancing the field. Some fields (e.g., crystallography and genomics) practice data citation at higher rates than others, so data citations may be better suited to evaluating research in those fields.
Limitations:
- Data citation is still relatively rare: only half of journals provide instructions for citing data, and more than 88% of all Data Citation Index records go uncited. In the social sciences, 61% of articles fail to provide any type of citation to the datasets they use.
- Citation standards vary from one field to another, making it difficult to compare citation rates across disciplines. Disciplinary coverage in the Data Citation Index (as of 2017) is also skewed, favoring the life sciences (48% of records) over the social sciences (20%), physical sciences (23%), arts & humanities (7%), and multidisciplinary research (2%).
- Data availability should be taken into account when comparing citation rates across datasets, because in some disciplines open access data is cited at higher rates (up to 69% higher in cancer research).
Inappropriate use cases: Citation counts measure impact and visibility but should not be interpreted as a direct measure of quality. Citation counts for datasets in different scientific disciplines should not be compared without normalization to account for disciplinary differences in citation practice.
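One common way to normalize is to divide each dataset’s citation count by the average count for datasets in the same field, similar in spirit to the field-normalized indicators used for articles. The sketch below is a minimal, hypothetical illustration: the function name `field_normalized_scores`, the input tuples, and the choice of the field mean as the baseline are assumptions, and practical schemes usually also control for publication year and document type.

```python
from collections import defaultdict
from statistics import mean


def field_normalized_scores(datasets):
    """Divide each dataset's citation count by the mean count in its field.

    datasets: list of (dataset_id, field, citation_count) tuples
        (hypothetical input format). A score of 1.0 means the dataset is
        cited exactly as often as the average dataset in its field.
    """
    counts_by_field = defaultdict(list)
    for _, field, citations in datasets:
        counts_by_field[field].append(citations)
    field_means = {field: mean(counts) for field, counts in counts_by_field.items()}

    scores = {}
    for dataset_id, field, citations in datasets:
        baseline = field_means[field]
        # If no dataset in the field is cited, there is no meaningful baseline.
        scores[dataset_id] = citations / baseline if baseline > 0 else None
    return scores


# Usage with made-up counts: raw counts favor genomics, normalized scores do not.
sample = [
    ("genomics-1", "genomics", 30),
    ("genomics-2", "genomics", 10),
    ("socsci-1", "social sciences", 4),
    ("socsci-2", "social sciences", 0),
]
print(field_normalized_scores(sample))
# → {'genomics-1': 1.5, 'genomics-2': 0.5, 'socsci-1': 2.0, 'socsci-2': 0.0}
```

Because citation distributions are highly skewed and many datasets are never cited, a field mean can be unstable when a field contains few datasets; percentile-based baselines are a common alternative.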
Transparency: DataCite offers a clear methodology for how it tracks data citations, as well as an open Event Data API for finding data citations. The Data Citation Index (DCI) is fully transparent about the data repositories it indexes, and has released guidelines explaining how Web of Science uses properly formed citations to datasets to calculate citations for the DCI. Google Scholar can index any content that conforms to its formatting guidelines, but it is designed primarily to index journal articles, monographs, and other “print” outputs.
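For programmatic access, a lookup might resemble the sketch below. It is written against our understanding of DataCite’s public REST API, in which a DOI record is returned as JSON at `https://api.datacite.org/dois/{doi}` with a `citationCount` attribute; treat the URL and field names as assumptions to verify against DataCite’s current documentation. The sketch reads a per-DOI summary rather than querying the Event Data API directly, and only the parsing function (`citation_count_from_payload`, a hypothetical helper) is exercised in the usage example, on a hand-made payload, so it runs without network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against DataCite's current REST API documentation.
DATACITE_DOI_URL = "https://api.datacite.org/dois/{doi}"


def citation_count_from_payload(payload):
    """Return the citation count from a parsed DataCite DOI record, or None.

    Assumes the JSON:API shape {"data": {"attributes": {"citationCount": ...}}}.
    """
    attributes = payload.get("data", {}).get("attributes", {})
    # A missing field means "not reported", which is not the same as zero.
    return attributes.get("citationCount")


def fetch_citation_count(doi, timeout=10):
    """Fetch one DOI record over the network and return its citation count."""
    url = DATACITE_DOI_URL.format(doi=urllib.parse.quote(doi, safe="/"))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return citation_count_from_payload(json.load(response))


# Usage with a hand-made, trimmed payload (not real DataCite data):
sample = {
    "data": {
        "id": "10.5061/dryad.abc123",
        "type": "dois",
        "attributes": {"doi": "10.5061/dryad.abc123", "citationCount": 7},
    }
}
print(citation_count_from_payload(sample))  # → 7
print(citation_count_from_payload({"data": {"attributes": {}}}))  # → None
```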
Timeframe: Datasets from any year can be referenced in scholarly literature. The Data Citation Index includes citations to data from 1800 onwards, although 90% of the indexed data dates from 2007 or later and 99% from 1995 or later.