

Two billion citation links in Crossref help research travel further

We’ve recently reached an important milestone for the research nexus: the works in our metadata corpus are now connected with over 2 billion citation links! This is a great opportunity to share a dedicated dataset and discuss why these are important for science.

Reference metadata is a lifeline of discoverability. Scholars use citations to critique and build on existing research, and they acknowledge the contributions of others through references. Our members can deposit those references as part of their metadata with Crossref, and we use them to link the citing and cited objects. The result is a complex thematic network that interested researchers can explore. Many research discovery tools use the linked reference metadata in Crossref to support searches for related content.

Citation links are derived from the bibliographic references in one work's metadata that include the DOIs of the materials it cites (scholarly works, data, code, etc.). It’s always best when members deposit these relationships in full. In a recent post, we shared that nearly half of these links are asserted by our members in their metadata deposits, and the other half are created by our automated matching. This form of metadata enrichment happens when a member includes some information about a reference but not the DOI of the cited work, and that information is enough for us to find and add the DOI automatically. Enrichment of this kind makes the metadata more useful for the community.
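To make the idea of matching a DOI-less reference more concrete, here is a toy sketch in Python. It is not Crossref's matching algorithm. It assumes a small, hypothetical in-memory index of known works (`KNOWN_WORKS`, with placeholder DOIs) and treats a reference as matched only when its normalized title, year, and first-author surname all agree with exactly one indexed work. The function names are illustrative.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical index of registered works: DOI -> minimal bibliographic metadata.
# These DOIs are placeholders, not real records.
KNOWN_WORKS = {
    "10.5555/example.1": {"title": "Open citations at scale", "year": 2021, "author": "Smith"},
    "10.5555/example.2": {"title": "Metadata completeness in practice", "year": 2023, "author": "García"},
}


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_reference(ref: dict) -> Optional[str]:
    """Return a DOI for a reference, or None if it cannot be matched.

    A DOI-less reference counts as matched only if its title, year and
    first-author surname all agree with exactly one known work.
    """
    if ref.get("DOI"):  # the member already supplied the DOI; nothing to enrich
        return ref["DOI"]
    candidates = [
        doi
        for doi, work in KNOWN_WORKS.items()
        if normalize(work["title"]) == normalize(ref.get("title", ""))
        and work["year"] == ref.get("year")
        and normalize(work["author"]) == normalize(ref.get("author", ""))
    ]
    return candidates[0] if len(candidates) == 1 else None


print(match_reference({"title": "Open Citations at Scale!", "year": 2021, "author": "smith"}))
print(match_reference({"title": "Unknown paper", "year": 2020, "author": "Doe"}))
```

Requiring every field to agree makes this sketch favour precision over recall: an ambiguous or incomplete reference is left unmatched rather than linked to the wrong work.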

The most important impact of citation links is the increased discoverability of connected works. Reference metadata is an important tool for improving the visibility and readership of our members’ content. These links are also the foundation of our Cited-by service, which lets members who implement it display citation counts for the works they publish on their landing pages.
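Because each link joins a citing DOI to a cited DOI, a work's citation count is the number of distinct works that cite it. The sketch below assumes the links are available as (citing, cited) DOI pairs. The pair format and the placeholder DOIs are assumptions for illustration, and this is not how the Cited-by service itself is implemented.

```python
from collections import defaultdict


def citation_counts(links):
    """Map each cited DOI to the number of distinct works citing it.

    `links` is an iterable of (citing_doi, cited_doi) pairs. DOIs are
    case-insensitive, so both sides are lowercased before counting, and a
    duplicate link between the same two works is counted only once.
    """
    citing_by_cited = defaultdict(set)
    for citing, cited in links:
        citing_by_cited[cited.lower()].add(citing.lower())
    return {doi: len(citers) for doi, citers in citing_by_cited.items()}


links = [
    ("10.5555/a", "10.5555/c"),
    ("10.5555/b", "10.5555/c"),
    ("10.5555/B", "10.5555/C"),  # the same link again, with different DOI casing
    ("10.5555/a", "10.5555/b"),
]
print(citation_counts(links))  # {'10.5555/c': 2, '10.5555/b': 1}
```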

The chart below shows the cumulative count of references over time, by the created date of the citing DOI’s record. These include references linked by DOI, either supplied in member-submitted metadata or matched by Crossref, as well as unmatched references. Unmatched references include those we could not match with the information we have, but also those that genuinely have no DOI to link to. The full dataset of all 2 billion citation links between Crossref DOIs is available now as a (somewhat hefty) download.

Cumulative count of references deposited to Crossref by created date of citing DOI, split into three categories: references with DOIs submitted by members, references with DOIs matched by Crossref, and references with no matched DOI.
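The three categories in this chart can, in principle, be derived from reference metadata. This is a hedged sketch, not the pipeline behind the chart. It assumes reference objects shaped like those in the `reference` field of work records returned by the Crossref REST API, where a `DOI` is present once a link exists and `doi-asserted-by` is `publisher` (member-supplied) or `crossref` (matched). The `(created_year, references)` input format and the helper names are hypothetical.

```python
from collections import Counter
from itertools import accumulate

CATEGORIES = ["submitted by member", "matched by Crossref", "unmatched"]


def reference_category(ref: dict) -> str:
    """Classify one reference object into one of the chart's three categories."""
    if not ref.get("DOI"):
        return "unmatched"
    if ref.get("doi-asserted-by") == "crossref":
        return "matched by Crossref"
    return "submitted by member"  # assumed: doi-asserted-by is "publisher"


def cumulative_by_year(citing_works):
    """Cumulative reference counts per category, keyed by created year.

    `citing_works` is a list of (created_year, references) tuples, one per
    citing record.
    """
    per_year = {}
    for year, refs in citing_works:
        per_year.setdefault(year, Counter()).update(reference_category(r) for r in refs)
    years = sorted(per_year)
    # A Counter returns 0 for missing categories, so every year gets a value.
    return {
        cat: dict(zip(years, accumulate(per_year[y][cat] for y in years)))
        for cat in CATEGORIES
    }


works = [
    (2019, [{"DOI": "10.5555/x", "doi-asserted-by": "publisher"}, {"unstructured": "Doe J. (1999)"}]),
    (2020, [{"DOI": "10.5555/y", "doi-asserted-by": "crossref"}]),
    (2020, [{"DOI": "10.5555/z", "doi-asserted-by": "publisher"}]),
]
print(cumulative_by_year(works))
```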

The push for open citation data has unfolded over the last few decades, making more and more of these relationships public. Notably, the growth in citation links reflects not just the output of new scholarship but also a sustained effort to extend coverage of the historical scholarly record. We can see this playing out in our historical data: periodic snapshots of Crossref’s metadata going back to 2019. By comparing successive snapshots and examining the publication dates of citing and cited works, we can classify each newly appearing citation as either a new paper citation or a retrospective one. A new paper citation is one where the citing work was published after the previous snapshot, representing real growth in the scholarly record. A retrospective citation is one where both works already existed but the link between them had not yet been captured by Crossref; these represent indexing catch-up rather than new publishing activity.
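The snapshot comparison can be sketched as a set difference followed by a date test. This is a simplified illustration, not the exact method behind the chart. It assumes each snapshot is a set of (citing DOI, cited DOI) pairs and that a hypothetical `published` mapping gives each citing work's publication date. It also simplifies by testing only the citing work's date, on the reasoning that the cited work in a retrospective citation normally predates the work that cites it.

```python
from datetime import date


def classify_new_citations(prev_links, curr_links, prev_snapshot_date, published):
    """Count citations present in `curr_links` but absent from `prev_links`.

    A citation is "new" if its citing work was published after the previous
    snapshot date; otherwise it is "retrospective" (both works already
    existed, but the link was only captured later).
    """
    counts = {"new": 0, "retrospective": 0}
    for citing, cited in curr_links - prev_links:
        if published[citing] > prev_snapshot_date:
            counts["new"] += 1
        else:
            counts["retrospective"] += 1
    return counts


prev = {("10.5555/a", "10.5555/old")}
curr = prev | {("10.5555/a", "10.5555/old2"), ("10.5555/new", "10.5555/old")}
published = {"10.5555/a": date(2018, 3, 1), "10.5555/new": date(2020, 6, 1)}
print(classify_new_citations(prev, curr, date(2019, 12, 31), published))
```

Here the link from `10.5555/a` is retrospective, because that work already existed at the previous snapshot, while the link from `10.5555/new` counts as a new paper citation.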

The chart below shows the cumulative count of citations added in each category since 2019. In the early years of our data, retrospective backfill was the dominant source: the blue line climbs steeply from 2019 to 2021 as a large volume of previously uncaptured historical citation relationships entered the corpus. Over time, however, the rate of backfilling has levelled off. New paper citations, meanwhile, have grown steadily throughout the period, and by 2025 their cumulative total surpassed the retrospective one. The open citation ecosystem continues to recover historical links, but the growth of the citation network is now increasingly driven by the natural momentum of scholarly publishing itself.


Cumulative citations added to Crossref by type, 2019–2026. Retrospective citations (blue) represent links to and from works that existed before the previous snapshot; new paper citations (green) come from works published since the last snapshot.

Combined with other metadata for added context, reference metadata supports bibliographic research and meta-research on many aspects of the scholarly process, and can inform judgements about research integrity and conflicts of interest.

Typically, when we talk about references, we think of links to published works (whether preprints, journal articles, or books). However, all types of records in Crossref can be cited. Thanks to changes in our latest schema, members can now signal the type of content being referenced. And with our new Data citations endpoint, the community can explore links specifically from Crossref-registered records to research data, covering both data registered with Crossref and data in DataCite’s corpus.

Close to half of all records registered with Crossref still have no reference information, or not enough to make such connections. We invite members to our regular Metadata health-check webinars, which help them improve the completeness of their records for greater transparency and visibility.


Page maintainer: Kornelia Korzec
Last updated: 2026-May-26