Tips and tricks
We love to share our data and we want it to be used. Yes, really—and by as many people as possible. Here we share tips to help you get the most out of our REST API. Everyone benefits if you use our REST API responsibly and efficiently. Very occasionally we have had to block users who misuse our APIs, usually through carelessness rather than malice. If you follow the advice on this page you should have no problem.
Note that for conciseness, the examples on this page omit the API URL, which is https://api-crossref-org.turing.library.northwestern.edu/v1, so for an example like /members/120 the full query is https://api-crossref-org.turing.library.northwestern.edu/v1/members/120.
Where do I start?
If you are just getting started with our API, we recommend heading to the API Learning Hub. There you can get to grips with APIs, what kind of metadata you can retrieve, and how to formulate queries. For a description of the endpoints and request parameters, see our Swagger documentation, where you can try out some queries.
Get the right endpoint
Most of our queries go through the /works endpoint. If you’re looking for research outputs such as journal articles, that’s probably where you want to start.
Bear in mind that we also have other endpoints, including /prefixes, /journals, and /members. You can, for example, get all works in a journal by using its ISSN, e.g.: /journals/0003-3804/works. The members endpoint contains summary metadata about organisations that have deposited metadata, including which metadata fields they deposit and the fraction of records with certain properties e.g.: /members/120.
Use filter and query parameters
Our API contains a range of parameters that can be used to pull out records that match certain criteria. For example, there are various kinds of date filters: /works?filter=from-pub-date:2024-01-01,until-pub-date:2024-12-31 will get all records with a publication date in 2024.
There are filters to detect the presence of certain properties, e.g., /works?filter=has-references:1,has-orcid:1 will return only works that have references and at least one author with an ORCID ID. Note that multiple filters are separated by commas within a single filter parameter.
There are also filters for specific values, e.g., /works?filter=type:journal-article,funder:10.13039/100000040 retrieves journal articles funded by the organisation with the specified funder ID.
If you are looking to analyse a large number of metadata records, such as all records from a single journal or publisher, it is almost always more efficient to query using a filter than for each DOI individually.
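As a rough illustration of combining filters, the sketch below builds a request URL in Python. The helper name build_query_url and the BASE_URL constant are hypothetical (the URL is the one quoted at the top of this page); the only behaviour taken from the text above is that filters are name:value pairs joined by commas inside a single filter parameter.

```python
from urllib.parse import urlencode

# Base URL as quoted at the top of this page; change it if yours differs.
BASE_URL = "https://api-crossref-org.turing.library.northwestern.edu/v1"


def build_query_url(path, filters=(), **params):
    """Return a full URL for `path`; `filters` is a list of (name, value) pairs.

    Hypothetical helper, not part of any official client.
    """
    query = dict(params)
    if filters:
        # All filters go into ONE parameter, separated by commas.
        query["filter"] = ",".join(f"{name}:{value}" for name, value in filters)
    # Leave ':', ',', '/' and '*' unescaped so the URL stays readable.
    query_string = urlencode(query, safe=":,/*")
    url = f"{BASE_URL}{path}"
    return f"{url}?{query_string}" if query_string else url


if __name__ == "__main__":
    print(build_query_url(
        "/works",
        [("type", "journal-article"), ("funder", "10.13039/100000040")],
        rows=5,
    ))
```

Passing the filters as a list of pairs rather than a dictionary allows the same filter name to appear more than once.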
Be selective with fields, if you need to be
If you are only interested in 2 or 3 fields of the output from the works endpoint, you can use select to retrieve only that metadata, which can make your API calls much more efficient. For example, /works?rows=10&select=DOI returns only the DOI field of each record. Don’t do this if you’re looking for more than 3 or 4 fields: the longer your list of fields gets, the more it slows down the query. Instead, retrieve the whole record and discard the information you don’t need. For example:
Retrieve 10 arbitrary records from the Crossref corpus with only the DOI and record title: https://api-crossref-org.turing.library.northwestern.edu/works?rows=10&select=DOI,title
Retrieve all works for ISSN 1527-2095 with only these elements returned in the results: DOI, title, and page number (only the first 20 results are returned): https://api-crossref-org.turing.library.northwestern.edu/journals/1527-2095/works?select=DOI,title,page
You can make more efficient use of the API and get results more quickly by thinking in advance about how many results you need. The default number of rows returned is 20 and you can increase it up to 1000. If you have a query where you only need to know the total number of results, you can use rows=0. For requests with a query parameter, 2–5 rows might be enough (see more on that below), whereas to look at a few examples of records with a certain property, maybe 10 records is enough.
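The rows=0 tip can be sketched in Python as follows. This is a minimal example, assuming the JSON response carries the count under message, then total-results; the function names, the BASE_URL constant, and the injectable fetch argument (which makes the function usable without a network connection) are all hypothetical.

```python
import json
from urllib.request import urlopen

# Base URL as quoted at the top of this page; change it if yours differs.
BASE_URL = "https://api-crossref-org.turing.library.northwestern.edu/v1"


def fetch_json(url):
    """Download `url` and parse the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def count_results(path_and_query, fetch=fetch_json):
    """Return the total number of matches without downloading any records.

    Assumes the count is found at message -> total-results in the response.
    """
    separator = "&" if "?" in path_and_query else "?"
    data = fetch(f"{BASE_URL}{path_and_query}{separator}rows=0")
    return data["message"]["total-results"]


if __name__ == "__main__":
    print(count_results("/works?filter=has-orcid:1"))
```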
Each endpoint has a limit on the number of items returned in a single request. Paginating through multiple pages of results is possible through the cursor parameter.
To retrieve multiple pages, add cursor=* to your first request (and a value for rows that is greater than 0). The response will include a next-cursor value. Use this in your next request to obtain the following page of results. Stop sending requests when the number of items in the response is less than the number of rows requested.
Cursors expire after 5 minutes if not used. Note that our REST API returns a cursor even on the last page. To stop your script at the correct point, check the number of results returned—you have reached the last page when it is less than the requested rows.
Example: https://api-crossref-org.turing.library.northwestern.edu/funders/501100009187/works?cursor=*&rows=600&select=DOI,title,container-title,is-referenced-by-count (Note: at the time of writing, this query has ~1025 results and the request asks for 600 rows. The first page will include 600 results and the second ~425; since 425 < 600, you know you have retrieved all results once you see the second page.)
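The paging procedure described above might look like this in Python. It is a sketch under assumptions: the response is taken to hold the records under message, then items, and the cursor under message, then next-cursor; iter_items and fetch_json are hypothetical names. The loop stops on the first page that is shorter than the requested rows, because a cursor is returned even on the last page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """Download `url` and parse the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_items(url, rows=1000, fetch=fetch_json):
    """Yield every item for `url`, following next-cursor page by page."""
    if rows < 1:
        raise ValueError("rows must be greater than 0 when using a cursor")
    separator = "&" if "?" in url else "?"
    cursor = "*"  # the first request always starts with cursor=*
    while True:
        # Cursor tokens can contain characters that need percent-encoding.
        page_url = f"{url}{separator}rows={rows}&cursor={quote(cursor, safe='*')}"
        message = fetch(page_url)["message"]
        items = message["items"]
        yield from items
        # A short page means this was the last one, even though a
        # next-cursor value is still present in the response.
        if len(items) < rows:
            break
        cursor = message["next-cursor"]
```

When the total happens to be an exact multiple of rows, the loop makes one extra request that returns an empty page and then stops.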
Large numbers of queries and very large result sets
If you are planning to get hundreds of thousands or even millions of records from our API, read this section before you get started!
First, determine whether you really need to use the REST API. We have an annual public data file that contains all of our data. If you are a Metadata Plus subscriber, you have access to a monthly snapshot. Using these and setting up a local database means that you can run more custom queries than with the API and get results more quickly.
Second, cache your results. Try to avoid making the same requests repeatedly. Our metadata does change over time, but the majority of records change infrequently, if at all.
Third, if you do need to make a query with a very large result set, we recommend splitting it into a series of smaller queries. You can use cursors to page through results, but if you’re running to thousands of pages, the chance of a cursor failing or expiring at some point becomes much higher. For example, if you break down a request into days or weeks and one of them fails, it will be much easier to go back and pick up the missing data. Also pay attention to the HTTP status code and back off if you start seeing 4XX statuses.
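One possible way to split a large request by date and to back off on 4XX responses is sketched below. All function names are hypothetical, the filter names follow the from-…-date / until-…-date pattern used elsewhere on this page, and the doubling delay is an illustrative choice rather than a documented requirement. The fetch argument is assumed to be any callable that returns a (status, body) pair.

```python
import time
from datetime import date, timedelta


def split_date_range(start, end, days=7):
    """Yield inclusive (first_day, last_day) chunks covering start..end."""
    first = start
    while first <= end:
        last = min(first + timedelta(days=days - 1), end)
        yield first, last
        first = last + timedelta(days=1)


def chunk_filters(start, end, days=7, kind="created"):
    """Return one filter string per chunk, e.g. for from-created-date."""
    return [
        f"from-{kind}-date:{a.isoformat()},until-{kind}-date:{b.isoformat()}"
        for a, b in split_date_range(start, end, days)
    ]


def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and wait longer after each 4XX response."""
    for attempt in range(max_tries):
        status, body = fetch(url)
        if not 400 <= status < 500:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise RuntimeError(f"giving up on {url} after {max_tries} tries")


if __name__ == "__main__":
    for f in chunk_filters(date(2025, 1, 1), date(2025, 1, 31)):
        print(f)
```

Because each chunk is an independent query, a failed chunk can be rerun on its own without restarting the whole download.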
How to keep your local data synced with the REST API
You might have a request that you want to make repeatedly and keep local results cached. You may even want to have a complete copy of the Crossref database and keep it up-to-date (in this case consider whether Metadata Plus might be a good option for you, since the monthly snapshots provided as part of the service would best enable this). Here are a few suggestions and tips for how to do that.
Choose your date filter. There are three types of date filters that can help you pick up new and updated items.
The created date will return any new records (from-created-date, until-created-date).
Using updated date will give you both new records and those with any changes deposited by the member (from-update-date, until-update-date).
The indexed date retrieves all records updated or created by the stewarding member, as well as those that were modified by Crossref or other third parties, for example to update the Cited-by count or add a relationship deposited by a different member (from-index-date, until-index-date).
As you can see, the third of these options gives you the most results, but it can return a very large number of records. Which option you choose will depend on how you want to use the metadata. For this use case, we don’t recommend using published date, as this can change over time and might be different from when the record was created, meaning that you are likely to miss results.
Here are some other considerations.
Choose your frequency. How often do you want to get new records? All of these filters offer the option to retrieve updates once per second, but you might decide that once an hour, once a day, or once a month is ok. Note that the timestamps are inclusive, so to get everything created between 12:00 and 13:00 on 1 January 2025, you can use: filter=from-created-date:2025-01-01T12,until-created-date:2025-01-01T12. Using 13 instead of 12 in the until-created-date filter will get you two hours of data, not one.
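Because the timestamps are inclusive, a one-hour window uses the same hour stamp for both bounds. The sketch below builds such a filter in Python; the function name is hypothetical, and it assumes the hour is written as YYYY-MM-DDTHH as in the example above. Which time zone the API applies is not stated on this page, so check that before relying on hour boundaries.

```python
from datetime import datetime


def hourly_created_filter(hour_start):
    """Return a filter covering exactly the hour that contains `hour_start`.

    Both bounds use the same hour stamp because the range is inclusive.
    """
    stamp = hour_start.strftime("%Y-%m-%dT%H")
    return f"from-created-date:{stamp},until-created-date:{stamp}"


if __name__ == "__main__":
    # Everything created between 12:00 and 13:00 on 1 January 2025.
    print(hourly_created_filter(datetime(2025, 1, 1, 12)))
```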
Use cursors. If your time range is reasonably large or you aren’t using other additional filters, it is likely that you will reach more than the page limit of 1000 items. Use cursors in your request to make sure you get all of the results (see above).
Cache. To make sure you don’t keep retrieving the same unchanged records, make sure you save the responses locally. If you are looking for updates or newly indexed works, you will need to replace the old records in your cache with the newest version.
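Replacing old records in a cache could be sketched as below. This is only an illustration: the cache is a plain dictionary keyed by lower-cased DOI, and each record is assumed to carry a DOI field and an indexed.timestamp value that grows when the record is re-indexed. The function name is hypothetical, and a real setup would more likely write to a database or to files.

```python
def upsert_records(cache, records):
    """Store records by DOI, keeping only the most recently indexed copy.

    Returns a (added, updated) tuple of counts.
    Assumes each record has "DOI" and "indexed" -> "timestamp" fields.
    """
    added = updated = 0
    for record in records:
        key = record["DOI"].lower()  # DOIs are matched case-insensitively
        old = cache.get(key)
        if old is None:
            cache[key] = record
            added += 1
        elif record["indexed"]["timestamp"] > old["indexed"]["timestamp"]:
            cache[key] = record  # newer version replaces the cached one
            updated += 1
    return added, updated


if __name__ == "__main__":
    cache = {}
    print(upsert_records(cache, [{"DOI": "10.5555/12345678", "indexed": {"timestamp": 1}}]))
```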
Code Libraries
There are a number of libraries that have been written for the Crossref REST API. These are neither maintained nor endorsed by Crossref (except where noted). Available libraries include:
- crossrefapi (Go)
- pitaya (Julia)
- crossref-commons (Python, developed by Crossref)
- habanero (Python)
- crossrefapi (Python)
- rcrossref (R)
- serrano (Ruby)
- crossref-rs (Rust)
- Crossref API Typescript client (TypeScript)