Protecting health and disease data from a purge - US edition
In the US, the new administration appear to be preparing for a new regime by cutting off access - at least temporarily - to Centers for Disease Control and Prevention (CDC) datasets.
The CDC host and maintain the major federal datasets on the progress and history of diseases in the US. But if you try data.cdc.gov right now, you will either get nothing or a landing page saying that access is temporarily restricted while the agency complies with presidential orders.
[Update: 17:58 1/2/2025 - this landing page has been removed; the datasets highlighted below remain offline]
Some links to some databases still work, as - at the time of writing - do the CDC's recommendations on vaccination schedules. Others are not available. In general, the pattern seems to be that any data referring to gender or to HIV is more likely to be unavailable. The following appear to be completely down:
CDC Atlas of disease
CDC Youth Risk Behavior Surveillance System
TargetHIV - the CDC/HRSA programme for caring for people with HIV
(These are all at their usual addresses, so do check how they're doing. Not a single one of them works for me as of the afternoon of 1 February 2025.)
This is, of course, insane, though - it must be said - entirely in line with what the American people voted for last November. Moves like this were promised in the Project 2025 manifesto.
And the CDC is of course not alone in this: pages relevant to PEPFAR (the President's Emergency Plan for AIDS Relief), and government databases on the international tracking and prediction of famine, have also been removed. But then many other datasets are still there (for example, I can't see any obviously missing CDC COVID datasets).
So the promise on the landing page may be honoured, and it may all be just temporary while they comply with a couple of presidential orders[1] … but both the orders themselves and the CDC's response of pulling everything are mad … and on the assumption that this madness may continue, many people have been trying to save historical records of these major CDC databases.
Archiving historical information itself is of course no substitute for having regularly updated and cleaned live data for any new development (bird flu is the most obvious one to be concerned about), but having some archive offline is better than the data being lost altogether - either by mistake, or deliberately.
So here are a few resources that people may find helpful:
All US government pages as they were at the end of the Biden administration (N.B., pages only, not databases) https://eotarchive.org/
Many of the big tables and CSVs from the CDC datasets. It is unclear to me how much is included, but a great many valuable tables are clearly there. https://archive.org/details/20250128-cdc-datasets
The CDC FTP site, which - again at the time of writing - still seems to be up, with a fair amount of raw data accessible. https://ftp.cdc.gov/
The CDC’s geospatial Social Vulnerability Index layers have been extracted and are now on the Esri Living Atlas: livingatlas.arcgis.com
A progress report on a comprehensive torrent file being put together by the Reddit community r/DataHoarder. https://www.reddit.com/r/DataHoarder/comments/1ibnjbb/comment/m9l4ajh/
A tracker for journalists or researchers who have downloaded further copies of CDC data, since it may help to know who has what. Only use it if you are comfortable being identified. https://docs.google.com/forms/d/e/1FAIpQLScdi6gcBKKqoK3iV4uHsGFnaSlgT2vAfHaCr_6SdNZJkYWRhQ/viewform
There are no doubt other, quieter efforts to keep this data out of reach of political filters and possible deletion, and I will not talk about those who would rather keep things quiet. But please let me know if there are other public efforts that should be linked here.
Because so far, the American public's response to all this - and to the wider moves - looks to me like total apathy. And I would rather like to be proved wrong about this.
[1] Laying aside the obvious politicisation of scientific information that this represents.


Update: 17:51 GMT 1/02/25. https://data.cdc.gov/ is now back up, with no trace of the landing "info" page that was there before. The datasets highlighted in the piece are still inaccessible.
For example, the newly functional homepage has a link straight to the CDC Atlas. Try clicking on it.
Links to the archive.org copies of every cdc.gov page as it was before the purge, spread over 15 pages, are available at https://acasignups.net/25/02/03/links-archived-versions-every-cdcgov-page-available-pre-purge-part-1-15