Project:Analytics/PubPeer
API
https://dashboards.pubpeer.com/docs/api#/operations/partner
Relevant parameters:
page: start with1then iterate based on whether there are more resultsper_page: set at maximum value300sort:: concerns when the document was published; I only care about commentspublished_at
Resources
- Wikimedia Cloud Services
- Toolforge: project "pubpeer"
- Cloud VPS: project "wikicite", Trove DB instance
Process
- Initial seed:
- Build pageset
- Start from
2006-01-01...2025-12-31 - Iterate through as many pages as needed to get to the end
- Build internal database:
- id_pubpeer (key)
- id_doi (update on conflict)
- id_pubmed (update on conflict)
- id_arxiv (update on conflict)
- time_last_notified_wiki (
nullwhen created) - time_most_recent_comment (on conflict, update if submitted > stored)
- Subsequent builds:
- Get most recent
time_most_recent_commentfrom database - Start from
that date...present day - Iterate through as many result pages as needed (probably only one page)
- Submit into database, which should transparently handle conflicts
- Get most recent