Project:Analytics/PubPeer: Difference between revisions
Revision as of 00:52, 8 January 2026
API
https://dashboards.pubpeer.com/docs/api#/operations/partner
Relevant parameters:
- page: start with 1, then iterate based on whether there are more results
- per_page: set at maximum value 300
- sort: concerns when the document was published; I only care about comments
- published_at
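A minimal sketch of the paging loop implied by the page and per_page parameters above. The names `iter_results`, `fetch_page`, and `filters` are hypothetical; `fetch_page` stands in for whatever function actually calls the partner endpoint and returns one page of results as a list. The stop condition (a page with fewer than per_page results is the last one) is an assumption, not something taken from the API documentation.

```python
from typing import Callable, Iterator

PER_PAGE_MAX = 300  # maximum per_page value noted above


def iter_results(fetch_page: Callable[[dict], list], filters: dict) -> Iterator[dict]:
    """Yield every result by requesting page 1, 2, 3, ... in turn.

    fetch_page is a hypothetical callable that takes the query parameters
    and returns the list of results for that page.
    """
    page = 1
    while True:
        # Merge the caller's filters (e.g. a date range) with the paging parameters.
        params = {**filters, "page": page, "per_page": PER_PAGE_MAX}
        batch = fetch_page(params)
        yield from batch
        # Assumption: a short (or empty) page means there are no more results.
        if len(batch) < PER_PAGE_MAX:
            break
        page += 1
```

Passing the fetcher in as an argument keeps the loop independent of the HTTP client and makes it easy to try against canned data before touching the real API.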
Resources
- Wikimedia Cloud Services
- Toolforge: project "pubpeer"
- Cloud VPS: project "wikicite", Trove DB instance
Process
- Initial seed:
  - Build pageset
    - Pull usages of identifiers from Wikimedia Cloud DB Replicas
    - Create database table (wikipedia):
      - id (incrementing key)
      - language_code
      - mw_page_id
      - mw_page_title (probably should have a process to refresh this before the full process runs)
  - API query: 2006-01-01...2025-12-31
  - Iterate through as many pages as needed to get to the end
  - Build internal database (pubpeer_articles table):
    - id_pubpeer (key)
    - id_doi (update on conflict)
    - id_pubmed (update on conflict)
    - id_arxiv (update on conflict)
  - Build minimal citations database (citations table):
    - id_pubpeer (key to pubpeer_articles table)
    - id_wiki_page (key to wikipedia table)
    - time_last_updated_table (null when created)
    - time_last_talk_page_post (null when created)
    - time_most_recent_comment (on conflict, update if submitted > stored)
  - Post initial report to wiki as a table
  - Post initial notification that the report is posted
- Subsequent builds:
  - Get most recent time_most_recent_comment from database
  - API query: that date...present day
  - Iterate through as many result pages as needed (probably only one page)
  - Submit into database, which should transparently handle conflicts
  - Build new wiki table based on citations database table
    - Check database for
      - null time_last_updated_table
      - time_most_recent_comment > time_last_updated_table
  - Write alerts describing changes to the table
    - Retire old notifications to a subpage
- If/when talk page notifications are approved:
  - Check database for
    - null time_last_talk_page_post
    - time_most_recent_comment > time_last_talk_page_post
  - Queue up talk pages to notify
  - Check whether the message is already on the talk page
    - No message comment: add post to talk page
    - Message comment present: skip over talk page
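The tables and conflict rules above can be sketched as follows. This is a rough illustration, not the actual schema: it uses SQLite through Python's `sqlite3` module as a stand-in for the Trove instance, stores timestamps as ISO 8601 text, and assumes a composite key of (id_pubpeer, id_wiki_page) on the citations table. The helper names `upsert_article`, `upsert_citation`, and `pending` are hypothetical. The `ON CONFLICT ... DO UPDATE` syntax shown is SQLite's; other database engines spell upserts differently.

```python
import sqlite3

SCHEMA = """
CREATE TABLE wikipedia (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    language_code TEXT NOT NULL,
    mw_page_id INTEGER NOT NULL,
    mw_page_title TEXT NOT NULL
);
CREATE TABLE pubpeer_articles (
    id_pubpeer TEXT PRIMARY KEY,
    id_doi TEXT,
    id_pubmed TEXT,
    id_arxiv TEXT
);
CREATE TABLE citations (
    id_pubpeer TEXT NOT NULL REFERENCES pubpeer_articles (id_pubpeer),
    id_wiki_page INTEGER NOT NULL REFERENCES wikipedia (id),
    time_last_updated_table TEXT,            -- null when created
    time_last_talk_page_post TEXT,           -- null when created
    time_most_recent_comment TEXT NOT NULL,
    PRIMARY KEY (id_pubpeer, id_wiki_page)   -- assumed composite key
);
"""


def upsert_article(conn, id_pubpeer, id_doi=None, id_pubmed=None, id_arxiv=None):
    """Insert an article, or overwrite its identifiers if it already exists."""
    conn.execute(
        """
        INSERT INTO pubpeer_articles (id_pubpeer, id_doi, id_pubmed, id_arxiv)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (id_pubpeer) DO UPDATE SET
            id_doi = excluded.id_doi,
            id_pubmed = excluded.id_pubmed,
            id_arxiv = excluded.id_arxiv
        """,
        (id_pubpeer, id_doi, id_pubmed, id_arxiv),
    )


def upsert_citation(conn, id_pubpeer, id_wiki_page, comment_time):
    """Insert a citation; on conflict, keep whichever comment time is newer."""
    conn.execute(
        """
        INSERT INTO citations (id_pubpeer, id_wiki_page, time_most_recent_comment)
        VALUES (?, ?, ?)
        ON CONFLICT (id_pubpeer, id_wiki_page) DO UPDATE SET
            time_most_recent_comment = excluded.time_most_recent_comment
        WHERE excluded.time_most_recent_comment > citations.time_most_recent_comment
        """,
        (id_pubpeer, id_wiki_page, comment_time),
    )


MARKER_COLUMNS = {"time_last_updated_table", "time_last_talk_page_post"}


def pending(conn, marker_column):
    """Return citations never processed, or with a comment newer than the marker."""
    if marker_column not in MARKER_COLUMNS:
        raise ValueError(f"unexpected column: {marker_column}")
    return conn.execute(
        f"SELECT id_pubpeer, id_wiki_page FROM citations "
        f"WHERE {marker_column} IS NULL "
        f"OR time_most_recent_comment > {marker_column} "
        f"ORDER BY id_pubpeer, id_wiki_page"
    ).fetchall()
```

The same `pending` query serves both the wiki table rebuild (with `time_last_updated_table`) and the talk page queue (with `time_last_talk_page_post`), since both checks have the same shape. Comparing timestamps as text only works if every value is stored in the same ISO 8601 format.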