Enable programmatic full-text import from PMC into Wikisource
So that the presence of an article in Wikisource can be signaled in citations on Wikipedia.
See also https://github.com/konrad/JATS-to-Mediawiki/issues/3 .
Daniel, do you have anybody working on this? I could try to spend some time on it, if not.
The plan is that @wrought will start working on this, but I guess there will be occasions where you could be of help - we'll get back to you then. Thanks.
I plan to start working on it too, but I am curious, @Klortho: do you have ideas or an outline for the best way to proceed? That is, what strategy should we use, both technical and non-technical?
Hi, Max, Daniel linked to JATS-to-Mediawiki above, and that's where I'd start. I think that XSLT is 90% of the way there. It needs a driver script, which could be written in anything. It should use the PMC OA web service (https://www.ncbi.nlm.nih.gov/pmc/tools/oa-service/) to discover new and changed articles in the OA subset.
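A rough sketch of the discovery step such a driver script might perform, in Python. The endpoint URL, the `from` parameter, and the `<record>`/`<link>` response layout are assumptions based on the OA service documentation linked above and should be checked against it; the sample record is made up and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the PMC OA web service; verify against its documentation.
OA_ENDPOINT = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_oa_query(since):
    """Build a query URL for records updated on or after `since` (YYYY-MM-DD)."""
    return OA_ENDPOINT + "?" + urlencode({"from": since})


def parse_oa_records(xml_text):
    """Extract PMC ID, license and download links from an OA service response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iter("record"):
        # Map each package format (e.g. "tgz") to its download location.
        links = {link.get("format"): link.get("href")
                 for link in record.findall("link")}
        records.append({
            "pmcid": record.get("id"),
            "license": record.get("license"),
            "links": links,
        })
    return records


# Made-up response in the assumed layout, used only for illustration.
SAMPLE_RESPONSE = """
<OA>
  <records returned-count="1" total-count="1">
    <record id="PMC0000000" citation="Hypothetical J. 2014; 1:1" license="CC BY">
      <link format="tgz" updated="2014-02-26 12:00:00"
            href="ftp://example.invalid/hypothetical/PMC0000000.tar.gz"/>
    </record>
  </records>
</OA>
"""

if __name__ == "__main__":
    print(build_oa_query("2014-02-01"))
    for rec in parse_oa_records(SAMPLE_RESPONSE):
        print(rec["pmcid"], rec["license"], rec["links"])
```

The driver would then fetch each package and run the JATS-to-Mediawiki XSLT over the article XML it contains; that part is not shown here.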
We have such a driver script at https://github.com/erlehmann/open-access-media-importer/blob/master/oa-pmc-ids. However, the import to Wikisource is to be triggered by citations on Wikipedia, so the focus will be less on discovering new articles.
A simple way to detect new citations and trigger the driver script is to poll, at an interval, the "what links here" transclusions of the citation template. Is there a better way than constant polling?
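For reference, the polling approach could be sketched like this in Python. It assumes the MediaWiki API's `list=embeddedin` query (the API counterpart of the "what links here" transclusion list) and uses `Template:Cite journal` only as an example; the function names are hypothetical, and continuation across result pages is left to the caller.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template, limit=500, cont=None):
    """Build an API URL listing pages that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "eilimit": limit,
        "format": "json",
    }
    if cont:
        # Continuation token taken from the previous response, if any.
        params["eicontinue"] = cont
    return API_URL + "?" + urlencode(params)


def new_transclusions(seen_titles, api_response):
    """Return page titles in this response that were not seen in earlier polls."""
    current = {page["title"] for page in api_response["query"]["embeddedin"]}
    return current - set(seen_titles)


if __name__ == "__main__":
    print(embeddedin_url("Template:Cite journal"))
    # Hand-written response in the shape the API returns for list=embeddedin.
    response = {"query": {"embeddedin": [
        {"pageid": 1, "ns": 0, "title": "Page A"},
        {"pageid": 2, "ns": 0, "title": "Page B"},
    ]}}
    print(new_transclusions({"Page A"}, response))
```

Each poll has to walk the full transclusion list again, which is what makes this approach expensive for a widely used template.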
Alternative options include dumps, or Recent changes feeds - both would seem to me better than constant polling.
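Whichever source is used, the common step is pulling PMC identifiers out of the wikitext of changed pages. A minimal Python sketch, assuming citations carry the identifier in a `pmc` template parameter (as `{{cite journal}}` does on English Wikipedia); the function name is hypothetical and the regular expression is deliberately simple.

```python
import re

# Matches "|pmc=1234567", "| PMC = PMC1234567" and similar spellings.
PMC_PARAM = re.compile(r"\|\s*pmc\s*=\s*(?:PMC)?(\d+)", re.IGNORECASE)


def extract_pmcids(wikitext):
    """Return the distinct PMC IDs cited in `wikitext`, in order of appearance."""
    seen = []
    for digits in PMC_PARAM.findall(wikitext):
        pmcid = "PMC" + digits
        if pmcid not in seen:
            seen.append(pmcid)
    return seen


if __name__ == "__main__":
    text = ("{{cite journal |title=Example |pmc=1234567 |pmid=7654321}} "
            "{{cite journal | PMC = PMC1234567}} {{cite journal|pmc=42}}")
    print(extract_pmcids(text))
```

A real importer would more likely use a proper wikitext parser, since a regular expression can be fooled by comments or nested templates.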
Plus, we probably want to wait a week or so for a citation to consolidate, so as not to become a toy for spammers.
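In a driver script, that waiting period could be a simple age check before a citation triggers an import. A minimal Python sketch; the seven-day default mirrors the "a week or so" above, and the names are hypothetical.

```python
from datetime import datetime, timedelta

# "A week or so"; meant to be tuned rather than treated as fixed.
CONSOLIDATION_PERIOD = timedelta(days=7)


def is_consolidated(first_seen, now, period=CONSOLIDATION_PERIOD):
    """True once a citation has existed for at least `period`."""
    return now - first_seen >= period


if __name__ == "__main__":
    added = datetime(2014, 2, 20)
    print(is_consolidated(added, datetime(2014, 2, 27)))
    print(is_consolidated(added, datetime(2014, 2, 24)))
```

This only checks age; confirming that the citation is still present on the page after the waiting period would be a separate lookup.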
I don't see the problem here, since we're only talking about selecting which PMC articles to import to Wikisource, right? Wouldn't it be reasonable to assume that once an article is in PMC, it's at least eligible to be imported into Wikisource? How bad a problem would "false positives" be?
The thing is that there is no clear policy around that, so any automated tool would have to err on the side of caution. In any case, I think we should start with the most cited articles - an article-level version of https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Popular1 .
Related issues: https://github.com/Daniel-Mietchen/OA-signalling/issues/9 and https://github.com/Daniel-Mietchen/OA-signalling/issues/37 .
Updated title to reflect that the goal is to do this programmatically for quality, with community control at heart, not just automatically. ;)
All the technology needed to do this is now in place; we are only waiting on the Wikisource community.