Enable programmatic full-text import from PMC into Wikisource

Open Daniel-Mietchen opened this issue 11 years ago • 13 comments

So that the presence of an article in Wikisource can be signaled in its citation on Wikipedia.

Daniel-Mietchen avatar Feb 18 '14 20:02 Daniel-Mietchen

See also https://github.com/konrad/JATS-to-Mediawiki/issues/3 .

Daniel-Mietchen avatar Feb 18 '14 22:02 Daniel-Mietchen

Daniel, do you have anybody working on this? I could try to spend some time on it, if not.

Klortho avatar Feb 20 '14 03:02 Klortho

The plan is that @wrought will start working on this, but I guess there will be occasions where you could be of help - we'll get back to you then. Thanks.

Daniel-Mietchen avatar Feb 20 '14 08:02 Daniel-Mietchen

I plan to start working on it too, but I am curious @Klortho do you have ideas or an outline about the best way to proceed? As in, what strategy to use, both technical and non-technical?

notconfusing avatar Feb 24 '14 18:02 notconfusing

Hi, Max, Daniel linked to JATS-to-Mediawiki above, and that's where I'd start. I think that XSLT is 90% of the way there. It needs a driver script, which could be written in anything. It should use the PMC OA web service (https://www.ncbi.nlm.nih.gov/pmc/tools/oa-service/) to discover new and changed articles in the OA subset.
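A minimal sketch of the discovery half of such a driver script, in Python. It is not the script from any of the linked repositories. The endpoint URL, the `from` parameter, and the `record`/`link` response layout are assumptions based on the OA web service documentation and should be checked against the live service; the sample XML is hand-written for illustration and is not real data.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of the PMC OA web service; verify against its documentation.
OA_ENDPOINT = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_oa_query(since):
    """Return a URL asking for OA records updated on or after `since` (YYYY-MM-DD)."""
    return OA_ENDPOINT + "?" + urllib.parse.urlencode({"from": since})


def parse_oa_records(xml_text):
    """Return a list of (pmcid, format, href) tuples, one per download link."""
    root = ET.fromstring(xml_text)
    found = []
    for record in root.iter("record"):
        for link in record.iter("link"):
            found.append((record.get("id"), link.get("format"), link.get("href")))
    return found


# Hand-written sample in the assumed response layout (placeholder ID and URL).
SAMPLE_RESPONSE = """<OA><records>
  <record id="PMC0000000" citation="Example citation" license="CC BY">
    <link format="tgz" updated="2014-02-01 00:00:00"
          href="ftp://example.invalid/PMC0000000.tar.gz"/>
  </record>
</records></OA>"""

if __name__ == "__main__":
    print(build_oa_query("2014-02-01"))
    for pmcid, fmt, href in parse_oa_records(SAMPLE_RESPONSE):
        print(pmcid, fmt, href)
```

A real driver would fetch the URL, follow any resumption link the service returns, unpack each archive, and hand the JATS XML to the XSLT.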

Klortho avatar Feb 27 '14 06:02 Klortho

We have such a driver script at https://github.com/erlehmann/open-access-media-importer/blob/master/oa-pmc-ids. However, the import to Wikisource is to be triggered by citations on Wikipedia, so the focus will be less on discovering new articles.

Daniel-Mietchen avatar Feb 27 '14 09:02 Daniel-Mietchen

A simple way to watch for new citations to trigger the driver script is to watch (i.e. poll at an interval) the "what links here" transclusions of the citation template. Is there a better way than constantly polling for transclusions at an interval?
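The polling approach could be sketched as follows, in Python. It uses the MediaWiki API's `list=embeddedin` query, which lists the pages that transclude a given template. The template name `Template:Cite journal`, the helper names, and the hand-written sample response are assumptions for illustration only.

```python
import json
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template, cont=None):
    """Build a query for article-namespace pages that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": 0,
        "eilimit": "max",
        "format": "json",
    }
    if cont:
        # Continuation value returned by the previous batch, if any.
        params["eicontinue"] = cont
    return API + "?" + urllib.parse.urlencode(params)


def new_pages(response_text, seen):
    """Return (titles not seen before, continuation value or None); updates `seen`."""
    data = json.loads(response_text)
    titles = [page["title"] for page in data["query"]["embeddedin"]]
    fresh = [title for title in titles if title not in seen]
    seen.update(fresh)
    return fresh, data.get("continue", {}).get("eicontinue")


# Hand-written sample of one response batch (placeholder titles and IDs).
SAMPLE_BATCH = json.dumps({
    "continue": {"eicontinue": "0|123", "continue": "-||"},
    "query": {"embeddedin": [
        {"pageid": 1, "ns": 0, "title": "A"},
        {"pageid": 2, "ns": 0, "title": "B"},
    ]},
})

if __name__ == "__main__":
    print(embeddedin_url("Template:Cite journal"))
    print(new_pages(SAMPLE_BATCH, {"A"}))
```

Each polling pass would fetch batches until no continuation value is returned, then pass the newly seen pages on for citation extraction.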

notconfusing avatar Mar 03 '14 20:03 notconfusing

Alternative options include dumps, or Recent changes feeds - both would seem to me better than constant polling.

Plus, we probably want to wait a week or so for a citation to consolidate, so as not to become a toy for spammers.
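One way the waiting period might be applied, sketched in Python. The seven-day window comes from the "a week or so" above; the data shapes, the function name, and the placeholder PMC IDs are assumptions, not part of any existing tool.

```python
from datetime import datetime, timedelta, timezone

# "A week or so": the exact window is a policy choice, seven days is assumed here.
CONSOLIDATION = timedelta(days=7)


def consolidated(first_seen, still_cited, now):
    """Return sorted IDs first cited at least CONSOLIDATION ago and still cited.

    first_seen:  dict mapping article ID -> datetime the citation was first observed
    still_cited: set of article IDs whose citation is present in the latest check
    """
    return sorted(
        article_id
        for article_id, seen_at in first_seen.items()
        if article_id in still_cited and now - seen_at >= CONSOLIDATION
    )


if __name__ == "__main__":
    utc = timezone.utc
    first_seen = {
        "PMC1": datetime(2014, 3, 1, tzinfo=utc),   # old enough, still cited
        "PMC2": datetime(2014, 3, 8, tzinfo=utc),   # too recent
        "PMC3": datetime(2014, 2, 20, tzinfo=utc),  # old enough, but since removed
    }
    print(consolidated(first_seen, {"PMC1", "PMC2"}, datetime(2014, 3, 10, tzinfo=utc)))
```

Only citations that survive the window are queued for import, so a citation added and quickly reverted never triggers one.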

Daniel-Mietchen avatar Mar 03 '14 21:03 Daniel-Mietchen

> Plus, we probably want to wait a week or so for a citation to consolidate, so as not to become a toy for spammers.

I don't see the problem here, since we're only talking about selecting which PMC articles to import to Wikisource, right? Wouldn't it be reasonable to assume that once an article is in PMC, it's at least eligible to be imported into Wikisource? How bad a problem would "false positives" be?

Klortho avatar Mar 10 '14 02:03 Klortho

The thing is that there is no clear policy around that, so any automated tool would have to err on the side of caution. In any case, I think we shall start with the most cited articles - an article-level version of https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Popular1 .

Daniel-Mietchen avatar Mar 13 '14 00:03 Daniel-Mietchen

Related issues: https://github.com/Daniel-Mietchen/OA-signalling/issues/9 and https://github.com/Daniel-Mietchen/OA-signalling/issues/37 .

Daniel-Mietchen avatar Apr 11 '14 00:04 Daniel-Mietchen

Updated title to reflect that the goal is to do this programmatically for quality, with community control at heart, not just automatically. ;)

wrought avatar May 08 '14 20:05 wrought

All the technology to do this is now fully in place; we are only waiting on the Wikisource community.

notconfusing avatar Jul 03 '14 18:07 notconfusing