James Hare
I've noticed problems with strip_code() and filtering HTML tags as well. Based on : ``` {{Benutzer:ZOiDberg/Vorlage:user py}}{{Benutzer:Sitic/Babel/tools}}{{Benutzer:FNDE/Vorlage/Bot}} {{Bot|FNDE |Kontakt = Benutzer_Diskussion:FNDE |Modus = automatisch }} FNBot kann im Notfall sofort...
Thank you for taking the initiative in doing this! I'm glad someone out there understands how pywikibot-for-Wikidata works. My original idea was much more ambitious, adding every data field present...
It's kind of a gray area if you're running a bot under your own account on Wikidata. As long as you are not making edits _too_ quickly you should be...
So the idea is to have Wikidata and Wikipedia fields living side by side, with the Wikipedia field occasionally checked for discrepancies with Wikidata and updated accordingly? That sounds good...
If only Congress worked like this! I submitted a pull request. There were 65 members of Congress missing links to Wikipedia entries, but as a trial I also included Wikidata...
I have submitted a pull request including Wikidata IDs for all current members of Congress. I did this by starting with the Wikipedia title, making sure it referred correctly to...
Related issue: a lot of DOIs end with a period, even though they're not supposed to. For example: https://w3id.org/oc/corpus/br/10172.html
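As a rough illustration of the kind of cleanup this would need, here is a minimal Python sketch. The function name `normalize_doi` and the sample DOI are made up for the example and do not come from the OpenCitations code or corpus. It also assumes that a trailing period is always stray sentence punctuation rather than part of the identifier, which should be checked against the source data before applying it in bulk.

```python
def normalize_doi(raw: str) -> str:
    """Trim surrounding whitespace, then drop any trailing periods.

    Assumption: a period at the very end of the string is leftover
    punctuation from the citing text, not part of the DOI itself.
    Periods inside the DOI are left untouched.
    """
    return raw.strip().rstrip(".")


# Hypothetical DOI, used only to show the behaviour.
cleaned = normalize_doi("10.1000/example.")
```

Running something like this at ingestion time, before identifiers are compared or deduplicated, would keep `10.1000/example` and `10.1000/example.` from being treated as two different DOIs.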
http://opencitations.net/corpus/br/10172.html and http://opencitations.net/corpus/br/58998.html have the same title, and are probably the same document, but published in different venues. But the identifiers are mixed up between the two. (And based on...
* http://opencitations.net/corpus/br/510599.html
* http://opencitations.net/corpus/br/996542.html

Similar but not the same:

* http://opencitations.net/corpus/br/38699.html and http://opencitations.net/corpus/br/69774.html
* http://opencitations.net/corpus/br/15794.html and http://opencitations.net/corpus/br/1273871.html

Contains irrelevant identifiers:

* https://w3id.org/oc/corpus/br/1206326.html
* https://w3id.org/oc/corpus/br/2844013.html
* https://w3id.org/oc/corpus/br/40019.html
* https://w3id.org/oc/corpus/br/3950240.html
* https://w3id.org/oc/corpus/br/178492.html...
I do not have the time to go through and list all the errors. But hopefully I have given you enough data to show that there are significant quality control...