WikibaseIntegrator
BUG: Fix behaviour of MERGE_REFS_OR_APPEND when datavalue is a blank node
As a bot developer for Structured Data on Commons, I want to be able to use ActionIfExists.MERGE_REFS_OR_APPEND for cases where the value of the property is a blank node.
Some files in Wikimedia Commons use a modelling where the statement's value is "some value" (an unknown value, i.e. a blank node), e.g.
From https://commons.wikimedia.org/wiki/File:Beitrag_zur_Flora_Brasiliens_(Pl.12)(8227161802).jpg
In this case, the claim exists, but has no "datavalue", which leads to an error on this line: https://github.com/LeMyst/WikibaseIntegrator/blob/c69b84a9623430431040f612b51c17795b16f137/wikibaseintegrator/models/claims.py#L108
Here is what the JSON for the datavalue-less claim looks like:
{'mainsnak':
{'snaktype': 'somevalue', 'property': 'P170'},
'type': 'statement',
'id': 'M42778810$04DF04D0-6234-4915-8A96-EE352E6EF350',
'rank': 'normal',
'qualifiers': {'P3267': [{'snaktype': 'value', 'property': 'P3267', 'datavalue': {'value': '61021753@N02', 'type': 'string'}}],
'P2093': [{'snaktype': 'value', 'property': 'P2093', 'datavalue': {'value': 'Biodiversity Heritage Library', 'type': 'string'}}],
'P2699': [{'snaktype': 'value', 'property': 'P2699', 'datavalue': {'value': 'https://www.flickr.com/people/biodivlibrary/', 'type': 'string'}}]},
'qualifiers-order': ['P3267', 'P2093', 'P2699']}
Which is compared in this case to:
{'mainsnak':
{'snaktype': 'value', 'property': 'P170',
'datatype': 'wikibase-item',
'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 131760409, 'id': 'Q131760409'}, 'type': 'wikibase-entityid'}},
'type': 'statement',
'rank': 'normal',
'qualifiers': {'P518': [{'snaktype': 'value', 'property': 'P518', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 112134971, 'id': 'Q112134971'}, 'type': 'wikibase-entityid'}}],
'P3831': [{'snaktype': 'value', 'property': 'P3831', 'datatype': 'wikibase-item', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 644687, 'id': 'Q644687'}, 'type': 'wikibase-entityid'}}]},
'qualifiers-order': []}
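The failure can be illustrated with plain dictionaries, without WikibaseIntegrator itself. The sketch below is only an assumption about the shape of a fix, not the library's actual code: `mainsnak_value` and `same_main_value` are hypothetical helpers that treat a missing `datavalue` as `None` instead of indexing it directly.

```python
# Hypothetical helpers for illustration; not part of WikibaseIntegrator.

def mainsnak_value(claim_json):
    """Return the mainsnak datavalue, or None for somevalue/novalue snaks."""
    mainsnak = claim_json.get("mainsnak", {})
    if mainsnak.get("snaktype") != "value":
        # 'somevalue' and 'novalue' snaks carry no 'datavalue' key at all.
        return None
    return mainsnak.get("datavalue")


def same_main_value(claim_a, claim_b):
    """Compare property, snaktype and datavalue without raising KeyError."""
    snak_a, snak_b = claim_a["mainsnak"], claim_b["mainsnak"]
    return (
        snak_a.get("property") == snak_b.get("property")
        and snak_a.get("snaktype") == snak_b.get("snaktype")
        and mainsnak_value(claim_a) == mainsnak_value(claim_b)
    )


# Trimmed versions of the two claims shown above.
existing_claim = {
    "mainsnak": {"snaktype": "somevalue", "property": "P170"},
    "type": "statement",
    "rank": "normal",
}
new_claim = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P170",
        "datatype": "wikibase-item",
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": 131760409, "id": "Q131760409"},
            "type": "wikibase-entityid",
        },
    },
    "type": "statement",
    "rank": "normal",
}

print(same_main_value(existing_claim, new_claim))
```

With a guard like this, the blank-node claim simply compares as different from the new claim, so the new statement would be appended rather than crashing the comparison.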
I will try and prototype a solution here. Cheers!
This is the current code, by the way, in development: https://github.com/lubianat/bhl_sdc_exploration/tree/main/reconciliation_bot
Indeed, there is some issue with the comparison of qualifiers. See:
<Snak @fdbd40 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P518' _Snak__hash=None _Snak__datavalue={'value': {'entity-type': 'item', 'numeric-id': 112134971, 'id': 'Q112134971'}, 'type': 'wikibase-entityid'} _Snak__datatype='wikibase-item'>
is different from
<Snak @515c70 _Snak__snaktype=<WikibaseSnakType.KNOWN_VALUE: 'value'> _Snak__property_number='P518' _Snak__hash='288716b1efb9e21850a034325ebeeb0089b4e2c2' _Snak__datavalue={'value': {'entity-type': 'item', 'numeric-id': 112134971, 'id': 'Q112134971'}, 'type': 'wikibase-entityid'} _Snak__datatype=None>
I am not exactly certain why, though, because the original statement in this case was written to Commons using the same code.
My current hypothesis is that `_Snak__datatype` is being retrieved from the Wikibase as `None`.
So, I was able to circumvent the head issue, but I am running into a few other bugs — I am sorry, I am not very knowledgeable in the inner Wikibase structures.
In some part of the code, I test:
claim.quals_equal(claim, existing_claim):
This is yielding false for the following pair:
{'mainsnak':
{'snaktype': 'value',
'property': 'P1433',
'datavalue':
{'value': {'entity-type': 'item', 'numeric-id': 51446243, 'id': 'Q51446243'}, 'type': 'wikibase-entityid'}},
'type': 'statement',
'id': 'M42778917$DDA05289-CA80-471C-97FB-BC2E073F2B28',
'rank': 'normal',
'qualifiers': {'P518': [{'snaktype': 'value', 'property': 'P518',
'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 112134971, 'id': 'Q112134971'}, 'type': 'wikibase-entityid'}}]},
**'qualifiers-order': ['P518']**}
{'mainsnak':
{'snaktype': 'value',
'property': 'P1433',
'datatype': 'wikibase-item',
'datavalue':
{'value': {'entity-type': 'item', 'numeric-id': 51446243, 'id': 'Q51446243'}, 'type': 'wikibase-entityid'}},
'type': 'statement',
'rank': 'normal',
'qualifiers': {'P518': [{'snaktype': 'value', 'property': 'P518',
**'datatype': 'wikibase-item',**
'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 112134971, 'id': 'Q112134971'}, 'type': 'wikibase-entityid'}}]},
**'qualifiers-order': []**}
I could spot these 2 differences. It seems like the lack of the correct datatype was breaking things.
I am running some workarounds just to test whether my thinking is going in the right direction. It seems so: I modified the qualifier check to ignore the datatype and compare only values (not great, but seemingly needed).
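A minimal sketch of that workaround, written against the raw JSON rather than the WikibaseIntegrator classes. The names `snak_key` and `quals_equal_ignoring_datatype` are hypothetical; the assumption is that only `snaktype`, `property` and `datavalue` matter, while `datatype`, `hash` and `qualifiers-order` are ignored.

```python
import json

# Hypothetical helpers for illustration; not part of WikibaseIntegrator.

def snak_key(snak):
    """Reduce a snak to the fields that carry its meaning."""
    return (
        snak.get("snaktype", ""),
        snak.get("property", ""),
        # Serialise with sorted keys so dict ordering cannot affect equality.
        json.dumps(snak.get("datavalue"), sort_keys=True),
    )


def quals_equal_ignoring_datatype(quals_a, quals_b):
    """Compare two 'qualifiers' mappings, ignoring datatype, hash and order."""
    def flatten(quals):
        return sorted(snak_key(s) for snaks in quals.values() for s in snaks)
    return flatten(quals_a) == flatten(quals_b)


value = {
    "value": {"entity-type": "item", "numeric-id": 112134971, "id": "Q112134971"},
    "type": "wikibase-entityid",
}
# As retrieved from Commons: no 'datatype' on the qualifier snak.
from_commons = {"P518": [{"snaktype": "value", "property": "P518", "datavalue": value}]}
# As built locally: 'datatype' is present.
built_locally = {
    "P518": [
        {"snaktype": "value", "property": "P518", "datatype": "wikibase-item", "datavalue": value}
    ]
}

print(quals_equal_ignoring_datatype(from_commons, built_locally))
```

The trade-off is that two snaks with the same serialised value but genuinely different datatypes would also compare as equal, which is why this is a workaround and not a proper fix.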
The same issue is happening with the references: the ones retrieved from Commons come without datatype.
Old item references:
{'snaks': {'P854': [{'snaktype': 'value', 'property': 'P854', 'datavalue': {'value': 'https://www.biodiversitylibrary.org/bibliography/909', 'type': 'string'}}]}, 'snaks-order': ['P854']}
New item references:
{'snaks': {'P854': [{'snaktype': 'value', 'property': 'P854', 'datatype': 'url', 'datavalue': {'value': 'https://www.biodiversitylibrary.org/bibliography/909', 'type': 'string'}}]}, 'snaks-order': []}
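For references, one option is to normalise both JSON blobs before comparing them. This is a sketch under the assumption that `datatype`, `hash` and `snaks-order` are the only fields that differ between the retrieved and the locally built form; `IGNORED_KEYS` and `normalize` are hypothetical names.

```python
# Hypothetical helper for illustration; not part of WikibaseIntegrator.
IGNORED_KEYS = {"datatype", "hash", "snaks-order"}


def normalize(node):
    """Recursively drop keys that differ between retrieved and new references."""
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in node.items() if k not in IGNORED_KEYS}
    if isinstance(node, list):
        return [normalize(v) for v in node]
    return node


url_value = {"value": "https://www.biodiversitylibrary.org/bibliography/909", "type": "string"}
old_ref = {
    "snaks": {"P854": [{"snaktype": "value", "property": "P854", "datavalue": url_value}]},
    "snaks-order": ["P854"],
}
new_ref = {
    "snaks": {
        "P854": [
            {"snaktype": "value", "property": "P854", "datatype": "url", "datavalue": url_value}
        ]
    },
    "snaks-order": [],
}

print(normalize(old_ref) == normalize(new_ref))
```

Note that the order of snaks within a single property is still significant here, since lists are compared element by element.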
It might be something in the way MediaInfo is representing snaks, not sure.
Hello @lubianat, thank you for the issue, the analysis, and the merge request. I will need some time to review this and merge everything into the main branch.
@LeMyst Thank you! Do take your time — I am still working on figuring out the details here. Please do not merge any of my code, it is mostly garbage at this point, I needed to quickly fix a bug. I will try and clean up the contributions.
I tried to reproduce my own errors and fixes after a few hours and could not.
For some reason, I am unable to get the claims from the API, even for the test you shared:
media.claims are empty.
I am going to take a break and retry tomorrow.
I am investigating more. I am not sure why it was getting the claims before but not now (probably because I changed the version of WikibaseIntegrator without properly keeping track).
The bug does seem to be related to this: https://phabricator.wikimedia.org/T149410
@lubianat do you have a test that exposes this particular issue on the latest release?
@dpriskorn sorry, I did not do my homework well and did not document the details.
I kind of got one patch working for me and did not have the time to properly fix it upstream.
Are you also touching mediainfo?
Maybe this issue can be closed. It could very well have been a bug in my code, and not in WBI.
I know that I am using my fork and it works for me (and installing from pip does not).
This one, btw: https://github.com/lubianat/WikibaseIntegrator
No, and I'm considering whether it would be a good idea to deprecate support for MediaInfo until it gets covered by a stable interface policy. See #840