openrefine-wikibase
Fetch qualifiers
There is currently no way to fetch qualifiers in the data extension API (or to refine during reconciliation). A syntax for such qualifiers should be picked and implemented.
Yes, that would be great. And we should be able to link the columns of fetched data in the WD schema so the data can be pushed back.
I am not sure what you mean by "link the column". Do you mean using column groups? I don't see how column groups can be relied on in the WD schema.
What I meant is that if I could query qualifiers and references, then they could also be pushed back. This makes a round trip (get the data, fill in the blanks, push the data back).
Right now this can't be done, since qualifiers and references can't be imported in the first place.
Would this be why I'm having this issue? Sorry if the terminology is off -- perhaps I should have said "qualifier" instead of "flag" in the subject line…
No, your issue is not linked to qualifiers, but it's also an interesting one. I replied there :)
Use case mentioned here by @mshd:
I would like to reconcile Wikidata with a certain qualifier. Is that possible? If not, could you implement it?
Example:
Set the qualifier property to North Sumatera III, or give me all people who ever had a candidacy in this district.
I would love it. In my use case I have annual data like "total revenue", and without fetching qualifiers it's really difficult to update only the items that have no data for a certain year.
Let me expand on the design questions that need to be resolved before this can be implemented. This issue can be understood in multiple ways:
- I want to fetch the qualifier values on all statements of a given property. For instance: give me all the years for which the total revenue is available on Wikidata.
- I want to fetch the qualifier values on statements of a given property with a given value. For instance, give me the "member of political party" qualifier of the "candidacy in election: 2014 Indonesian People's Representative Council election" statement.
- I want to fetch main statement values, but select the ones I care about by specifying qualifier values. Example: give me the total revenue of this company in 2018 (so, filtering all "total revenue" statements to only keep the ones with a "point in time":2018 qualifier).
- I want to fetch "candidacy in election" statements, fetching simultaneously the main statement value and the qualifier values, representing them in OpenRefine with a record-like structure. This seems difficult to implement in a natural way with the current protocol.
Possible syntaxes we could add to support these use cases (where P3602 is candidacy in election, P1111 is votes received and P768 is electoral district):
- P3602#P1111 (all P1111 qualifiers on all P3602 statements)
- P3602=Q108816797#P1111 (all P1111 qualifiers on P3602=Q108816797 statements)
- P3602[P768=Q96984689] (all main statement values on P3602 statements with a P768=Q96984689 qualifier)
- I do not see a clean way to implement this given the existing API.
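To make the three proposed forms concrete, here is a minimal sketch of how such expressions might be parsed. It is only an illustration of the syntax discussed above: the `QualifierQuery` class, the `parse_query` function and the regular expression are hypothetical and not part of openrefine-wikibase.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical grammar for the proposed forms; not taken from the code base.
_PID = r"P\d+"
_VALUE = r"[^#\[\]=]+"
_PATTERN = re.compile(
    rf"^(?P<prop>{_PID})"                      # main property, e.g. P3602
    rf"(?:=(?P<value>{_VALUE}))?"              # optional main value, e.g. =Q108816797
    rf"(?:#(?P<fetch>{_PID})"                  # qualifier to fetch, e.g. #P1111
    rf"|\[(?P<fpid>{_PID})=(?P<fval>{_VALUE})\])?$"  # or qualifier filter, e.g. [P768=Q96984689]
)


@dataclass(frozen=True)
class QualifierQuery:
    prop: str                                   # property of the main statements
    value: Optional[str] = None                 # restrict to statements with this value
    fetch_qualifier: Optional[str] = None       # return this qualifier's values
    filter_qualifier: Optional[Tuple[str, str]] = None  # keep statements with this qualifier


def parse_query(expr: str) -> QualifierQuery:
    """Parse one of the proposed expressions into its components."""
    match = _PATTERN.match(expr.strip())
    if match is None:
        raise ValueError(f"unrecognised property expression: {expr!r}")
    qualifier_filter = None
    if match["fpid"]:
        qualifier_filter = (match["fpid"], match["fval"])
    return QualifierQuery(match["prop"], match["value"], match["fetch"], qualifier_filter)
```

With this sketch, `parse_query("P3602=Q108816797#P1111")` yields a query on P3602 restricted to the value Q108816797 that fetches the P1111 qualifier, and a plain `P3602` still parses as an ordinary property request.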
Do you see other use cases not covered by these points? Which of those use cases would be useful to you?
Looks good to me.
Case 3 could get more complicated when the qualifiers are not items themselves, e.g. point in time, which is often just a year but is sometimes a specific date. In a Wikidata query I would use FILTER on the qualifier. As a workaround we could use case 1 and do the filtering in OpenRefine later.
The problem with 1. is that it would only fetch the qualifier values, not the main statement values, so it is not clear to me how you can use it to reimplement 2 or 3 by adding local filtering afterwards.
@wetneb: fine for 1. and 2. But why not P3602#P768=Q96984689 for 3.? And for 4.: why not Pxxx#*?
Regards, Antoine
I thought it would only work in multiple steps. In my case (total revenue and point in time) I would try:
- fetch all point in time values for total revenue
- filter in OpenRefine to keep the point in time values between 2017-00-00 and 2018-00-00
- fetch all main statements for those
But you are right. It would only work if I could use the values of a column as qualifiers in my query.
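For illustration, the selection step that is missing here (keeping only the main values whose qualifier falls in a given year) could look roughly like the sketch below. It assumes the statements are available in the standard Wikibase JSON form (a `mainsnak` plus a `qualifiers` mapping keyed by property ID) and that P585 is the point in time qualifier; the function names are made up for this example.

```python
def time_year(time_value: dict) -> int:
    """Return the year of a Wikibase time value such as '+2018-00-00T00:00:00Z'."""
    raw = time_value["time"]
    sign = -1 if raw.startswith("-") else 1
    # Strip the leading sign, then take everything before the first '-'.
    return sign * int(raw.lstrip("+-").split("-", 1)[0])


def values_for_year(statements: list, qualifier_pid: str, year: int) -> list:
    """Keep the main values of statements whose time qualifier falls in `year`.

    Works for both year-precision and day-precision qualifier values,
    because only the year component is compared.
    """
    selected = []
    for statement in statements:
        mainsnak = statement["mainsnak"]
        if mainsnak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" main snaks
        for snak in statement.get("qualifiers", {}).get(qualifier_pid, []):
            if snak.get("snaktype") != "value":
                continue
            if time_year(snak["datavalue"]["value"]) == year:
                selected.append(mainsnak["datavalue"]["value"])
                break  # one matching qualifier is enough for this statement
    return selected
```

Calling `values_for_year(statements, "P585", 2018)` on the total revenue statements of an item would then return only the 2018 figures, whether the qualifier is stored as a bare year or as a full date.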
@antoine2711 for 4., the problem is not to find a syntax for it, but rather to see how it would fit in the protocol. At the moment, when the user requests a property, we can only return one column for it.
I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and let them manipulate that themselves in OpenRefine. After all, there are many more fields we are not exposing (ranks, references…) and it is unlikely we can find a satisfactory syntax to fetch all of them, so it would be good to have this fallback option for power users.
It would still be more convenient than having to query the Wikibase API directly.
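As a rough idea of what a power user could do with the full statement JSON, the sketch below flattens each statement into one row holding the main value, the rank and the requested qualifier values. It assumes the usual Wikibase statement layout (`mainsnak`, `qualifiers`, `rank`); the helper names `simplify` and `statement_rows` are hypothetical.

```python
def simplify(snak: dict):
    """Reduce a snak to a short display value, or None if it has no value."""
    if snak.get("snaktype") != "value":
        return None
    value = snak["datavalue"]["value"]
    if isinstance(value, dict):
        # Entity IDs, quantities, times and monolingual text, in that order.
        for key in ("id", "amount", "time", "text"):
            if key in value:
                return value[key]
    return value  # plain strings and anything not recognised above


def statement_rows(statements: list, qualifier_pids: list) -> list:
    """Turn each statement into a row: main value, rank, and one list per qualifier."""
    rows = []
    for statement in statements:
        row = {
            "value": simplify(statement["mainsnak"]),
            "rank": statement.get("rank"),
        }
        for pid in qualifier_pids:
            snaks = statement.get("qualifiers", {}).get(pid, [])
            row[pid] = [simplify(snak) for snak in snaks]
        rows.append(row)
    return rows
```

For the candidacy example, `statement_rows(statements, ["P1111", "P768"])` would give one row per P3602 statement with its votes received and electoral district alongside the main value, which is close to the record-like structure described in the fourth use case.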
The problem with 1. is that it would only fetch the qualifier values, not the main statement values
Oh! I see, @wetneb. So the problem is bringing the structure into OR? Why couldn't two columns be brought in at the same time? I understand it requires creating rows at two levels, the outer statements and the inner qualifiers. But still, is that so complicated?
Also, OR has a (not very functional) grouping of columns, like what you get from importing XML or JSON. Could that mechanism be reused?
I write that because, for me, in all 4 scenarios, I would like the statement value AND the qualifier's property AND the value of the qualifier's property.
Regards, Antoine
All I can say is that I do not know how that should be implemented. Again, proposals and pull requests are welcome.
I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and we would let them manipulate that themselves in OpenRefine.
That would be great in many ways, because we could extend the syntax with @ and the source property, following the same logic.
To access that data, since all those queries start from a recon column, maybe we could add fields to the recon...
Or, in the new column, save the data as a new recondata object. It would store either a recon or values, along with the cell of the initial recon column (the element of the statement).
Following the same logic, we might want columns of reconciled properties that could replace properties in the Wikidata schema.
So the recondata could have a type of statement value, statement property, qualifier property, qualifier value, source property, or source value.
Extending this logic seems well aligned with the Wikibase generalization (though that is another topic).
Sorry @wetneb and the others if I am off topic with too much OpenRefine; the two are just so closely linked and dependent on each other in my view.
Regards, Antoine
I have just received a request via email from another user who would find this very helpful.
It would be very useful for data extension for Wikimedia Commons' structured data, as P170 is usually described with several qualifiers there.
That user is me :-) . I like @wetneb 's solution to enable loading full statement JSONs. This would solve many possible feature requests in one go :)