Get a dump of everything but scientific articles
Thanks for this great tool!
I would be interested in generating a dump of all Wikidata items except those which have P31:Q13442814. It's not clear to me whether this is doable yet.
Scholarly articles are at https://www.wikidata.org/wiki/Q13442814, and https://tools.wmflabs.org/scholia/ has statistics about the number of triples you'd save by excluding them. It would be only 3% of all triples ... Still, a feature to filter out certain entities might be worthwhile.
> It would be only 3% of all triples ...
Are you sure about this? Where do you see this figure? Scholia does announce 11,186,800,006 Wikidata triples but I don't see a figure for the number of triples for scientific articles? I expect that to be much more than 3%…
It says "35718600 | Scholarly articles". Yes, you are right: the number of triples across all properties will be higher than 3% then.
Any progress on this?
Some more stats links:
- https://grafana.wikimedia.org/d/000000175/wikidata-datamodel-statements?orgId=1&refresh=30m
- https://www.wikidata.org/wiki/Wikidata:Statistics/Wikipedia
Scholarly articles and astronomical objects are interesting subsets, either to extract and keep or to exclude, depending on the purpose.
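For reference, here is a minimal sketch of what excluding a class could look like when streaming the dump outside of this tool. It is not this tool's API. It assumes the standard Wikidata JSON dump layout (one entity per line inside a JSON array, each line ending with a comma), and the names `EXCLUDED_CLASSES`, `instance_of_ids` and `filter_dump_lines` are made up for illustration.

```python
import json

# Hypothetical configuration: classes (P31 values) whose instances are dropped.
EXCLUDED_CLASSES = frozenset({"Q13442814"})  # scholarly article


def instance_of_ids(entity):
    """Return the set of item ids used as P31 (instance of) values."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        # "novalue"/"somevalue" snaks carry no datavalue, so guard for it.
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if not datavalue:
            continue
        value = datavalue.get("value")
        if isinstance(value, dict) and value.get("id"):
            ids.add(value["id"])
    return ids


def filter_dump_lines(lines, excluded=EXCLUDED_CLASSES):
    """Yield parsed entities that are not an instance of an excluded class.

    Assumes the usual dump layout: "[" on the first line, "]" on the last,
    and one JSON entity per line followed by a trailing comma.
    """
    for line in lines:
        text = line.strip().rstrip(",")
        if text in ("", "[", "]"):
            continue
        entity = json.loads(text)
        if instance_of_ids(entity) & excluded:
            continue
        yield entity
```

The lines could come from something like `bz2.open(path, "rt")` on a downloaded dump. Note that this only checks direct P31 values, so items typed with a subclass of Q13442814 would still pass through. Adding another id, such as the one for astronomical objects, to the excluded set would cover the other subset mentioned above.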