graph-explorer
[Feature Request] Investigate whether the Service Description of SPARQL endpoints provides useful statistics
Description
For large public SPARQL endpoints such as UniProt, gathering statistics takes a long time. UniProt and other endpoints publish detailed statistics as VoID, which could be used to avoid sending out a large number of statistics-gathering queries.
Preferred Solution
Gather as much data as possible from a VoID response when one is available.
Additional Context
Rewrite the discovery queries on the server side to retrieve information from the /.well-known/void graph instead.
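A minimal sketch of that rewrite, as a "try VoID first, fall back to the full scan" strategy. It assumes the VoID graph IRI is the endpoint origin plus `/.well-known/void`, which may not hold for every endpoint. The names `void_graph_iri`, `predicate_counts`, and the `run_query` callable are hypothetical and not part of graph-explorer.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# Hypothetical runner type: executes a SPARQL query string and returns
# predicate IRI -> count, or raises on timeout / HTTP error.
QueryRunner = Callable[[str], Dict[str, int]]


def void_graph_iri(endpoint_url: str) -> str:
    """Guess the VoID graph IRI as <origin>/.well-known/void (an assumption)."""
    return urljoin(endpoint_url, "/.well-known/void")


def predicate_counts(
    run_query: QueryRunner, void_query: str, scan_query: str
) -> Tuple[Dict[str, int], str]:
    """Return (counts, source), where source is "void" or "scan".

    The cheap VoID query is tried first. If it fails or returns nothing,
    the original full-scan discovery query is used instead.
    """
    try:
        counts = run_query(void_query)
    except Exception:
        # Treat any failure (timeout, unknown graph, ...) as "no VoID data".
        counts = {}
    if counts:
        return counts, "void"
    return run_query(scan_query), "scan"
```

Returning the source alongside the counts lets the caller record whether the numbers came from published metadata or from a live scan, which relates to the trust-level task below.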
Related Issues
Tasks
- [ ] Implement a VoID/service description parser
- [ ] Determine whether it contains enough data for AWS graph-explorer
- [ ] Document the behavior and trust level of the VoID data
[!IMPORTANT] If you are interested in working on this issue or have submitted a pull request, please leave a comment.
[!TIP] Please use a 👍 reaction to provide a +1/vote.
This helps the community and maintainers prioritize this request.
I am interested in contributing this feature.
For example, the starting query
SELECT ?predicate (COUNT(?predicate) AS ?count) { [] ?predicate ?object FILTER(!isLiteral(?object)) } GROUP BY ?predicate
times out at the UniProt SPARQL endpoint.
PREFIX void:<http://rdfs.org/ns/void#>
PREFIX void_ext:<http://ldf.fi/void-ext#>
SELECT
  ?predicate (SUM(?perPredicatePartitionCount) AS ?count)
{
  ?predicatePartition void:property ?predicate ;
                      void:triples ?perPredicatePartitionCount .
  MINUS {
    ?predicatePartition void_ext:datatypePartition ?datatype .
  }
} GROUP BY ?predicate
This VoID-based query gives the same general results.
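To compare the two queries programmatically, their results can be reduced to a predicate-to-count mapping. The sketch below assumes the endpoint returns the standard SPARQL 1.1 JSON results format and that the variables are named `?predicate` and `?count` as in the queries above. The function name `counts_from_sparql_json` is hypothetical.

```python
import json
from typing import Dict


def counts_from_sparql_json(payload: str) -> Dict[str, int]:
    """Turn a SPARQL JSON result set into {predicate IRI: triple count}.

    Rows missing either variable are skipped. Counts for a predicate that
    appears in several rows are added together.
    """
    data = json.loads(payload)
    counts: Dict[str, int] = {}
    for row in data.get("results", {}).get("bindings", []):
        predicate = row.get("predicate", {}).get("value")
        raw_count = row.get("count", {}).get("value")
        if predicate is None or raw_count is None:
            continue
        # Assumes integer lexical forms such as "42"; other numeric
        # formats would need extra handling.
        counts[predicate] = counts.get(predicate, 0) + int(raw_count)
    return counts
```

With both result sets in this shape, a plain dictionary comparison shows which predicates differ between the VoID statistics and the live scan.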
Interesting approach. I like the idea.
The first question that comes to mind is: how universal is this? Can this request be used across all SPARQL endpoints?
There are certainly issues with the schema sync process that can cause timeouts. We are looking into those. I'm going to add this approach as one of the things we try.
Feel free to create a PR for it. We love submissions 🤓
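On the question of how universal this is, one option is to probe each endpoint before relying on VoID. The sketch below pairs a cheap `ASK` query with a parser for the standard SPARQL JSON boolean response. It is only an assumption that an endpoint exposes VoID property partitions in the default graph or in some named graph, and `VOID_PROBE_QUERY` and `has_void_statistics` are hypothetical names.

```python
import json

# Checks for at least one VoID property partition with a triple count,
# either in the default graph or in any named graph.
VOID_PROBE_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
ASK {
  { ?partition void:property ?predicate ; void:triples ?triples . }
  UNION
  { GRAPH ?g { ?partition void:property ?predicate ; void:triples ?triples . } }
}
"""


def has_void_statistics(ask_response_json: str) -> bool:
    """Interpret the JSON response to VOID_PROBE_QUERY.

    Anything other than a well-formed {"boolean": true} response is
    treated as "no usable VoID statistics".
    """
    try:
        data = json.loads(ask_response_json)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("boolean") is True
```

Endpoints where the probe returns false would keep using the existing discovery queries.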