a function to get labels based on ID

Open lvaudor opened this issue 2 years ago • 4 comments

Hi,

For sequins, I have been working on a get_label() function:

#' This function takes a component of a triple pattern as input and returns (if it exists) a corresponding human-readable label.
#' @param string the string (a part of a triple pattern) to label
#' @param language the language in which to return the label (defaults to "en")
#' @param endpoint the SPARQL endpoint that is being queried (defaults to "wikidata")
#' @param label_property the name of the labelling property, for instance "skos:prefLabel". Defaults to "rdfs:label". If the endpoint is one of the usual glitter endpoints (see glitter::usual_endpoints), the labelling property is set accordingly, overriding this argument.
#' @return the label corresponding to the string
#' @export
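get_label <- function(string, language = "en", endpoint = "wikidata",
                      label_property = "rdfs:label") {
  # For the usual glitter endpoints, use the labelling property they declare
  if (endpoint %in% glitter::usual_endpoints$name) {
    index_endpoint <- which(glitter::usual_endpoints$name == endpoint)
    label_property <- glitter::usual_endpoints$label_property[index_endpoint]
  }
  # Variables ("?item") and literals ("'David Bowie'") are returned unchanged
  if (!glitter:::is_prefixed(string)) {
    return(string)
  }
  # Property prefixes (wdt:, p:, ps:, pq:) are rewritten to wd: so that the
  # query targets the property entity itself, which carries the labels
  string <- glitter:::str_replace(string,
                                  "(^wdt\\:)|(^p\\:)|(^ps\\:)|(^pq\\:)",
                                  "wd:")
  result <- glitter::spq_init(endpoint = endpoint) %>%
    glitter::spq_add(glue::glue("{string} {label_property} ?string_label")) %>%
    glitter::spq_mutate(languages = lang(string_label)) %>%
    glitter::spq_perform() %>%
    dplyr::filter(languages == language) %>% # because I don't know how to make glitter::spq_filter work here
    .$string_label
  # Fall back to the (normalized) input when no label exists in that language
  if (length(result) == 0) {
    return(string)
  }
  return(result)
}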
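The two string-handling steps at the top of get_label() can be tried out without querying any endpoint. The sketch below uses base R only and is an assumption-laden illustration: is_prefixed_sketch() is a hypothetical stand-in for the internal glitter:::is_prefixed(), whose exact rule may differ, and normalize_wikidata_prefix() is a hypothetical helper that applies the same rewrite as the regular expression passed to glitter:::str_replace() above.

```r
# Hypothetical stand-in for glitter:::is_prefixed() (the real rule may differ):
# treat "prefix:localname" with no leading "?", no quotes and no spaces
# as a prefixed IRI that may have a label.
is_prefixed_sketch <- function(string) {
  grepl("^[A-Za-z][A-Za-z0-9_-]*:[^[:space:]]+$", string)
}

# Hypothetical helper: same rewrite as "(^wdt\\:)|(^p\\:)|(^ps\\:)|(^pq\\:)" -> "wd:",
# so that e.g. wdt:P31 is looked up as the property entity wd:P31.
normalize_wikidata_prefix <- function(string) {
  sub("^(wdt|p|ps|pq):", "wd:", string)
}

is_prefixed_sketch(c("wd:Q152088", "wdt:P31", "hal:structure", "?item", "'David Bowie'"))
# TRUE TRUE TRUE FALSE FALSE

normalize_wikidata_prefix(c("wdt:P31", "p:P31", "pq:P580", "wd:Q152088"))
# "wd:P31" "wd:P31" "wd:P580" "wd:Q152088"
```

Only the Wikidata prefixes are rewritten; "wd:" itself and prefixes from other endpoints such as "hal:" pass through unchanged.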

It's supposed to work on all endpoints but I'll admit that right now my only examples which make much sense are on Wikidata...

Examples:

get_label("wd:Q152088", language = "en") # returns "French fries"
get_label("wd:Q152088", language = "fr") # returns "frite"
get_label("wdt:P31", language = "fr") # returns "nature de l'élément"
get_label("'David Bowie'") # returns "'David Bowie'"
get_label("?item") # returns "?item"
get_label("hal:structure", endpoint = "hal") # returns 'hal:structure'
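The language restriction that get_label() applies with dplyr::filter() after the results come back can also be written inside the query itself, as a SPARQL FILTER on the standard LANG() function. The sketch below only builds the query text in base R: build_label_query() is a hypothetical helper, not part of glitter or sequins, and PREFIX declarations are left out on the assumption that the endpoint predefines wd: and rdfs:, as the Wikidata Query Service does.

```r
# Hypothetical helper: build the SPARQL text for one label lookup, with the
# language restriction done by the endpoint via FILTER(LANG(...) = ...).
# Assumes the endpoint predefines the prefixes used in `string` and
# `label_property` (no PREFIX lines are emitted).
build_label_query <- function(string, language = "en",
                              label_property = "rdfs:label") {
  paste0(
    "SELECT ?string_label WHERE {\n",
    "  ", string, " ", label_property, " ?string_label .\n",
    "  FILTER(LANG(?string_label) = \"", language, "\")\n",
    "}"
  )
}

cat(build_label_query("wd:Q152088", language = "fr"))
# SELECT ?string_label WHERE {
#   wd:Q152088 rdfs:label ?string_label .
#   FILTER(LANG(?string_label) = "fr")
# }
```

Filtering on the endpoint avoids downloading the labels in every language, which matters for Wikidata items that have many translations.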

I'm wondering whether it should be included in glitter rather than sequins? What do you think?

lvaudor · Oct 11 '23 14:10

> It's supposed to work on all endpoints but I'll admit that right now my only examples which make much sense are on Wikidata...

Because other endpoints have readable properties?

maelle · Oct 12 '23 09:10

Well, it would make sense if they did, but I think they generally don't :-(. Maybe DBpedia could gather data about OWL vocabularies? I haven't had the time to check it, though.

lvaudor · Oct 12 '23 09:10

In that sense (if it's only relevant for Wikidata), it's similar to some functions you just removed. On the other hand, not that much, because at least it's not based on external packages.

lvaudor · Oct 12 '23 09:10

Could it live in a third package?

  • glitter for query building
  • sequins for query visualization
  • a third package with Wikidata utilities?

maelle · Oct 19 '23 08:10