questioning_authority

simplify Getty queries

Open VladimirAlexiev opened this issue 10 years ago • 7 comments

Hi! We developed & maintain the Getty endpoint, and luc:term should only include terms (that is, pref & altLabels, minus any " (qualifier)"). You can see the props that are navigated to collect FTS text here: http://vocab.getty.edu/doc/#FTS_Insert_Queries.

You write "The full text index matches on fields besides the term, so we filter to ensure the match is in the term" and do a REGEX on pref|altLabel, and then DISTINCT since there are multiple altLabels. This query is quite complex and a bit more expensive than it needs to be.

If you provide some testing examples, we'll fix the problem "matches on fields besides the term".

For AAT, you seem to want prefLabel only. I wrote in the support forum: "I think that if we make an index by prefLabels only, that would resolve most problems. But is this what you need? Eg it won't find "frostbiting" aka "frostbite boating"." If you want an extra index by prefLabel only, let me know (but it'll also have more languages than EN).

VladimirAlexiev avatar Aug 06 '15 11:08 VladimirAlexiev

BTW excellent project, I'll add it to "Getty usage stories"

VladimirAlexiev avatar Aug 06 '15 11:08 VladimirAlexiev

If you need to filter by regex, it would be faster to return 1 row per concept and use GROUP_CONCAT to put all altLabels in that row. This way you'll avoid multiple regex() checks per concept, and DISTINCT. Eg:

SELECT ?s ?name ?bio {
  {SELECT ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) AS ?labels) {
     ?s a skos:Concept; luc:term "#{search}";
        skos:inScheme <http://vocab.getty.edu/ulan/> ;
        gvp:prefLabelGVP [skosxl:literalForm ?name] ;
        foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
        skos:altLabel ?alt .
   } GROUP BY ?s ?name ?bio}
  FILTER(regex(?labels, "#{search}", "i"))}
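
The #{search} placeholder is Ruby string interpolation, so the calling code has to make the value safe before it lands inside the two string literals. A minimal sketch of one way to do that, assuming hypothetical helper names (sparql_escape, regex_escape, ulan_grouped_query) that are not part of questioning_authority. It assumes the search text should match literally in regex(), and it does not escape Lucene query syntax in luc:term:

```ruby
# Hypothetical helpers for illustration; not part of questioning_authority.

# Escape a value for use inside a double-quoted SPARQL string literal.
def sparql_escape(value)
  value.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
end

# Backslash-escape regex metacharacters so the input matches literally.
def regex_escape(value)
  value.gsub(/[\\.^$|?*+(){}\[\]]/) { |c| "\\#{c}" }
end

# Build the one-row-per-concept ULAN query with the search value escaped.
def ulan_grouped_query(search)
  term = sparql_escape(search)
  pattern = sparql_escape(regex_escape(search))
  <<~SPARQL
    SELECT ?s ?name ?bio {
      {SELECT ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) AS ?labels) {
         ?s a skos:Concept; luc:term "#{term}";
            skos:inScheme <http://vocab.getty.edu/ulan/> ;
            gvp:prefLabelGVP [skosxl:literalForm ?name] ;
            foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
            skos:altLabel ?alt .
       } GROUP BY ?s ?name ?bio}
      FILTER(regex(?labels, "#{pattern}", "i"))}
  SPARQL
end
```

Calling ulan_grouped_query('leonardo da vinci') returns the query text with the term substituted in both places. The regex pattern is escaped twice on purpose: once for the regex engine, then once for the SPARQL string literal that carries it.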

VladimirAlexiev avatar Aug 06 '15 11:08 VladimirAlexiev

@VladimirAlexiev Thanks so much for the feedback. I'm not currently working on questioning_authority, but I'm hoping that another of our consortium members will be able to incorporate your suggestions.

jcoyne avatar Aug 06 '15 12:08 jcoyne

@geekscruff Can you take a look at this issue and comment on whether or not it still applies? I know there have been changes to the Getty processing since this issue was opened.

elrayle avatar Mar 04 '19 16:03 elrayle

Your query is similar to two others given in the documentation, so it's not too complex:

http://vocab.getty.edu/doc/queries/#Combination_Full-Text_and_Exact_String_Match

http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query

Still, you may want to evaluate those (especially the latter), as they may well give better results.

VladimirAlexiev avatar Mar 04 '19 21:03 VladimirAlexiev

Thanks for the feedback @VladimirAlexiev

I think the following regex-free query returns the same results, but is much simpler. Would you mind having a look and seeing if you agree? The following example uses vinchi from the alt label.

SELECT DISTINCT ?s ?name ?bio {
  ?s a skos:Concept; 
      luc:term "leonardo AND da AND vinchi"; 
      skos:inScheme ulan: ;
      gvp:prefLabelGVP [xl:literalForm ?name];
      foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
      skos:altLabel ?alt .
} order by asc(lcase(str(?name)))
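
For reference, the luc:term value in this example could be produced from the raw user input roughly like this. It's only a sketch: and_terms is a hypothetical helper name, and it assumes the input is plain whitespace-separated words with no Lucene operators of their own:

```ruby
# Hypothetical helper: join whitespace-separated words with Lucene's AND operator.
def and_terms(search)
  search.split.join(' AND ')
end
```

With this, and_terms('leonardo da vinchi') gives the string used in the query above.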

ghost avatar Mar 05 '19 08:03 ghost

I like that it doesn't have regex, but AND gives too much freedom imho. I'd use the FTS query from http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query

VladimirAlexiev avatar Mar 07 '19 16:03 VladimirAlexiev