
Scores from words file are not used for ql_textscore computation

Open aindlq opened this issue 1 year ago • 2 comments

It looks like, when doing a text search with ql:contains-entity and ql:contains-word, the ql_textscore_* variable simply holds the number of matching documents per entity and does not take the score column from the words file into account.

From the documentation:

The SCORE(?text) returns the number of matching records (sums of the score in the wordsfile, see above).

To me this looks like a bug, because ordering by the "real" score is extremely useful. What is the expected behavior?
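To make the difference concrete, here is a minimal sketch (not QLever code) that contrasts the two readings: counting matching records per entity versus summing the score column. It assumes a tab-separated words file with the columns word, is_entity, record_id, score, and it assumes that the score to sum is the one on the matching word's posting. The sample data and the helper `score_entities` are hypothetical.

```python
from collections import defaultdict

# Hypothetical words file excerpt.
# Assumed columns: word, is_entity (0/1), record_id, score
WORDSFILE = (
    "madonna\t0\t1\t3\n"
    "<Painting_A>\t1\t1\t3\n"
    "madonna\t0\t2\t1\n"
    "<Painting_A>\t1\t2\t1\n"
    "madonna\t0\t3\t5\n"
    "<Painting_B>\t1\t3\t5\n"
)


def score_entities(wordsfile_text, word):
    """Return (record counts, summed scores) per entity co-occurring with `word`."""
    rows = [line.split("\t") for line in wordsfile_text.splitlines() if line]

    # Score of the search word in each record that contains it.
    word_score = defaultdict(int)
    for w, is_entity, record_id, score in rows:
        if is_entity == "0" and w == word:
            word_score[record_id] += int(score)

    counts = defaultdict(int)  # number of matching records per entity
    sums = defaultdict(int)    # sum of the word's scores over those records
    for w, is_entity, record_id, _score in rows:
        if is_entity == "1" and record_id in word_score:
            counts[w] += 1
            sums[w] += word_score[record_id]
    return dict(counts), dict(sums)


counts, sums = score_entities(WORDSFILE, "madonna")
print(counts)  # {'<Painting_A>': 2, '<Painting_B>': 1}
print(sums)    # {'<Painting_A>': 4, '<Painting_B>': 5}
```

In this toy data, ordering by record count ranks `<Painting_A>` first, while ordering by summed score ranks `<Painting_B>` first, which is why the two definitions are not interchangeable for ranking.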

aindlq avatar Nov 02 '23 20:11 aindlq

@NickG-1 is currently working on a thorough refactoring of the text index that also exports the real score. However, we will (at least temporarily) drop the TEXTLIMIT feature: it doesn't quite fit into the SPARQL standard, and we don't find ourselves using it very often. Would that be an issue for you?

joka921 avatar Nov 23 '23 14:11 joka921

@joka921 thank you for the update! That is good to know.

I haven't used it so far, precisely because it is a non-standard SPARQL extension and all our tooling expects standard SPARQL at various levels of the system. So having it specified with a magic predicate is much preferable to the non-standard TEXTLIMIT.

In my view, something like a text limit will certainly be necessary at some point, because otherwise one can run into trouble with search queries that return too many documents.

For example, in our dataset about works of art, querying for an "anonymous" author or for "Madonna" artworks produces too many matching documents. But it is definitely not a showstopper.

Also, just to add: when working with bigger documents, I think it is more convenient to get back only the ID of the matched document rather than the whole document text.

aindlq avatar Nov 24 '23 07:11 aindlq