
Disable IDF (inverse document frequency) per field

Open mistyn8 opened this issue 6 years ago • 4 comments

{ Category: content, LuceneQuery: (hideFromSearch:0 +(__NodeTypeAlias:dtcontenttile) +(tileContentOrigination:external^31.0 tileContentOrigination:partner^32.0 tileContentOrigination:originator^33.0)) }

So I'm trying to artificially boost page scoring by type. However, because the lowest-boosted type is also the one with the fewest nodes, its score is inflated by IDF, so it ends up first in the results rather than last. Is there any way to alter that? Ta.
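
For reference, a minimal sketch of why a rare value gets a bigger score, assuming the classic TF-IDF formula used by Lucene 3.x's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are made up for illustration and `IdfDemo` is a hypothetical name; nothing here calls Lucene itself.

```java
// Illustration only: reproduces the classic Lucene 3.x idf formula by hand.
class IdfDemo {
    // idf = 1 + ln(numDocs / (docFreq + 1)); fewer matching docs gives a larger value.
    static float classicIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long numDocs = 10_000; // hypothetical index size
        long rare = 10;        // hypothetical node count for the rare type
        long common = 5_000;   // hypothetical node count for the common type
        System.out.println("rare   idf = " + classicIdf(rare, numDocs));
        System.out.println("common idf = " + classicIdf(common, numDocs));
    }
}
```

The rare value always gets the larger idf, so a small boost difference (31 vs 33) is easily swamped.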

mistyn8 avatar Oct 14 '19 16:10 mistyn8

Sounds like you know more about this subject than I do ;) I'm not really sure so if you feel like debugging into the cause (prob easiest with a unit test in the solution, there's plenty of examples to get started with) that would be great.

Shazwazza avatar Oct 14 '19 23:10 Shazwazza

Just a 20min google to try to understand how the scoring worked.. http://www.lucenetutorial.com/advanced-topics/scoring.html

Seems to suggest we can override the idf, though I'd have little idea how to do it.

Also found https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

But as lucenenet 3.0.3 is from 5yrs ago, I'm not sure if that means no bm25 support? I can't find anything that says which native lucene version a given lucenenet version corresponds to (bm25 I think started in lucene 6?)

mistyn8 avatar Oct 15 '19 10:10 mistyn8

@mistyn8 I looked into BM25. It was introduced in Lucene 4.0.0, which means it is not available in older versions of Lucene.

bielu avatar Oct 25 '19 14:10 bielu

In Solr, which is built on Lucene, you need to define a field type with a custom similarity class and use that type for the field.

Something like

<fieldType name="custom_txt" class="solr.TextField" positionIncrementGap="100">
      <!-- analyzer definition omitted -->
      <similarity class="com.MySimilarityClass"/>
</fieldType>

The custom similarity class

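package com; // matches class="com.MySimilarityClass" in the fieldType above

import org.apache.lucene.search.similarities.ClassicSimilarity;

// Classic TF-IDF similarity with the IDF component switched off.
public class MySimilarityClass extends ClassicSimilarity {
    @Override
    public float idf(long docFreq, long numDocs) {
        // Constant IDF: rare terms no longer score higher than common ones.
        return 1.0f;
    }
}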

The similarity class can then be packaged as a library and loaded through your solrconfig.xml (build a Java jar file and put it in a directory under your Solr install that a <lib> directive points at), for example:

<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib/" regex=".*\.jar" />
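
To see what the constant idf buys you, here is a rough sketch, assuming a single-term match contributes roughly boost × idf² to the classic TF-IDF score (tf, norms and queryNorm are ignored, so this is a simplification, not Lucene's full formula). The boosts 31 and 33 come from the query in the original post; the document counts and the names `BoostVsIdfDemo`, `CLASSIC` and `CONSTANT` are hypothetical.

```java
// Illustration only: a simplified score model, not Lucene's real scorer.
class BoostVsIdfDemo {
    interface Idf { float of(long docFreq, long numDocs); }

    // ClassicSimilarity-style idf: 1 + ln((numDocs + 1) / (docFreq + 1))
    static final Idf CLASSIC =
        (df, n) -> (float) (Math.log((n + 1) / (double) (df + 1)) + 1.0);
    // What the overridden idf() above returns.
    static final Idf CONSTANT = (df, n) -> 1.0f;

    // Assumed model: boost * idf^2 (idf enters via both query and field weight).
    static float score(float boost, long docFreq, long numDocs, Idf idf) {
        float w = idf.of(docFreq, numDocs);
        return boost * w * w;
    }

    public static void main(String[] args) {
        long numDocs = 10_000;                  // hypothetical index size
        long externalDocs = 10;                 // hypothetical: rare type, boost 31
        long originatorDocs = 5_000;            // hypothetical: common type, boost 33
        for (Idf idf : new Idf[] {CLASSIC, CONSTANT}) {
            System.out.println("external   = " + score(31f, externalDocs, numDocs, idf));
            System.out.println("originator = " + score(33f, originatorDocs, numDocs, idf));
        }
    }
}
```

Under this model the rare type wins with the classic idf despite its lower boost, while with the constant idf the ordering follows the boosts alone.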

captainjackrana avatar Nov 29 '23 12:11 captainjackrana