SolrSearch

Modifying search query so one field scores higher than others?

Open AmandaUCSC opened this issue 8 years ago • 7 comments

Has anyone modified this plug-in's query yet so that one field scores higher than the others? It should be possible according to the SolrRelevance FAQs wiki. Specifically, I want to do this:

How can I make "superman" in the title field score higher than in the subject field?

For the standard request handler, "boost" the clause on the title field:

q=title:superman^2 subject:superman

Using the dismax request handler, one can specify boosts on fields in parameters such as qf:

q=superman&qf=title^2 subject
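For illustration, here is a minimal PHP sketch of the two forms quoted from the FAQ. The function names are hypothetical, and `defType=dismax` is assumed as the way to select the DisMax parser on a request; none of this is part of the SolrSearch plugin.

```php
<?php
// Standard parser: put the boost on each field clause, e.g. title:superman^2.
function standardBoostQuery(string $term, array $fieldBoosts): string
{
    $clauses = [];
    foreach ($fieldBoosts as $field => $boost) {
        // A boost of 1 is the default, so it is left off the clause.
        $clauses[] = $boost == 1 ? "{$field}:{$term}" : "{$field}:{$term}^{$boost}";
    }
    return implode(' ', $clauses);
}

// DisMax: the raw term goes in q and the field weights go in qf.
function dismaxParams(string $term, array $fieldBoosts): array
{
    $qf = [];
    foreach ($fieldBoosts as $field => $boost) {
        $qf[] = $boost == 1 ? $field : "{$field}^{$boost}";
    }
    return ['q' => $term, 'defType' => 'dismax', 'qf' => implode(' ', $qf)];
}

$boosts = ['title' => 2, 'subject' => 1];
echo standardBoostQuery('superman', $boosts), PHP_EOL; // title:superman^2 subject:superman
print_r(dismaxParams('superman', $boosts));
```

The first form rewrites the user's query itself; the second leaves q as plain text and moves the weights into qf, which is why it is easier to attach to an existing search box.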

I think what I need to do is change the code in ResultsController.php here:

```php
// Get the facet GET parameter
$facet = $this->_request->facet;

// Form the composite Solr query.
if (!empty($facet)) {
    $query .= " AND {$facet}";
}

// Limit the query to public items if required
if ($limitToPublicItems) {
    $query .= ' AND public:"true"';
}

return $query;
```

Am I right? Has anyone already done this before?

AmandaUCSC, Mar 23 '17 15:03

Ah, I think I figured out where to change things... it's really quite simple, I think. I see the DisMax query parser in the solrconfig.xml file. I believe if I just modify the settings there to how we want them, everything should work.

UPDATE: Well, I made the change, successfully reloaded it into Solr, and it still doesn't seem to work the way I want it to (the search results don't seem to have changed at all). The field I want to add is "identifier." Here is basically what I added:

```
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 identifier^10.0
```

I am searching for an exact match in the identifier using a specific format. For the sake of simplicity, let's say it's called "Document(1900) No. 15". When someone types that into the search box, the document with that identifier should be the very first result. At the moment it's not; it comes up about fifth. The other documents mention this document in their text, and they come up before it. Why would that be?

AmandaUCSC, Mar 24 '17 13:03

Hi all, so I tried boosting the identifier more (^20) and I also tried moving it to the front of the line (before text) just to see if that made a difference and it didn't. Do I also need to change another file somewhere to reflect that I'm adding the identifier field here? I couldn't tell from the schema file if that was necessary or not.

AmandaUCSC, Mar 27 '17 11:03

I've been working a bit on the SolrSearch_ResultsController::_getQuery() method myself, as it neither fully allows nor fully escapes the Lucene query syntax. One thing I've discovered is that most metadata fields are indexed under unintuitive names in Solr. Basically, any field you mark as "Is Indexed?" in the Solr Search plugin Field Configuration is indexed in a Solr field named <id>_t, where <id> is the key from the omeka_solr_search_fields table.

For example in one of my installations the Identifier field has an <id> of 48, so the actual field name in Solr is 48_t. So, I guess to increase its relevance you would have to add 48_t^10.0 to that configuration file, but I have not tested this. Also, your field may have a different <id> number.
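A rough sketch of assembling such a qf value in PHP, assuming the <id>_t convention described above. The helper names are hypothetical, and 48 is only the example id from this installation; look up your own in omeka_solr_search_fields.

```php
<?php
// Solr field name for an indexed Omeka element, following the <id>_t
// convention (id = key in the omeka_solr_search_fields table).
function solrTextField(int $id): string
{
    return "{$id}_t";
}

// Assemble a qf value from [solr field name => boost] pairs.
// %F is used instead of %f so the decimal point is not locale-dependent.
function buildQf(array $boosts): string
{
    $parts = [];
    foreach ($boosts as $field => $boost) {
        $parts[] = sprintf('%s^%.1F', $field, $boost);
    }
    return implode(' ', $parts);
}

// Assumption: Identifier has id 48 here; check your own table.
$qf = buildQf([solrTextField(48) => 10.0, 'title' => 10.0, 'text' => 5.0]);
echo $qf, PHP_EOL; // 48_t^10.0 title^10.0 text^5.0
```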

You should also be able to query that field in your query string with 48_t:"Document(1900) No. 15", but the current SolrSearch plugin is replacing colons with spaces in all queries, so that wouldn't work.
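To show the difference, here is a small sketch (hypothetical helper, standard Lucene query syntax assumed) that builds the fielded phrase and then applies the colon-to-space replacement described above. Inside a quoted phrase, only the backslash and the double quote need escaping, so the parentheses in the identifier can stay as they are.

```php
<?php
// Build a fielded phrase query such as 48_t:"Document(1900) No. 15".
// Backslashes are escaped first so the backslashes added for quotes are not doubled.
function fieldPhrase(string $field, string $phrase): string
{
    $escaped = str_replace(['\\', '"'], ['\\\\', '\\"'], $phrase);
    return $field . ':"' . $escaped . '"';
}

$query = fieldPhrase('48_t', 'Document(1900) No. 15');
echo $query, PHP_EOL; // 48_t:"Document(1900) No. 15"

// The behavior described above: once colons become spaces, "48_t" is no
// longer a field prefix but just another search term.
echo str_replace(':', ' ', $query), PHP_EOL; // 48_t "Document(1900) No. 15"
```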

Looking through other forks of the plugin, I did find this commit from @jajm that appears to give more intuitive names to the fields in Solr: https://github.com/biblibre/omeka-plugin-SolrSearch/commit/48ab77d9e97271c26b8f2415e75b679faad8f5b4

kloor, Mar 29 '17 20:03

Thanks, and you're right. I actually figured out how to change that in the solrconfig.xml file yesterday and it worked (I was trying to do it for the Identifier field in our installation, which was 43_t). I had to modify the default /select handler, setting defType to edismax and adding a qf parameter. In the end it looked like this:

```xml
<str name="defType">edismax</str>
<str name="qf">43_t^10.0 title^10.0 text^5.0</str>
```

I'll probably have to modify it again at some point based on additional criteria. But at least I figured out how to get this to work!
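For trying out weights before touching solrconfig.xml, the same two settings can be sent as request parameters. This sketch only builds the URL; the base URL, the core name `omeka`, and the helper name are placeholders, not values from this thread.

```php
<?php
// Build a /select URL carrying defType and qf as request parameters,
// so boosts can be compared without editing solrconfig.xml or reloading the core.
function selectUrl(string $baseUrl, string $core, string $q, string $qf): string
{
    $params = [
        'q'       => $q,
        'defType' => 'edismax',
        'qf'      => $qf,
        'fl'      => '*,score', // include scores so the ranking can be inspected
    ];
    return rtrim($baseUrl, '/') . '/' . rawurlencode($core) . '/select?' . http_build_query($params);
}

// Placeholder host and core; substitute your own.
echo selectUrl('http://localhost:8983/solr', 'omeka', 'Document(1900) No. 15', '43_t^10.0 title^10.0 text^5.0'), PHP_EOL;
```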

AmandaUCSC, Mar 29 '17 20:03

Thanks for pointing out the EDisMax query parser. It performs much better than the standard query parser and gracefully handles syntax issues like the single-quote problem reported in #137.

I've updated the SolrSearch_ResultsController::_getQuery() method in my fork of the SolrSearch package to use EDisMax: https://github.com/BGSU-LITS/SolrSearch/blob/master/controllers/ResultsController.php#L110

I specified using EDisMax in the query string instead of the solrconfig.xml file so it would work without having to change that file and reload the core. I think the qf parameter from the config file should still be respected, though.

The other changes I made to the method were to remove the parts that stripped characters from the query, and to add plus signs to the facets and public field so that they are required when using EDisMax. Without the plus signs, documents that did not match the facets could still be selected.
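A minimal sketch of the approach described here, not the code in the fork linked above: the user's text is left untouched, `defType=edismax` is set per request, and the facet and public restrictions are prefixed with `+`. The function name and the `itemtype` facet value are hypothetical, and `$facet` is wrapped in parentheses on the assumption that it may contain more than one clause.

```php
<?php
// Hypothetical EDisMax-style query assembly.
function buildEdismaxParams(string $userQuery, ?string $facet, bool $limitToPublicItems): array
{
    // Keep the user's text as typed; nothing is stripped out.
    $query = $userQuery;

    // "+" marks a clause as required, so documents outside the facet are excluded.
    if ($facet !== null && $facet !== '') {
        $query .= " +({$facet})";
    }

    // Likewise require public items when asked to.
    if ($limitToPublicItems) {
        $query .= ' +public:"true"';
    }

    // Choose the parser per request; a qf set in solrconfig.xml should still apply.
    return ['q' => $query, 'defType' => 'edismax'];
}

$params = buildEdismaxParams('superman', 'itemtype:"Still Image"', true);
echo $params['q'], PHP_EOL; // superman +(itemtype:"Still Image") +public:"true"
echo http_build_query($params), PHP_EOL;
```

`http_build_query()` percent-encodes the `+` signs, so they reach Solr intact. Whether the user's own terms are also required alongside the `+` clauses depends on EDisMax's mm and q.op settings, so it's worth checking a few searches after switching.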

kloor, Mar 30 '17 14:03

Excellent! So should I get rid of the solrconfig.xml code I changed and just replace SolrSearch_ResultsController::_getQuery() with the new version? (I had no idea about the plus sign issue...) Or do you mean I should leave the qf parameter in the config file alone? I'm presuming no other methods or files were changed? Thanks!

AmandaUCSC, Mar 31 '17 12:03

Right, I only changed SolrSearch_ResultsController::_getQuery(). You would probably want to keep your solrconfig.xml file for the qf parameter.

I was hesitant about specifying EDisMax in the code since, unlike the solrconfig.xml setting, it can't be configured without editing the code. But for people who want better query processing, it's easier to just replace that function than to find and edit their solrconfig.xml file and reload the core.

kloor, Mar 31 '17 13:03