code-gov-api

Elastic Search appears to not be returning text in tags

Open JustinGOSSES opened this issue 5 years ago • 3 comments

Is your feature request related to a problem? Please describe.
When I search via the GUI or the API for text that I know is in a tag for a code project but not in its description text, that project is not returned. I suspect this is because Elasticsearch is not including tags in the text search.

Describe the solution you'd like
I would like search to return a project even if the search text appears only in a tag. This will likely require changing Elasticsearch to include the text in the tag field of the input JSONs.

Describe alternatives you've considered
Alternatively, users could select whether the search is applied to the title, the description, the tags, or everything.

Additional context
I work on code.nasa.gov. We have greatly expanded the number of tags on each code project through natural language processing. Many A.I.-generated tags are higher-order concepts that (1) don't appear in the description text and (2) are more likely to come to users' minds as something they'd want to find.

Value to end-user
Users will be more likely to find relevant code projects if we can match their higher-order conceptual needs to tag text.

JustinGOSSES, Apr 23 '19 21:04

Thanks for the issue. Can you give an example of tags that are not showing up in search? I believe that the tags are searchable, but @bjbhattGSA can clarify.

saracope, Apr 23 '19 21:04

As an example:

POSTPROC is a NASA code project. It is listed in the code.json that code.gov harvests, which is here: https://raw.githubusercontent.com/nasa/Open-Source-Catalog/master/code.json (http://code.nasa.gov/code.json resolves to that link).

It has a tag of "crew and lift support".

When I select NASA as the agency and search for "crew and lift support", I don't get POSTPROC back as a result.

I know POSTPROC is a project on code.gov because it comes up when I use POSTPROC as the search term.

This is an example of a tag that exists in the code.json but doesn't get returned by the search in code.gov.
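The check described above (confirming that a tag really is present in the harvested code.json) can be sketched in a few lines of Python. This is only an illustration: it assumes the catalog has a top-level "releases" list whose entries carry "name" and "tags" keys, and it runs against a tiny in-memory stand-in rather than the real NASA file. The second project in the sample is hypothetical.

```python
def projects_with_tag(catalog, tag):
    """Return names of releases whose tag list contains `tag` (case-insensitive)."""
    wanted = tag.strip().lower()
    return [
        release.get("name")
        for release in catalog.get("releases", [])
        if wanted in (t.strip().lower() for t in release.get("tags", []))
    ]


# Tiny stand-in for a parsed code.json; the layout is an assumption and the
# second entry is a made-up project used only for contrast.
sample_catalog = {
    "releases": [
        {"name": "POSTPROC", "tags": ["crew and lift support"]},
        {"name": "EXAMPLE-PROJECT", "tags": ["hypothetical tag"]},
    ]
}

print(projects_with_tag(sample_catalog, "crew and lift support"))
```

Running the same function on the real file (after loading it with json.load) would show whether the tag is in the source data, independent of how the search index treats it.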

JustinGOSSES, Apr 24 '19 14:04

The search does take into account the tags. The search uses a multi_match query type. I suspect that the weight given to the fields needs to be tweaked to return better results for exact matches on fields.

I think this can be done with the weights on the keyword fields.

You can see this in code-gov-adapter/libs/elasticsearch/utils.js#L146.
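To make the weighting idea concrete, here is a minimal sketch of a multi_match request body with per-field boosts, written in Python for brevity. The field names (name, tags, tags.keyword, description) and the boost values are assumptions chosen for illustration. They are not taken from utils.js, and the real query there may be structured differently.

```python
import json


def build_search_body(text, boosts):
    """Build an Elasticsearch multi_match query body with per-field boosts."""
    # "field^N" is Elasticsearch's syntax for boosting a field's score by N.
    fields = [f"{field}^{boost}" for field, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Hypothetical fields and weights: a higher boost on a keyword sub-field is one
# way to make an exact tag match outrank partial matches in longer text fields.
EXAMPLE_BOOSTS = {"name": 3, "tags.keyword": 5, "tags": 2, "description": 1}

body = build_search_body("crew and life support", EXAMPLE_BOOSTS)
print(json.dumps(body, indent=2))
```

Whether a keyword sub-field exists for tags depends on the index mapping, so the actual tuning would need to be checked against the mapping the API uses.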

The API has a terms endpoint where you can search for specific terms and term types. Assuming that "crew and lift support" is really meant to be "crew and life support", there seem to be 26 instances of the term. There wasn't an exact match when searching normally, so the issue still stands.

To take a look at the terms you can run:

curl "https://api.code.gov/v2/terms?source_id=1433&term_type=tags&term=crew%20and%20life%20support" \
     -H 'X-API-KEY: DEMO_KEY'
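
For anyone scripting this lookup rather than using curl, the same request can be assembled in Python. This is a sketch that only builds the URL and request object shown in the curl command above; the helper name build_terms_request is made up for this example, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

TERMS_ENDPOINT = "https://api.code.gov/v2/terms"


def build_terms_request(source_id, term_type, term, api_key="DEMO_KEY"):
    """Build (but do not send) a GET request for the terms endpoint."""
    params = {"source_id": source_id, "term_type": term_type, "term": term}
    # quote_via=quote encodes spaces as %20, matching the curl command above.
    url = f"{TERMS_ENDPOINT}?{urlencode(params, quote_via=quote)}"
    return Request(url, headers={"X-API-KEY": api_key})


request = build_terms_request(1433, "tags", "crew and life support")
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without network access.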

CC: @saracope

froi, Nov 21 '19 06:11