
400 status code / invalid query when using the `"` character

Open mosheduminer opened this issue 2 years ago • 5 comments

Hitting the indexes/{index}/search endpoint with a query containing a " character:

{
  "query": {
    "normal": {
      "ctx": "test\""
    }
  }
}

results in the response

{"status":400,"data":"invalid query: SyntaxError(\"test\\\"\")"}

Maybe there's a decoding bug on my end? If so, it may be the HTTP library I'm using. I'm using the Docker image.

mosheduminer avatar Jan 25 '23 00:01 mosheduminer

Hello! Sorry for the late response, I didn't see the notification :)

The issue is that your query has an unclosed ": the parser tries to treat it as a phrase query, so "hello world" matches exactly hello world, but a lone " on its own isn't valid query syntax (see https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html)

ChillFish8 avatar Feb 15 '23 17:02 ChillFish8
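
(For illustration: a minimal tantivy-level sketch of the behaviour described above. This is not lnx's actual code path; the field name ctx is taken from the query in this issue, and the exact error variant may differ between tantivy versions.)

use tantivy::query::QueryParser;
use tantivy::schema::{Schema, TEXT};
use tantivy::Index;

fn main() {
    let mut schema_builder = Schema::builder();
    let ctx = schema_builder.add_text_field("ctx", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    let parser = QueryParser::for_index(&index, vec![ctx]);

    // A balanced pair of quotes parses as a phrase query.
    assert!(parser.parse_query("\"hello world\"").is_ok());

    // A lone quote leaves the phrase unterminated, so parsing fails,
    // which lnx surfaces as `invalid query: SyntaxError(...)`.
    assert!(parser.parse_query("test\"").is_err());
}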

Hi @ChillFish8! Thanks for the response. To clarify, does that mean there is no way to match text with quotes?

I'm asking because I have many texts where " appears in the middle of a word, and this is expected for the texts I am dealing with (it is used to indicate that the word is a contraction of multiple words, similar to how ' is used in English for words like didn't).

mosheduminer avatar Feb 15 '23 17:02 mosheduminer

I guess I should open an issue in the tantivy repo requesting the ability to escape quotes?

mosheduminer avatar Feb 15 '23 17:02 mosheduminer

> Thanks for the response. To clarify, does that mean there is no way to match text with quotes?

So technically you could support it in the parser, but it won't behave how you expect it to.

Under the hood, words like that will be split up: say I had didn't or test"ing, they'll be split into didn, t and test, ing. The tokenizer will remove any special characters like that.

ChillFish8 avatar Feb 15 '23 17:02 ChillFish8
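
(For illustration: a rough approximation of the splitting behaviour described above. This is not tantivy's actual tokenizer code, just a sketch of the effect: the default text analyzer breaks tokens on non-alphanumeric characters, so the quote inside a word splits it in two.)

// Approximation only: split on anything that isn't alphanumeric,
// mimicking how the default tokenizer breaks up didn't and test"ing.
fn approximate_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    assert_eq!(approximate_tokenize("didn't"), vec!["didn", "t"]);
    assert_eq!(approximate_tokenize("test\"ing"), vec!["test", "ing"]);
}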

If you're looking for a specific word and don't want that behaviour, you'd need to use the string field type, which doesn't do any tokenizing, and then match the entire value using a term query.

ChillFish8 avatar Feb 15 '23 17:02 ChillFish8
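
(For illustration: a tantivy-level sketch of the string-field / term-query approach suggested above. This is not lnx's HTTP API; the field name ctx and the sample value are taken from this issue, and exact method signatures may vary slightly between tantivy versions.)

use tantivy::collector::TopDocs;
use tantivy::query::TermQuery;
use tantivy::schema::{IndexRecordOption, Schema, STORED, STRING};
use tantivy::{doc, Index, IndexWriter, Term};

fn main() -> tantivy::Result<()> {
    // STRING fields are indexed as a single untokenized term,
    // so the quote inside the word is preserved as-is.
    let mut schema_builder = Schema::builder();
    let ctx = schema_builder.add_text_field("ctx", STRING | STORED);
    let index = Index::create_in_ram(schema_builder.build());

    let mut writer: IndexWriter = index.writer(50_000_000)?;
    writer.add_document(doc!(ctx => "test\"ing"))?;
    writer.commit()?;

    // A term query matches the entire untokenized value, quote included.
    let query = TermQuery::new(
        Term::from_field_text(ctx, "test\"ing"),
        IndexRecordOption::Basic,
    );
    let searcher = index.reader()?.searcher();
    let hits = searcher.search(&query, &TopDocs::with_limit(10))?;
    assert_eq!(hits.len(), 1);
    Ok(())
}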