
Lucene: exception - Query parser encountered <EOF> after “some word”

Open stroncod opened this issue 6 years ago • 1 comments

I ran into a problem when reading a dataset that contains special characters and then trying to get the concept vector. It is easily solved by escaping the text with QueryParser.escape in the Vectorizer class:

    public ConceptVector vectorize(String text) throws ParseException, IOException {
        // Escape query-syntax characters so the input is parsed as plain text.
        Query query = queryParser.parse(QueryParser.escape(text));
        TopDocs td = searcher.search(query, conceptCount);
        return new ConceptVector(td, indexReader);
    }

Great implementation by the way! Thanks

Source: https://stackoverflow.com/questions/10259907/lucene-exception-query-parser-encountered-eof-after-some-word/10259944
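To illustrate why escaping helps, here is a minimal stdlib-only sketch of the idea behind QueryParser.escape: every character that the classic Lucene query syntax treats as an operator gets a backslash in front of it. The class name `EscapeSketch` and the method `escapeQueryText` are hypothetical names made up for this illustration, and the character list is my reading of the classic query syntax. In real code, call Lucene's own QueryParser.escape as in the fix above rather than this hand-rolled version.

```java
public class EscapeSketch {
    // Characters assumed to be operators in Lucene's classic query syntax.
    // This list is an assumption for illustration; Lucene's own escape method is authoritative.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: prefixes each special character with a backslash.
    static String escapeQueryText(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An unbalanced quote like this one is what makes the parser hit <EOF>.
        String raw = "\"some word";
        System.out.println(escapeQueryText(raw)); // prints \"some word
    }
}
```

With the quote escaped, the parser no longer waits for a closing quote, so it treats the input as ordinary terms instead of an unterminated phrase.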

stroncod avatar Jun 04 '18 19:06 stroncod

Thanks for using ESA, and even more for your feedback!

The text argument is expected to be plain text, without query-syntax characters (such as quotes that combine multiple words into a single phrase), so I think your solution is correct.

Do you want to issue a pull request with the change and a unit test or two? Then your contribution will be carved into stone.

pvoosten avatar Jun 05 '18 05:06 pvoosten