
Document Set syncing fails when a file name contains a single quote

Open slovx2 opened this issue 9 months ago • 0 comments

I used the File Connector to upload files, one of which is named: Error 6000- Characters Aren't Positive Integers.pdf. The indexing process completed normally.

However, when I create a Document Set that includes the document above, the sync never finishes: the UI shows the Document Set as syncing indefinitely.

The background error log shows:

05/08/2024 07:29:23 AM             index.py 175 : Error occurred getting chunk by Document ID FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf:
Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '257', 'Content-Type': 'application/json'}
Payload: {'yql': "select documentid from danswer_chunk_intfloat_multilingual_e5_small where document_id contains 'FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf'", 'timeout': '10s', 'offset': 0, 'hits': 128}
Status Code: 400
Response Content: {"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":0},"errors":[{"code":4,"summary":"Invalid query parameter","message":"Could not create query from YQL: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}","stackTrace":"com.yahoo.processing.IllegalInputException: com.yahoo.search.yql.ProgramCompileException: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}\n\tat com.yahoo.search.yql.YqlParser.parseYqlProgram(YqlParser.java:888)\n\tat com.yahoo.search.yql.YqlParser.parse(YqlParser.java:275)\n\tat com.yahoo.search.yql.MinimalQueryInserter.insertQuery(MinimalQueryInserter.java:95)\n\tat com.yahoo.search.yql.MinimalQueryInserter.search(MinimalQueryInserter.java:80)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.searcher.FieldCollapsingSearcher.search(FieldCollapsingSearcher.java:90)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.querytransform.PhrasingSearcher.search(PhrasingSearcher.java:60)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.statistics.StatisticsSearcher.search(StatisticsSearcher.java:235)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.search.handler.SearchHandler.searchAndFill(SearchHandler.java:348)\n\tat com.yahoo.search.handler.SearchHandler.search(SearchHandler.java:393)\n\tat 
com.yahoo.search.handler.SearchHandler.handleBody(SearchHandler.java:269)\n\tat com.yahoo.search.handler.SearchHandler.handle(SearchHandler.java:178)\n\tat com.yahoo.container.jdisc.ThreadedHttpRequestHandler.handle(ThreadedHttpRequestHandler.java:77)\n\tat com.yahoo.container.jdisc.ThreadedHttpRequestHandler.handleRequest(ThreadedHttpRequestHandler.java:87)\n\tat com.yahoo.container.jdisc.ThreadedRequestHandler$RequestTask.processRequest(ThreadedRequestHandler.java:191)\n\tat com.yahoo.container.jdisc.ThreadedRequestHandler$RequestTask.run(ThreadedRequestHandler.java:185)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\nCaused by: com.yahoo.search.yql.ProgramCompileException: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}\n\tat com.yahoo.search.yql.ProgramParser$ErrorListener.syntaxError(ProgramParser.java:91)\n\tat org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)\n\tat org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)\n\tat org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:327)\n\tat org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:139)\n\tat com.yahoo.search.yql.yqlplusParser.program(yqlplusParser.java:358)\n\tat com.yahoo.search.yql.ProgramParser.parseProgram(ProgramParser.java:111)\n\tat com.yahoo.search.yql.ProgramParser.parse(ProgramParser.java:122)\n\tat com.yahoo.search.yql.YqlParser.parseYqlProgram(YqlParser.java:886)\n\t... 29 more\n"}]}}
Exception: 400 Client Error: Bad Request for url: http://index:8081/search/
05/08/2024 07:29:23 AM            celery.py 170 : Failed to sync document set 4
Traceback (most recent call last):
  File "/app/danswer/document_index/vespa/index.py", line 168, in _get_vespa_chunk_ids_by_document_id
    res.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://index:8081/search/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/danswer/background/celery/celery.py", line 143, in sync_document_set_task
    _sync_document_batch(
  File "/app/danswer/background/celery/celery.py", line 128, in _sync_document_batch
    document_index.update(update_requests=update_requests)
  File "/app/danswer/document_index/vespa/index.py", line 839, in update
    for doc_chunk_id in _get_vespa_chunk_ids_by_document_id(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/document_index/vespa/index.py", line 181, in _get_vespa_chunk_ids_by_document_id
    raise requests.HTTPError(error_base) from e
requests.exceptions.HTTPError: Error occurred getting chunk by Document ID FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf
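
The 400 response is consistent with the apostrophe in "Aren't" closing the single-quoted YQL string early, so the parser trips on the `t` that follows it. Below is a minimal sketch of one possible mitigation. It assumes Vespa's YQL accepts backslash-escaped quotes inside string literals; `escape_yql_string` and `build_chunk_id_query` are hypothetical helpers written for illustration, not functions from the Danswer codebase.

```python
def escape_yql_string(value: str) -> str:
    """Escape a value for use inside a single-quoted YQL string literal.

    Assumption: YQL treats backslash as the escape character, so backslashes
    are doubled first and single quotes are then prefixed with a backslash.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_chunk_id_query(index_name: str, document_id: str) -> str:
    """Build the chunk lookup query with the document ID safely quoted.

    Hypothetical helper mirroring the shape of the query seen in the log.
    """
    escaped = escape_yql_string(document_id)
    return (
        f"select documentid from {index_name} "
        f"where document_id contains '{escaped}'"
    )


if __name__ == "__main__":
    doc_id = (
        "FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/"
        "Error 6000- Characters Aren't Positive Integers .pdf"
    )
    # Print the query so the escaped apostrophe can be inspected by eye.
    print(build_chunk_id_query("danswer_chunk_intfloat_multilingual_e5_small", doc_id))
```

An alternative would be to sanitize document IDs at indexing time so they never contain quotes, but that changes stored IDs and would require re-indexing existing documents.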

slovx2, May 08 '24 07:05