onyx
onyx copied to clipboard
Not reranked docs have no score --> Constraint Violation
When enabling the semantic re-ranker I get the following error in the chat mode:
File "/app/danswer/chat/process_message.py", line 330, in stream_chat_message
reference_db_search_docs = [
^
File "/app/danswer/chat/process_message.py", line 331, in <listcomp>
create_db_search_doc(server_search_doc=top_doc, db_session=db_session)
File "/app/danswer/db/chat.py", line 654, in create_db_search_doc
db_session.commit()
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1906, in commit
trans.commit(_to_root=True)
File "<string>", line 2, in commit
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
ret_value = fn(self, *arg, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1221, in commit
self._prepare_impl()
File "<string>", line 2, in _prepare_impl
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
ret_value = fn(self, *arg, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1196, in _prepare_impl
self.session.flush()
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4154, in flush
self._flush(objects)
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4290, in _flush
with util.safe_reraise():
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 147, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4251, in _flush
flush_context.execute()
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 467, in execute
rec.execute(self)
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 644, in execute
util.preloaded.orm_persistence.save_obj(
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
_emit_insert_statements(
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1223, in _emit_insert_statements
result = connection.execute(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1413, in execute
return meth(
^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 483, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1637, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1987, in _exec_single_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2344, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1968, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 920, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "score" of relation "search_doc" violates not-null constraint
The reason for that is that the function rerank_chunks
(https://github.com/danswer-ai/danswer/blob/aa7c811a9ace1244f5f6aaf5d4bb8e1a0ea1b5bb/backend/danswer/search/search_runner.py#L440) sets the score of retrieved documents that were not re-ranked to None. Later, the chat flow inserts them as search docs into the DB, where the constraint violation exception is triggered.
My current fix is to drop the search docs which exceed the number query.num_rerank
in the function rerank_chunks
because I don't think they are really useful anyways. Is that a good solution?