Error during RAG collection creation when running PySpur with --sqlite
I ran PySpur with: pyspur serve --sqlite
I then configured a basic RAG and uploaded 64 PDF files, but during collection creation I received this error:
I also checked the backend messages:
Is this a bug, or did I do something wrong? How can I solve this error?
Thank you in advance.
I then tried uploading the documents 5 at a time and received the same error, but with more verbose output, which clarified the situation:
2025-04-04 23:20:27.434 | DEBUG | pyspur.rag.document_collection:process_documents:69 - Parsing file 8/10: data/knowledge_bases/DC1/Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives- A Survey.pdf
2025-04-04 23:20:27.497 | ERROR | pyspur.rag.document_collection:process_documents:150 - Error processing documents: cannot access local variable 'v' where it is not associated with a value
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/proxi/llm/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 412, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proxi/llm/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proxi/llm/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/applications.py", line 113, in call
await self.middleware_stack(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in call
raise exc
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in call
await self.app(scope, receive, _send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/cors.py", line 93, in call
await self.simple_response(scope, receive, send, request_headers=headers)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/cors.py", line 144, in simple_response
await self.app(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 715, in call
await self.middleware_stack(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 460, in handle
await self.app(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/applications.py", line 113, in call
await self.middleware_stack(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in call
raise exc
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in call
await self.app(scope, receive, _send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 715, in call
await self.middleware_stack(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
await response(scope, receive, send)
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/responses.py", line 158, in call
await self.background()
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/background.py", line 41, in call
await task()
File "/home/proxi/llm/lib/python3.11/site-packages/starlette/background.py", line 26, in call
await self.func(*self.args, **self.kwargs)
File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/document_collection.py", line 82, in process_documents
text = extract_text_from_file(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/parser.py", line 84, in extract_text_from_file
extracted_text = " ".join([page.extract_text() for page in reader.pages])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/parser.py", line 84, in
The error comes from the pypdf backend. I looked at the PDF: it contains some pictures, but it is not a scanned PDF. It has text, yet it still causes this error. So that you can reproduce the error, I am adding the aforementioned PDF as an attachment.
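
To narrow down which page triggers the failure, and as a possible workaround until the root cause is fixed, the per-page extraction could be wrapped so that one bad page does not abort the whole document. This is only a minimal sketch, not PySpur's actual code: the helper names extract_text_tolerant and extract_pdf are hypothetical, and the only pypdf calls assumed are PdfReader(path), reader.pages and page.extract_text(), which are the ones visible in the traceback above.

```python
from typing import Iterable, List, Tuple


def extract_text_tolerant(pages: Iterable) -> Tuple[str, List[Tuple[int, str]]]:
    """Join the text of every page that parses and record the pages that fail.

    `pages` is any iterable of objects exposing extract_text(), for example
    the `pages` attribute of a pypdf PdfReader.
    """
    texts: List[str] = []
    failures: List[Tuple[int, str]] = []
    for index, page in enumerate(pages):
        try:
            # extract_text() may return an empty string for image-only pages
            texts.append(page.extract_text() or "")
        except Exception as exc:
            # Broad on purpose: the failure here is a bug inside the parser,
            # so the exception type is not something the caller can predict.
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return " ".join(texts), failures


def extract_pdf(path: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Hypothetical wrapper: open a PDF with pypdf and extract it tolerantly."""
    from pypdf import PdfReader  # imported lazily; requires pypdf to be installed

    reader = PdfReader(path)
    return extract_text_tolerant(reader.pages)
```

Calling extract_pdf on the attached file and printing the second return value should list the zero-based index of each page that raises, together with the exception message, which may help whoever picks this up to reproduce it directly against pypdf.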