Error: RAG collection creation fails when running PySpur with --sqlite

Open erenirmak opened this issue 8 months ago • 2 comments

I ran PySpur with: pyspur serve --sqlite

I configured a basic RAG and uploaded 64 PDF files, but during collection creation I received this error: Image

I also checked the backend messages:

Image

Is this a bug, or did I do something wrong? How can I solve this error?

Thank you in advance.

erenirmak avatar Apr 04 '25 09:04 erenirmak

I tried uploading the documents 5 at a time and received the same error, but this time with more verbose output, which clarifies the situation:

2025-04-04 23:20:27.434 | DEBUG | pyspur.rag.document_collection:process_documents:69 - Parsing file 8/10: data/knowledge_bases/DC1/Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives- A Survey.pdf
2025-04-04 23:20:27.497 | ERROR | pyspur.rag.document_collection:process_documents:150 - Error processing documents: cannot access local variable 'v' where it is not associated with a value
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/proxi/llm/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 412, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/cors.py", line 93, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/cors.py", line 144, in simple_response
    await self.app(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 460, in handle
    await self.app(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
    await response(scope, receive, send)
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/responses.py", line 158, in __call__
    await self.background()
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/background.py", line 41, in __call__
    await task()
  File "/home/proxi/llm/lib/python3.11/site-packages/starlette/background.py", line 26, in __call__
    await self.func(*self.args, **self.kwargs)
  File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/document_collection.py", line 82, in process_documents
    text = extract_text_from_file(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/parser.py", line 84, in extract_text_from_file
    extracted_text = " ".join([page.extract_text() for page in reader.pages])
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pyspur/rag/parser.py", line 84, in <listcomp>
    extracted_text = " ".join([page.extract_text() for page in reader.pages])
                               ^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_page.py", line 2393, in extract_text
    return self._extract_text(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_page.py", line 1868, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_cmap.py", line 33, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_cmap.py", line 56, in build_char_map_from_dict
    encoding, map_dict = get_encoding(ft)
                         ^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_cmap.py", line 129, in get_encoding
    map_dict, int_entry = _parse_to_unicode(ft)
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_cmap.py", line 212, in _parse_to_unicode
    return _type1_alternative(ft, map_dict, int_entry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proxi/llm/lib/python3.11/site-packages/pypdf/_cmap.py", line 530, in _type1_alternative
    map_dict[chr(i)] = v
                       ^
UnboundLocalError: cannot access local variable 'v' where it is not associated with a value

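The error comes from the pypdf backend: the traceback ends in pypdf/_cmap.py, in _type1_alternative, with an UnboundLocalError for the local variable 'v'. I looked at the PDF: it contains some pictures, but it is not a scanned PDF. It has real text, yet it still triggers this error. So that you can reproduce the error, I am attaching the PDF in question.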

Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives- A Survey.pdf
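
As a possible workaround until the parsing problem itself is fixed, here is a minimal sketch of page-by-page extraction that skips pages whose extract_text() call raises, so a single problematic page does not abort the whole collection. This is not PySpur's actual code: the names extract_text_tolerant and PageLike are hypothetical, and the only thing taken from the traceback is that the parser joins page.extract_text() over reader.pages with a space.

```python
import logging
from typing import Iterable, Protocol

logger = logging.getLogger(__name__)


class PageLike(Protocol):
    """Anything with an extract_text() method, e.g. a pypdf page object."""

    def extract_text(self) -> str: ...


def extract_text_tolerant(pages: Iterable[PageLike]) -> tuple[str, list[int]]:
    """Join the text of all pages, skipping pages whose extraction raises.

    Returns the joined text and the zero-based indices of the skipped pages.
    (Hypothetical helper, not part of PySpur or pypdf.)
    """
    texts: list[str] = []
    failed: list[int] = []
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text() or "")
        except Exception as exc:  # broad on purpose: the failure type is not predictable
            logger.warning("Skipping page %d: %r", index, exc)
            failed.append(index)
    # Same separator as the join shown in the traceback.
    return " ".join(texts), failed
```

With pypdf installed, this could be called as extract_text_tolerant(PdfReader(path).pages). The list of failed page indices makes it possible to report which pages were skipped instead of silently dropping them.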

erenirmak avatar Apr 05 '25 07:04 erenirmak