sqlite3 error when running Fast Start RAG, example 1 on Windows 10
Hi, I'm running the first Fast Start RAG example, example-1-create_first_library.py, on Windows 10 and I get this error:
Example - Parsing Files into Library
Step 1 - creating library example1_library
INFO: Setup - sample_files path already exists - C:\Users\Käyttäjä\llmware_data\sample_files
Step 2 - loading the llmware sample files and saving at: C:\Users\Käyttäjä\llmware_data\sample_files
Step 3 - parsing and indexing files from C:\Users\Käyttäjä\llmware_data\sample_files\Agreements
INFO: update: Duplicate files (skipped): 0
INFO: update: Total uploaded: 15
INFO: Parser - parse_pdf - start parsing of PDF Documents...
WARNING: pdf_parser - update_library_inc_totals_sqlite - can not open database: unable to open database file
WARNING: pdf_parser - register_status_update_sqlite - can not open database: unable to open database file
INFO: pdf_parser - total pdf files processed - 0
INFO: pdf_parser - total input files received - 0
INFO: pdf_parser - total blocks created - 0
INFO: pdf_parser - total images created - 0
INFO: pdf_parser - total tables created - 0
INFO: pdf_parser - total pages added - 0
INFO: pdf_parser - PDF Processing - Finished - time elapsed - 0.010000
INFO: pdf_parser - Completed Parsing - processing time - 0.010000
INFO: Parser - parse_pdf - completed parsing of pdf documents - time taken: 0.030765771865844727
Step 4 - completed parsing - {'docs_added': 0, 'blocks_added': 0, 'images_added': 0, 'pages_added': 0, 'tables_added': 0, 'rejected_files': ['Rhea EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Artemis Poseidon EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Aphrodite EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Leto EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Eileithyia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Nyx EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Gaia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Demeter EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Amphitrite EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Persephone EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Apollo EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Nike EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Athena EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Bia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Metis EXECUTIVE EMPLOYMENT AGREEMENT.pdf']}
Step 5 - updated library card - documents - 0 - blocks - 0 - {'_id': 1, 'library_name': 'example1_library', 'embedding': [{'embedding_status': 'no', 'embedding_model': 'none', 'embedding_db': 'none', 'embedded_blocks': 0, 'embedding_dims': 0, 'time_stamp': 'NA'}], 'knowledge_graph': 'no', 'unique_doc_id': 0, 'documents': 0, 'blocks': 0, 'images': 0, 'pages': 0, 'tables': 0, 'account_name': 'llmware'}
Step 6 - library artifacts - including extracted images - saved at folder path - C:\Users\Käyttäjä\llmware_data\accounts\llmware\example1_library
Step 7 - running a test query - base salary
The first time I ran the file, it ended like this:
Step 7 - running a test query - base salary
Traceback (most recent call last):
File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 666, in _exitfunc
f()
File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 590, in __call__
return info.func(*info.args, **(info.kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\urllib3\connectionpool.py", line 1180, in _close_pool_connections
conn.close()
File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\botocore\awsrequest.py", line 80, in close
super().close()
File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\urllib3\connection.py", line 318, in close
super().close()
File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1003, in close
sock.close() # close it manually... there may be other refs
^^^^^^^^^^^^
File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 504, in close
self._real_close()
File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\ssl.py", line 1308, in _real_close
super()._real_close()
_ss.close(self)
The code did create a `sqlite_llmware.db` file, and I can work with it in the terminal:
>>> import os, sqlite3
>>> from llmware.configs import LLMWareConfig
>>> conn = sqlite3.connect(os.path.join(LLMWareConfig().get_library_path(), 'sqlite_llmware.db'))
>>> cur = conn.cursor()
>>> cur.execute("""SELECT name FROM sqlite_master WHERE type='table';""")
<sqlite3.Cursor object at 0x000001EF59891D40>
>>> print(cur.fetchall())
[('library',), ('example1_library',), ('example1_library_data',), ('example1_library_idx',), ('example1_library_content',), ('example1_library_docsize',), ('example1_library_config',), ('parser_events',), ('parser_events_data',), ('parser_events_idx',), ('parser_events_content',), ('parser_events_docsize',), ('parser_events_config',), ('status',), ('movie',)]
(Note that I added ('movie',) in testing.)
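The REPL check above can be wrapped in a small, stdlib-only helper so the same test is easy to repeat after a clean setup. This is a hypothetical helper (`check_sqlite_db` is not part of llmware). It opens the file through an SQLite URI with `mode=rw`, so a wrong path produces an error instead of silently creating a new, empty database, and it lists the tables the same way the REPL session does.

```python
import sqlite3
from pathlib import Path


def check_sqlite_db(db_path):
    """Report whether an SQLite file exists, opens, and which tables it lists."""
    path = Path(db_path)
    report = {"exists": path.is_file(), "opens": False, "tables": []}
    if not report["exists"]:
        return report
    # mode=rw makes SQLite return an error instead of creating a new,
    # empty database file when the path does not point at an existing file.
    uri = path.resolve().as_uri() + "?mode=rw"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            ).fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        # Covers "unable to open database file" and "file is not a database".
        return report
    report["opens"] = True
    report["tables"] = sorted(name for (name,) in rows)
    return report
```

Called with the same path as in the REPL session, e.g. `check_sqlite_db(os.path.join(LLMWareConfig().get_library_path(), 'sqlite_llmware.db'))`, a result with `opens: True` would suggest that Python can reach the database and that the failure is specific to the native PDF parser.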
@arautio89 - sorry you ran into this issue, and thanks for sharing it - it does seem unusual. It looks like you have been able to parse other documents and create other libraries with llmware in that specific database - is that right? I would recommend checking the obvious things first so we can rule them out (e.g., that the files or DB are not corrupted, and that the pip dependencies are installed), and then please try to run the example again with a clean setup. If the issue persists, we can look at the next level of debugging. If there are any other notable items about the environment, please share them, and I will try to recreate the environment and see if we can reproduce the issue.
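For sharing environment details, a short stdlib-only script can collect the Python version, OS string, the interpreter actually in use (useful for confirming that the `.venv` is active), and the installed versions of selected packages. `environment_report` is a hypothetical helper, not an llmware function; it relies only on `platform`, `sys`, and `importlib.metadata`.

```python
import platform
import sys
from importlib import metadata


def environment_report(dist_names):
    """Collect interpreter/OS details and installed versions of the given distributions."""
    packages = {}
    for name in dist_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            packages[name] = None  # not installed in *this* interpreter / venv
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,  # shows which interpreter/venv is running
        "packages": packages,
    }
```

Running `environment_report(["llmware"])` from inside the virtual environment and pasting the result would give the version details in one place.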
None of the sample folders were successfully parsed, although SmallLibrary stopped with a different error:
Example - Parsing Files into Library
Step 1 - creating library example1_library
INFO: Setup - sample_files path already exists - C:\Users\Käyttäjä\llmware_data\sample_files
Step 2 - loading the llmware sample files and saving at: C:\Users\Käyttäjä\llmware_data\sample_files
Step 3 - parsing and indexing files from C:\Users\Käyttäjä\llmware_data\sample_files\SmallLibrary
INFO: update: Duplicate files (skipped): 2
INFO: update: Total uploaded: 6
INFO: Parser - parse_office - start parsing of office documents...
I guess none of the PDF files could be parsed, so they were all rejected, and then office document parsing exited with an error?
I created a virtual environment with Python 3.12 and installed with pip3 install 'llmware[full]'.
However, to make that work I had to download the Visual C++ Build Tools and select two individual components, as in this post:
https://stackoverflow.com/a/76245995
I chose the Windows 10 SDK.
Should I try again from the beginning with a different Python version and see what happens?
Looking at the code, I suspect the problem is somewhere in libpdf_llmware.dll, but I don't know how to troubleshoot beyond that.
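One thing that may be worth ruling out, and this is only a hypothesis not confirmed anywhere in this thread: the user profile path contains non-ASCII characters (`Käyttäjä`). SQLite's C API interprets the filename passed to `sqlite3_open` / `sqlite3_open_v2` as UTF-8. If native code such as libpdf_llmware.dll built that path in a legacy Windows code page instead (cp1252 is assumed here), SQLite would receive different bytes and could plausibly fail with "unable to open database file", even though Python's `sqlite3` module, which passes UTF-8, opens the same file without trouble. The hypothetical helper below only compares the two byte encodings of a path; it does not inspect the DLL.

```python
def describe_path_encodings(path, legacy_codec="cp1252"):
    """Compare the UTF-8 bytes SQLite expects with the bytes a legacy-code-page caller would pass."""
    utf8_bytes = path.encode("utf-8")
    try:
        legacy_bytes = path.encode(legacy_codec)
    except UnicodeEncodeError:
        legacy_bytes = None  # the path cannot be represented in that code page at all
    return {
        "ascii_only": path.isascii(),
        "utf8": utf8_bytes,
        "legacy": legacy_bytes,
        # True only when both encodings yield identical bytes, i.e. a
        # code-page mismatch could not affect this particular path.
        "same_bytes": legacy_bytes == utf8_bytes,
    }
```

For the path from the log, `describe_path_encodings(r"C:\Users\Käyttäjä\llmware_data\sample_files")["same_bytes"]` is `False`, because `ä` is the single byte `0xE4` in cp1252 but the two bytes `0xC3 0xA4` in UTF-8. A quick way to test the hypothesis would be to run the example from a Windows account whose profile path is ASCII-only.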
@arautio89 This issue is already resolved, right?
I tried again today with llmware==0.4.1 and Python 3.12, and still the same issue.
@arautio89 I think it is working with the same requirements you defined. You might be experiencing the issue for a different reason. Can you tell me whether you ran into any issues installing llmware?
@arautio89 Can you confirm that you followed each step, not just the basic installation but also checking and installing the required packages afterward, as in this video for reference? https://youtu.be/rRBrKn8vvjc?si=qzepksRk4HIcMJFg