sqlite3 error when running Fast Start RAG, example 1 on Windows 10

Open arautio89 opened this issue 1 year ago • 7 comments

Hi, I'm running the first Fast Start RAG example example-1-create_first_library.py on Windows 10 and I get this error.

Example - Parsing Files into Library

Step 1 - creating library example1_library
INFO: Setup - sample_files path already exists - C:\Users\Käyttäjä\llmware_data\sample_files
Step 2 - loading the llmware sample files and saving at: C:\Users\Käyttäjä\llmware_data\sample_files
Step 3 - parsing and indexing files from C:\Users\Käyttäjä\llmware_data\sample_files\Agreements
INFO: update:  Duplicate files (skipped): 0
INFO: update:  Total uploaded: 15
INFO: Parser - parse_pdf - start parsing of PDF Documents...
WARNING: pdf_parser - update_library_inc_totals_sqlite - can not open database: unable to open database file
WARNING: pdf_parser - register_status_update_sqlite - can not open database: unable to open database file
INFO: pdf_parser - total pdf files processed - 0
INFO: pdf_parser - total input files received - 0
INFO: pdf_parser - total blocks created - 0
INFO: pdf_parser - total images created - 0
INFO: pdf_parser - total tables created - 0
INFO: pdf_parser - total pages added - 0
INFO: pdf_parser - PDF Processing - Finished - time elapsed - 0.010000 
INFO: pdf_parser - Completed Parsing - processing time - 0.010000
INFO: Parser - parse_pdf - completed parsing of pdf documents - time taken: 0.030765771865844727
Step 4 - completed parsing - {'docs_added': 0, 'blocks_added': 0, 'images_added': 0, 'pages_added': 0, 'tables_added': 0, 'rejected_files': ['Rhea EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Artemis Poseidon EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Aphrodite EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Leto EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Eileithyia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Nyx EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Gaia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Demeter EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Amphitrite EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Persephone EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Apollo EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Nike EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Athena EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Bia EXECUTIVE EMPLOYMENT AGREEMENT.pdf', 'Metis EXECUTIVE EMPLOYMENT AGREEMENT.pdf']}
Step 5 - updated library card - documents - 0 - blocks - 0 - {'_id': 1, 'library_name': 'example1_library', 'embedding': [{'embedding_status': 'no', 'embedding_model': 'none', 'embedding_db': 'none', 'embedded_blocks': 0, 'embedding_dims': 0, 'time_stamp': 'NA'}], 'knowledge_graph': 'no', 'unique_doc_id': 0, 'documents': 0, 'blocks': 0, 'images': 0, 'pages': 0, 'tables': 0, 'account_name': 'llmware'}
Step 6 - library artifacts - including extracted images - saved at folder path - C:\Users\Käyttäjä\llmware_data\accounts\llmware\example1_library

Step 7 - running a test query - base salary

The first time I ran the file, it ended like this:

Step 7 - running a test query - base salary

Traceback (most recent call last):
  File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 666, in _exitfunc
    f()
  File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\urllib3\connectionpool.py", line 1180, in _close_pool_connections
    conn.close()
  File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\botocore\awsrequest.py", line 80, in close
    super().close()
  File "C:\Users\Käyttäjä\Desktop\Coding\Data Science\llmware-rag\.venv\Lib\site-packages\urllib3\connection.py", line 318, in close
    super().close()
  File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1003, in close
    sock.close()   # close it manually... there may be other refs
    ^^^^^^^^^^^^
  File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 504, in close
    self._real_close()
  File "C:\Users\Käyttäjä\AppData\Local\Programs\Python\Python312\Lib\ssl.py", line 1308, in _real_close
    super()._real_close()
    _ss.close(self)

The code did create a sqlite_llmware.db file, and I can work with it in the terminal:

>>> import os, sqlite3
>>> from llmware.configs import LLMWareConfig
>>> conn = sqlite3.connect(os.path.join(LLMWareConfig().get_library_path(), 'sqlite_llmware.db'))
>>> cur = conn.cursor()
>>> cur.execute("""SELECT name FROM sqlite_master WHERE type='table';""")
<sqlite3.Cursor object at 0x000001EF59891D40>
>>> print(cur.fetchall())
[('library',), ('example1_library',), ('example1_library_data',), ('example1_library_idx',), ('example1_library_content',), ('example1_library_docsize',), ('example1_library_config',), ('parser_events',), ('parser_events_data',), ('parser_events_idx',), ('parser_events_content',), ('parser_events_docsize',), ('parser_events_config',), ('status',), ('movie',)]

(Note that I added ('movie',) in testing.)

arautio89 avatar Dec 01 '24 18:12 arautio89

@arautio89 - sorry you ran into this issue - thanks for sharing it - it does seem unusual. It looks like you have been able to parse other documents and create other libraries with llmware in that specific database? I would recommend checking the obvious stuff so we can rule those out (e.g., that the files or DB were not corrupted, and that you have the pip dependencies installed) - and then please do try to run the example again with a clean setup - if the issue persists, then we can look at the next level of debugging ... If there are any other notable items about the environment, please share - and I will try to recreate the environment and see if we can reproduce the issue.
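The basic checks suggested above can be scripted. The following is a minimal sketch using only the Python standard library. `sqlite_integrity` and `installed_version` are hypothetical helper names, not part of llmware, and the real database path is assumed to be the sqlite_llmware.db file inside the llmware library folder.

```python
import importlib.metadata
import os
import sqlite3
import tempfile


def sqlite_integrity(db_path):
    """Run SQLite's built-in integrity check; a healthy file reports 'ok'."""
    # Refuse a missing path, because sqlite3.connect would otherwise
    # silently create a new, empty database there.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Demo on a throwaway database; point db_path at the real file instead.
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE library (id INTEGER)")
        conn.commit()
        conn.close()
        print("integrity:", sqlite_integrity(db_path))
    print("llmware version:", installed_version("llmware"))
```

If the integrity check reports anything other than a single 'ok', or the version lookup returns None, a clean reinstall is probably the next step before deeper debugging.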

doberst avatar Dec 02 '24 17:12 doberst

None of the sample folders were successfully parsed, although SmallLibrary stopped with a different error:

Example - Parsing Files into Library

Step 1 - creating library example1_library
INFO: Setup - sample_files path already exists - C:\Users\Käyttäjä\llmware_data\sample_files
Step 2 - loading the llmware sample files and saving at: C:\Users\Käyttäjä\llmware_data\sample_files
Step 3 - parsing and indexing files from C:\Users\Käyttäjä\llmware_data\sample_files\SmallLibrary
INFO: update:  Duplicate files (skipped): 2
INFO: update:  Total uploaded: 6
INFO: Parser - parse_office - start parsing of office documents...

I guess none of the PDF files could be parsed, so they were all rejected, and then office document parsing exited with an error?

I created a virtual environment with Python 3.12 and installed with pip3 install 'llmware[full]'.
However, to make that work, I had to download visual-cpp-build-tools and select two individual components, as in this post: https://stackoverflow.com/a/76245995 I chose Windows 10 SDK.

I could try again from the beginning with a different Python version and see what happens?

arautio89 avatar Dec 02 '24 18:12 arautio89

Looking at the code, I suspect the problem is somewhere in libpdf_llmware.dll; however, I don't know how to troubleshoot further than that.
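One way to narrow this down, sketched under assumptions that nobody in this thread has confirmed: the paths in the logs contain non-ASCII characters (Käyttäjä), and native code sometimes mishandles such paths, so that is worth ruling out. A DLL that fails to load also usually reports why when it is loaded directly through ctypes. All helper names below are hypothetical, and the search root is assumed to be the installed llmware package folder.

```python
import ctypes
import os
import re


def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    # Split on both separators so Windows paths are handled on any OS.
    parts = re.split(r"[\\/]+", path)
    return [p for p in parts if p and not p.isascii()]


def find_libraries(root, name_fragment):
    """Walk root and return files whose names contain name_fragment."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if name_fragment in fname:
                hits.append(os.path.join(dirpath, fname))
    return sorted(hits)


def try_load(lib_path):
    """Try to load a shared library; return None on success, else the error text."""
    try:
        ctypes.CDLL(lib_path)
        return None
    except OSError as exc:
        return str(exc)


if __name__ == "__main__":
    print(non_ascii_parts(r"C:\Users\Käyttäjä\llmware_data\accounts\llmware"))
    # Hypothetical usage against an installed package:
    #   import llmware
    #   root = os.path.dirname(llmware.__file__)
    #   for lib in find_libraries(root, "libpdf_llmware"):
    #       print(lib, try_load(lib))
```

If the DLL loads cleanly but parsing still fails, rerunning the example with the llmware data folder under an ASCII-only path would test the path hypothesis directly.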

arautio89 avatar Dec 03 '24 19:12 arautio89

@arautio89 This issue is already resolved, right?

rahulsamant37 avatar Apr 17 '25 23:04 rahulsamant37

I tried again today with llmware==0.4.1 and Python 3.12, and still the same issue.

arautio89 avatar Apr 18 '25 12:04 arautio89

@arautio89 I think it works with the same requirements you described, so you might be hitting the issue for a different reason. Can you tell me whether there was any issue installing llmware?

rahulsamant37 avatar Apr 18 '25 16:04 rahulsamant37

@arautio89 Can you confirm that you followed each step, not just the basic installation but also checking and installing the required packages afterward, as in this video for reference: https://youtu.be/rRBrKn8vvjc?si=qzepksRk4HIcMJFg

rahulsamant37 avatar Apr 18 '25 16:04 rahulsamant37