quivr
PermissionError when I Add a PDF to Database
I think I followed all the instructions, but once Streamlit is running I drag in a PDF, and when I click Add to Database, this error is shown. Any idea?
THANK YOU !!!
Ouch something about windows probably 😬
Where did you install quiver, and do you have access to the D folder mentioned?
I can see three drive letters in your answer. That's probably the issue. When you upload a file, it goes to a folder in the app, and after it has been uploaded as embeddings, it's deleted. I don't know why this "duplication" is needed.
This is what is shown in the console:
2023-05-13 18:19:16.063 Uncaught app exception
Traceback (most recent call last):
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "N:- GoogleDrive USAL\Working\PYTHON\quiver-main\main.py", line 57, in
D:\TEMP has no problem with permissions, it's the temporary directory of the system, all programs and users have permission.
Hey, was asked to help someone trying to use your project who were getting the same error. Below is the reply I gave them, which includes the likely cause.
https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/loaders/common.py#L14
https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/loaders/common.py#L20
https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/utils.py#L4
Looks like they create a temp file, then pass its file name to a function that tries to open it.
https://docs.python.org/3.9/library/tempfile.html#tempfile.NamedTemporaryFile
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows)
(and I knew what to look for thanks to https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file)
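To make the quoted platform difference concrete, here is a minimal sketch that tries to reopen a NamedTemporaryFile by name while it is still open. The function name can_reopen_while_open is made up for this illustration and is not part of the project. It should report True on Unix-like systems and False on Windows, where the second open raises PermissionError.

```python
import os
import tempfile


def can_reopen_while_open() -> bool:
    """Return True if a NamedTemporaryFile can be opened a second time by name
    while the first handle is still open (hypothetical helper for illustration)."""
    with tempfile.NamedTemporaryFile(suffix=".bin") as tmp_file:
        tmp_file.write(b"data")
        tmp_file.flush()  # make sure the bytes are visible to a second reader
        try:
            # This second open is the step that fails on Windows.
            with open(tmp_file.name, "rb") as second_handle:
                return second_handle.read() == b"data"
        except PermissionError:
            return False


if __name__ == "__main__":
    print(os.name, can_reopen_while_open())
```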
That looks exactly like the problem I have. Any idea how to catch the error?
I have the same problem, on Windows as well.
This worked for me:
import os
import tempfile
import time
from utils import compute_sha1_from_file
from langchain.schema import Document
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter

def process_file(vector_store, file, loader_class, file_suffix):
    documents = []
    file_sha = ""
    file_name = file.name
    file_size = file.size
    dateshort = time.strftime("%Y%m%d")

    # Create a temporary file using mkstemp
    fd, tmp_file_name = tempfile.mkstemp(suffix=file_suffix)
    with os.fdopen(fd, 'wb') as tmp_file:
        tmp_file.write(file.getvalue())

    loader = loader_class(tmp_file_name)
    documents = loader.load()
    file_sha1 = compute_sha1_from_file(tmp_file_name)

    chunk_size = st.session_state['chunk_size']
    chunk_overlap = st.session_state['chunk_overlap']

    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    documents = text_splitter.split_documents(documents)

    # Add the document sha1 as metadata to each document
    docs_with_metadata = [Document(page_content=doc.page_content, metadata={"file_sha1": file_sha1, "file_size": file_size, "file_name": file_name, "chunk_size": chunk_size, "chunk_overlap": chunk_overlap, "date": dateshort}) for doc in documents]

    vector_store.add_documents(docs_with_metadata)

    # Don't forget to remove the temporary file when you're done with it
    os.remove(tmp_file_name)
    return
This version of common.py should avoid the permission issue you were encountering on Windows.
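For comparison, another way to get the same effect is to keep NamedTemporaryFile but pass delete=False, close the file, and remove it by hand afterwards. This is only a sketch, not what the project does: with_closed_temp_file and its consume callback are names invented for this example, and consume stands in for whatever needs the path (for instance a loader or a hash function).

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_closed_temp_file(data: bytes, suffix: str, consume: Callable[[str], T]) -> T:
    """Write data to a temp file, close it, then pass its path to consume().

    Hypothetical helper: consume is any callable that opens the path itself.
    """
    # delete=False keeps the file on disk after the handle is closed.
    tmp_file = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    tmp_path = tmp_file.name
    try:
        with tmp_file:
            tmp_file.write(data)
        # The handle is closed here, so the path can be reopened on Windows too.
        return consume(tmp_path)
    finally:
        # Nothing deletes the file automatically any more, so do it explicitly.
        os.remove(tmp_path)
```

The trade-off compared with delete=True is that cleanup becomes the caller's job, which is why the removal sits in a finally block.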
I encountered a PermissionError when trying to open a temporary file on a Windows platform. The issue originates from this block of code in common.py:
with tempfile.NamedTemporaryFile(delete=True, suffix=file_suffix) as tmp_file:
    tmp_file.write(file.getvalue())
    tmp_file.flush()
    loader = loader_class(tmp_file.name)
    documents = loader.load()
    file_sha1 = compute_sha1_from_file(tmp_file.name)
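The PermissionError arises because tempfile.NamedTemporaryFile() keeps the temporary file open for the whole with block, and on Windows a file created this way cannot be opened a second time by name while that first handle is still open. Here loader_class and compute_sha1_from_file both receive tmp_file.name inside the block and try to open it, so the second open fails. On Unix-based systems the same code works, because the name can be reopened while the file is open.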
To resolve this issue, I modified the code to use tempfile.mkstemp() instead. It returns an open file descriptor together with the file's path, and it never deletes the file automatically. That lets the code write the contents, close the file, and only then pass the path to the loader, so nothing tries to open a file that is still held open. The cost is that the temporary file has to be removed explicitly.
Here's the modified block of code:
# Create a temporary file using `tempfile.mkstemp`.
tmp_fd, tmp_file_name = tempfile.mkstemp(suffix=file_suffix)

try:
    # Write to the temporary file.
    with os.fdopen(tmp_fd, 'wb') as tmp_file:
        tmp_file.write(file.getvalue())
        tmp_file.flush()

    # Now you can pass the temporary file's name to `loader_class` and `compute_sha1_from_file`.
    loader = loader_class(tmp_file_name)
    documents = loader.load()
    file_sha1 = compute_sha1_from_file(tmp_file_name)
finally:
    # Clean up the temporary file.
    if os.path.exists(tmp_file_name):
        os.remove(tmp_file_name)
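If the same write, close, use, delete sequence is needed in more than one loader, it could be wrapped in a small context manager. This is a sketch under my own assumptions: temp_file_path is a hypothetical name that does not exist in the project, and it simply packages the mkstemp steps shown above.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temp_file_path(data: bytes, suffix: str = "") -> Iterator[str]:
    """Yield the path of a closed temporary file containing data, then delete it.

    Hypothetical helper wrapping the mkstemp pattern from this thread.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Write and close first, so the path can be reopened on any platform.
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        yield path
    finally:
        # Runs even if the caller's block raises.
        if os.path.exists(path):
            os.remove(path)


# Possible use inside process_file (names taken from the code above):
#     with temp_file_path(file.getvalue(), file_suffix) as tmp_file_name:
#         documents = loader_class(tmp_file_name).load()
#         file_sha1 = compute_sha1_from_file(tmp_file_name)
```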