R2R
File ingestion gets stuck for a long time
Describe the bug
I'm using the following config:
{
  "app": {
    "max_file_size_in_mb": 100
  },
  "embedding": {
    "provider": "ollama",
    "base_model": "nomic-embed-text",
    "base_dimension": 768,
    "batch_size": 32
  },
  "completions": {
    "provider": "litellm",
    "model": "ollama/dolphin-llama3:8b-v2.9-q6_K"
  },
  "ingestion": {
    "excluded_parsers": [
      "gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4"
    ]
  },
  "vector_database": {
    "provider": "pgvector",
    "user": "r2r",
    "password": "r2r",
    "host": "127.0.0.1",
    "db_name": "r2r",
    "port": 5432,
    "vecs_collection": "r2rnomic"
  }
}
When I ran r2r ingest-files on the EC2 documentation, the app appeared stuck for a long time without doing any work: all CPUs were idle and no ollama requests showed up in the logs. After more than 2 minutes of waiting, it processed the file in about 30 seconds, during which I saw many ollama embedding requests coming through.
Using commit 2f6f18c66858b4cf15d29accd19d7ef8016e98d4
To Reproduce
r2r --config-path ... ingest-files ec2-ug.pdf
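To put numbers on the idle period, the command can be wrapped in a small timer. The following is only a rough sketch, not part of R2R: time_command and the script name time_ingest.py are hypothetical names I made up, and it assumes the CLI prints something once real work starts. If the child process buffers its output, the "first output" time will overstate the stall.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (seconds until first output line, total seconds).

    The first value is None if the command printed nothing at all.
    Hypothetical helper for measuring the stall; not an R2R API.
    """
    start = time.monotonic()
    first_output = None
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines count as output
        text=True,
    )
    for line in proc.stdout:
        if first_output is None:
            first_output = time.monotonic() - start
        sys.stdout.write(line)  # pass the child's output through unchanged
    proc.wait()
    return first_output, time.monotonic() - start


if __name__ == "__main__":
    if len(sys.argv) > 1:
        first, total = time_command(sys.argv[1:])
        print(f"first output after {first}s, total {total:.1f}s")
    else:
        print("usage: python time_ingest.py <command> [args...]")
```

Invoked as python time_ingest.py followed by the exact command above, this would show whether the wait happens before any output appears or somewhere in the middle of the run.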
Desktop:
- OS: macOS