File ingestion gets stuck for a long time

Open viraptor opened this issue 1 year ago • 2 comments

Describe the bug

I'm using the following config:

{
	"app": {
		"max_file_size_in_mb": 100
	},
	"embedding": {
		"provider": "ollama",
		"base_model": "nomic-embed-text",
		"base_dimension": 768,
		"batch_size": 32
	},
	"completions": {
		"provider": "litellm",
		"model": "ollama/dolphin-llama3:8b-v2.9-q6_K"
	},
	"ingestion":{
		"excluded_parsers": [
			"gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4"
		]
	},
	"vector_database": {
		"provider": "pgvector",
		"user": "r2r",
		"password": "r2r",
		"host": "127.0.0.1",
		"db_name": "r2r",
		"port": 5432,
		"vecs_collection": "r2rnomic"
	}
}

When I ran r2r ingest-files on the EC2 documentation, the app appeared stuck for over 2 minutes without doing any work: all CPUs were idle and no ollama requests were visible in the logs. After that wait, it processed the file in about 30 seconds, during which I saw many ollama embedding requests coming through.

Using commit 2f6f18c66858b4cf15d29accd19d7ef8016e98d4

To Reproduce

r2r --config-path ... ingest-files ec2-ug.pdf
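To put numbers on the idle phase versus the processing phase, the command can be wrapped in a small timing helper. This is a minimal sketch, not part of R2R: `time_command` is a hypothetical helper name, and it assumes the CLI writes something to stdout or stderr once real work begins. If the child process buffers its output when piped, the time to first output will be overstated.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (seconds_to_first_output, total_seconds, lines).

    seconds_to_first_output is None if the command printed nothing.
    """
    start = time.monotonic()
    first_output = None
    lines = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines are captured too
        text=True,
    )
    # Read line by line so the arrival time of the first line can be recorded.
    for line in proc.stdout:
        if first_output is None:
            first_output = time.monotonic() - start
        lines.append(line.rstrip("\n"))
    proc.wait()
    total = time.monotonic() - start
    return first_output, total, lines


# Example call (arguments copied from the reproduction command above):
# first, total, _ = time_command(
#     ["r2r", "--config-path", "...", "ingest-files", "ec2-ug.pdf"]
# )
# print(f"first output after {first}s, finished after {total:.1f}s")
```

A large gap between the first value and the total would match the behaviour described above, but it does not show where the time is spent. For that, a stack dump of the running process during the idle period would be more informative.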

Desktop

  • OS: macOS

viraptor, Jul 06 '24 04:07