Update ingest.py
chunk_size 500 requires too much memory; even a machine with 32GB of RAM can't fit it. Change it to 200, which works fine on a 16GB MacBook M1.
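If ingest.py follows the common LangChain pattern, the change would look something like this. This is a sketch: the use of `RecursiveCharacterTextSplitter` and the `chunk_overlap` value are assumptions, not confirmed from the repo.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Assumed splitter setup; lowering chunk_size from 500 to 200 reduces
# the size of each chunk held in memory during ingestion.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
```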
A better approach would be to set the chunk size via a command-line flag or in the .env file, defaulting to a reasonable number when omitted. Users who have the RAM available should be able to use it rather than being capped at a lower number; see the sketch below.
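A minimal sketch of that suggestion, assuming python-dotenv is already in use. The flag name `--chunk-size`, the env var `INGEST_CHUNK_SIZE`, and the default of 500 are all illustrative, not the repo's existing API:

```python
import argparse
import os

from dotenv import load_dotenv

load_dotenv()  # pick up INGEST_CHUNK_SIZE from .env if present

parser = argparse.ArgumentParser()
parser.add_argument(
    "--chunk-size",
    type=int,
    # precedence: CLI flag > .env entry > built-in default
    default=int(os.environ.get("INGEST_CHUNK_SIZE", 500)),
    help="Characters per chunk; lower this on low-RAM machines",
)
args = parser.parse_args()
```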
As you ingest more data you will find that halving the chunk size very nearly doubles the RAM footprint during ingestion, and when the db is persisted it nearly doubles again while saving. Some data points: ingesting 1900 books averaging 1MB in size has a running RAM footprint of about 15GB at a 1000/100 chunk size/overlap. At 500/50, the same set of books is above 30GB and will choke a 64GB machine when persisted to disk. Decreasing the chunk size, if it works at all, is a bad idea for more than trivial amounts of data.
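A back-of-the-envelope sketch of why the footprint roughly doubles, under the assumption that per-chunk overhead (embedding vectors, metadata) dominates: the chunk count scales with 1 / (chunk_size - chunk_overlap), so 500/50 yields about twice as many chunks as 1000/100. The helper below is hypothetical, just to make the arithmetic concrete:

```python
def estimated_chunks(total_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    # each new chunk advances by (chunk_size - chunk_overlap) characters
    stride = chunk_size - chunk_overlap
    return -(-total_chars // stride)  # ceiling division

corpus = 1900 * 1_000_000  # ~1900 books averaging 1MB each, per the data point above

print(estimated_chunks(corpus, 1000, 100))  # ~2.1M chunks at 1000/100
print(estimated_chunks(corpus, 500, 50))    # ~4.2M chunks at 500/50, about double
```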