
Difficulty with ingesting large files

Open Gpadh opened this issue 7 months ago • 0 comments

I'm trying to ingest a 100+ GB file of legal data into a Kernel Memory service. The data I want to ingest are the "opinions" files from the CourtListener bulk-data listing (https://com-courtlistener-storage.s3-us-west-2.amazonaws.com/list.html?prefix=bulk-data/). They are bzip2-compressed (.bz2) files.

To ingest the data, I use azcopy to copy a file into a container. A function then triggers when a file arrives in this container; it decompresses the .bz2 file and sends the result to Kernel Memory for ingestion as a stream. The compressed file is about 30 GB; decompressed, it grows to 100+ GB.
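For illustration, the decompress-and-stream step described above could be sketched so that memory stays bounded: read the .bz2 input incrementally and hand the decompressed bytes onward in fixed-size parts instead of as one 100+ GB stream. This is a minimal Python sketch, not the code from the linked repository. The names `iter_decompressed_parts`, `ingest_in_parts`, and the `upload_part` callback are hypothetical, and `upload_part` stands in for whatever call actually sends data to Kernel Memory. Splitting on raw byte boundaries is also an assumption; real data would likely need to be split on record boundaries.

```python
import bz2
from typing import BinaryIO, Callable, Iterator


def iter_decompressed_parts(
    compressed: BinaryIO, part_size: int = 64 * 1024 * 1024
) -> Iterator[bytes]:
    """Yield decompressed bytes in parts of at most `part_size` bytes.

    bz2.BZ2File decompresses lazily, so only about one part is held
    in memory at a time, regardless of the total decompressed size.
    """
    with bz2.BZ2File(compressed) as stream:
        while True:
            part = stream.read(part_size)
            if not part:  # empty bytes means end of the compressed input
                break
            yield part


def ingest_in_parts(
    compressed: BinaryIO,
    upload_part: Callable[[int, bytes], None],
    part_size: int = 64 * 1024 * 1024,
) -> int:
    """Send each decompressed part to `upload_part` (hypothetical callback).

    Returns the number of parts sent.
    """
    count = 0
    for index, part in enumerate(iter_decompressed_parts(compressed, part_size)):
        upload_part(index, part)  # placeholder for the real ingestion call
        count += 1
    return count
```

With a file opened in binary mode, `ingest_in_parts(f, upload_part)` would call `upload_part(0, ...)`, `upload_part(1, ...)`, and so on, one part at a time. Whether the service accepts the data as separate documents like this is not something the issue confirms.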

This is the error message I get when I try to ingest the files into Kernel Memory (attached as a screenshot): [image]

A repository that reproduces the issue is here: https://github.com/Gpadh/KMFileIngestion/tree/master

Please let me know if I can provide any more details to help.

Gpadh · Dec 01 '23 23:12