paper-qa

Adopting `fsspec` to enable building indexes directly in cloud buckets

Open jamesbraza opened this issue 1 year ago • 0 comments

It would be useful to avoid storing huge folders of PDF/HTML/text files locally and instead read them directly from the cloud. This would remove the need to spend 30+ minutes downloading files from the cloud and to allocate local disk space to house them while building the index.

The only tradeoff of building from a cloud-hosted `paper_directory` is additional network I/O at build time, but I think this is negligible compared to the 30+ minute download times.

The `fsspec` library makes this easy: it puts local and cloud storage behind one filesystem interface, so file I/O written against it works the same on a cloud bucket as on local disk.
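To make the idea concrete, here is a minimal sketch of reading a paper directory through `fsspec` instead of `pathlib`/`open`. It is not paper-qa's API: the helper name `iter_papers` and the suffix list are hypothetical, chosen only for illustration. It assumes the directory is given as a URL whose protocol `fsspec` can resolve. For example, `gs://` or `s3://` URLs need the `gcsfs` or `s3fs` package installed, while a plain local path works with no extras.

```python
from collections.abc import Iterator

import fsspec
from fsspec.core import url_to_fs

# Hypothetical set of file types to index; adjust as needed.
SUFFIXES = (".pdf", ".html", ".txt")


def iter_papers(
    paper_directory: str, suffixes: tuple[str, ...] = SUFFIXES
) -> Iterator[tuple[str, bytes]]:
    """Yield (path, raw bytes) for each matching file under paper_directory.

    paper_directory may be a local path or any URL fsspec understands,
    so nothing has to be downloaded to local disk up front.
    """
    # Resolve the URL into a filesystem object plus a path on that filesystem.
    fs, root = url_to_fs(paper_directory)
    # find() walks the tree recursively and returns file paths only.
    for path in fs.find(root):
        if path.lower().endswith(suffixes):
            with fs.open(path, "rb") as f:
                yield path, f.read()
```

With this shape, pointing the same loop at a bucket is only a change of argument, e.g. `iter_papers("s3://my-bucket/papers")` (a made-up bucket name). Each file is then streamed over the network at build time, which is the extra I/O mentioned above.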

jamesbraza · Oct 17 '24 00:10