paper-qa
Adopting `fsspec` to enable building indexes directly in cloud buckets
It would be useful to avoid storing huge folders of PDF/HTML/text files locally and instead read them directly from the cloud. This would spare people from spending 30+ minutes downloading files from the cloud, and from allocating local disk space to hold those files while the index is built.
The only tradeoff of building from a cloud-hosted `paper_directory` is additional network I/O at build time, but I think this is negligible compared to 30+ minutes of downloading.
The `fsspec` library makes this easy: it basically means moving our file I/O onto an interface that works the same for local paths and cloud buckets.
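To make the idea concrete, here is a minimal sketch of what directory traversal could look like with `fsspec`. The helper name `iter_documents` and the suffix list are hypothetical and not part of paper-qa's API; the demo uses fsspec's built-in in-memory filesystem as a stand-in for a bucket. Real buckets (for example `s3://` or `gs://` URLs) would additionally need the matching backend package such as `s3fs` or `gcsfs` installed.

```python
from fsspec.core import url_to_fs


def iter_documents(paper_directory, suffixes=(".pdf", ".html", ".txt")):
    """Yield (path, raw bytes) for each matching file under a directory.

    Hypothetical helper for illustration only. `paper_directory` may be a
    local path or any URL whose scheme fsspec has a backend for.
    """
    # url_to_fs selects the filesystem implementation from the URL scheme
    # and returns it together with the path stripped of that scheme.
    fs, root = url_to_fs(paper_directory)
    # find() lists files recursively, regardless of the backend.
    for path in fs.find(root):
        if path.lower().endswith(suffixes):
            with fs.open(path, "rb") as f:
                yield path, f.read()


if __name__ == "__main__":
    import fsspec

    # Stand-in for a cloud bucket: fsspec's in-memory filesystem.
    mem = fsspec.filesystem("memory")
    mem.pipe("/papers/a.pdf", b"%PDF-1.4 demo")
    mem.pipe("/papers/notes.txt", b"plain text")
    for path, data in iter_documents("memory://papers"):
        print(path, len(data))
```

With something like this, switching from a local folder to a bucket would only change the string passed as `paper_directory`; the parsing and indexing code downstream would receive bytes either way.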