
Accept compressed files as input to `predict` when using a `Predictor`

Open danieldeutsch opened this issue 3 years ago • 10 comments

**Is your feature request related to a problem? Please describe.**

I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the `predict` command opens the file itself and reads lines for the `Predictor`, which fails when the input is one of my compressed files.

https://github.com/allenai/allennlp/blob/39d7e5ae06551fe371d3e16f4d93162e55ec5dcc/allennlp/commands/predict.py#L208-L218

**Describe the solution you'd like**

Either automatically detect that the file is compressed, or add a flag to `predict` that indicates that the file is compressed. One method that I have used to detect if a file is gzipped is here, although it isn't 100% accurate. I have an implementation here. Otherwise a flag like `--compression-type` to mark how the file is compressed should be sufficient. Passing the type of compression would allow support for gzip, bz2, or any other method.
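For illustration, here is a minimal sketch of the detection idea using only the Python standard library. It is not the implementation linked above and not AllenNLP code: `detect_compression` and `open_maybe_compressed` are hypothetical names, and the check relies on the leading "magic" bytes of gzip and bz2 files, which is why it cannot be 100% accurate (an uncompressed file could begin with the same bytes).

```python
import bz2
import gzip
from typing import IO, Optional

# Leading "magic" bytes of two common compression formats.
_MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
}


def detect_compression(path: str) -> Optional[str]:
    """Guess the compression type from the first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(3)
    for magic, name in _MAGIC.items():
        if head.startswith(magic):
            return name
    return None


def open_maybe_compressed(path: str, compression_type: Optional[str] = None) -> IO[str]:
    """Open `path` as text, using an explicit type if given, else a guess."""
    kind = compression_type or detect_compression(path)
    if kind == "gzip":
        return gzip.open(path, "rt", encoding="utf-8")
    if kind == "bz2":
        return bz2.open(path, "rt", encoding="utf-8")
    if kind is None:
        return open(path, "r", encoding="utf-8")
    raise ValueError(f"Unsupported compression type: {kind}")
```

With something like this, the line-reading loop in `predict` could stay unchanged and only the call that opens the input file would differ.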

danieldeutsch avatar Jun 02 '21 17:06 danieldeutsch

I think this feature is a great idea! The latter design (passing a flag) seems better to me.

I am adding @epwalsh here to get his input as well.

ArjunSubramonian avatar Jun 03 '21 05:06 ArjunSubramonian

Yeup, this seems reasonable. I think we should try to automatically detect the compression type, but also have the flag so that users can override it when the automatic detection fails.
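A rough sketch of how the override could take precedence over automatic detection, assuming a `--compression-type` option as proposed earlier in the thread. Everything here is hypothetical (the parser, `resolve_compression`, `read_lines`, and the choice of detecting by file suffix); it is standard-library Python, not the existing `predict` command.

```python
import argparse
import bz2
import gzip
from typing import Iterator, Optional

# Map each supported type to a function that opens the file in text mode.
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "none": open}
# Assumed convention: detect the type from the file suffix.
_SUFFIXES = {".gz": "gzip", ".bz2": "bz2"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file")
    parser.add_argument(
        "--compression-type",
        choices=sorted(_OPENERS),
        default=None,
        help="Override automatic detection of the input file's compression.",
    )
    return parser


def resolve_compression(path: str, override: Optional[str]) -> str:
    """Return the user's override if given, otherwise guess from the suffix."""
    if override is not None:
        return override
    for suffix, name in _SUFFIXES.items():
        if path.endswith(suffix):
            return name
    return "none"


def read_lines(path: str, override: Optional[str] = None) -> Iterator[str]:
    """Yield lines from `path`, decompressing on the fly if needed."""
    opener = _OPENERS[resolve_compression(path, override)]
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Passing `--compression-type none` would then force plain-text reading even for a file whose name ends in `.gz`, which covers the case where detection guesses wrong.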

You may find this helper function useful: https://github.com/allenai/allennlp/blob/7a5106d541006a5b7f544284aeedb03ba2b480d5/allennlp/common/file_utils.py#L1087

epwalsh avatar Jun 03 '21 19:06 epwalsh

Hi, I'd like to try working on this. I'm relatively new to this, so are there any pointers I should keep in mind before raising a pull request?

Dbhasin1 avatar Jun 29 '21 10:06 Dbhasin1

Hi @Dbhasin1, check out https://github.com/allenai/allennlp/blob/main/CONTRIBUTING.md#making-a-pull-request

epwalsh avatar Jun 29 '21 16:06 epwalsh

Hi @epwalsh! Is this issue still unresolved? I'm looking for issues to start contributing to AllenNLP. Can I take this up if it hasn't been resolved already?

spranjal25 avatar Oct 14 '21 07:10 spranjal25

Hi @spranjal25, we haven't heard from @Dbhasin1 for a while on their PR, so it's probably okay for you to take over at this point.

epwalsh avatar Oct 25 '21 16:10 epwalsh

Hey, sorry, I've been engaged elsewhere for a while. I'd like to give it one more shot!

Dbhasin1 avatar Oct 25 '21 17:10 Dbhasin1

Is the issue still open?

aterzgar avatar Jul 21 '22 18:07 aterzgar

Hi @danieldeutsch, @epwalsh, this issue seems like a good starting point for my journey toward contributing to FOSS projects. Can you please assign it to me?

Akshat977 avatar Sep 10 '22 05:09 Akshat977

Hi @Akshat977, feel free to open a PR when you're ready

epwalsh avatar Sep 12 '22 16:09 epwalsh