allennlp
Accept compressed files as input to `predict` when using a `Predictor`
Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the `predict` command opens the file and reads lines for the `Predictor`, which fails when it tries to load data from my compressed files.
https://github.com/allenai/allennlp/blob/39d7e5ae06551fe371d3e16f4d93162e55ec5dcc/allennlp/commands/predict.py#L208-L218
Describe the solution you'd like
Either automatically detect that the file is compressed or add a flag to `predict` that indicates that the file is compressed. One method that I have used to detect whether a file is gzipped is here, although it isn't 100% accurate. I have an implementation here. Otherwise, a flag like `--compression-type` to mark how the file is compressed should be sufficient. Passing the type of compression would allow support for gzip, bz2, or any other method.
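The flag-based design could look roughly like this minimal sketch. It assumes the input is line-oriented text and maps a hypothetical `--compression-type` value to Python's standard-library openers (`gzip.open`, `bz2.open`, `lzma.open`). `OPENERS`, `read_lines`, and `build_parser` are illustrative names, not AllenNLP's actual API.

```python
import argparse
import bz2
import gzip
import lzma
from typing import Callable, Dict, IO, Iterator

# Hypothetical mapping from a --compression-type value to an opener.
# Each opener is called as opener(path, "rt", encoding=...) and returns
# a text-mode file object that yields decoded lines.
OPENERS: Dict[str, Callable[..., IO[str]]] = {
    "none": open,
    "gzip": gzip.open,
    "bz2": bz2.open,
    "lzma": lzma.open,
}


def read_lines(path: str, compression_type: str = "none") -> Iterator[str]:
    """Yield lines from `path`, decompressing according to `compression_type`."""
    opener = OPENERS[compression_type]
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line


def build_parser() -> argparse.ArgumentParser:
    """Argument parser with a hypothetical --compression-type flag."""
    parser = argparse.ArgumentParser(prog="predict-sketch")
    parser.add_argument("input_file")
    parser.add_argument(
        "--compression-type",
        choices=sorted(OPENERS),
        default="none",
        help="How input_file is compressed (hypothetical flag).",
    )
    return parser
```

A caller would then do `args = build_parser().parse_args()` and iterate over `read_lines(args.input_file, args.compression_type)`, handing each line to the predictor exactly as it would for an uncompressed file.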
I think this feature is a great idea! The latter design (passing a flag) seems better to me.
I am adding @epwalsh here to get his input as well.
Yeup, this seems reasonable. I think we should try to automatically detect the compression type, but also have the flag so that users can override it when the automatic detection fails.
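Auto-detection with a manual override might be sketched as follows, assuming detection by the file's leading magic bytes (gzip `1f 8b`, bzip2 `BZh`, xz `fd 37 7a 58 5a 00`), with an explicitly passed compression type always taking precedence. `detect_compression` and `open_input` are hypothetical helpers and are not necessarily how the `file_utils` function linked in the next comment works.

```python
import bz2
import gzip
import lzma
from typing import IO, Optional

# Leading "magic" bytes of common compressed formats.
_MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "lzma",  # .xz container format
}


def detect_compression(path: str) -> str:
    """Guess the compression type from the first bytes; "none" if unrecognized."""
    with open(path, "rb") as f:
        head = f.read(6)
    for magic, kind in _MAGIC.items():
        if head.startswith(magic):
            return kind
    return "none"


def open_input(path: str, compression_type: Optional[str] = None) -> IO[str]:
    """Open `path` as text; an explicit `compression_type` overrides detection."""
    kind = compression_type or detect_compression(path)
    if kind == "gzip":
        return gzip.open(path, "rt", encoding="utf-8")
    if kind == "bz2":
        return bz2.open(path, "rt", encoding="utf-8")
    if kind == "lzma":
        return lzma.open(path, "rt", encoding="utf-8")
    if kind == "none":
        return open(path, "r", encoding="utf-8")
    raise ValueError(f"Unknown compression type: {kind!r}")
```

Magic-byte sniffing can misfire: a plain-text file that happens to start with `BZh` is reported as bz2. That is exactly the case where the override, e.g. `open_input(path, "none")`, is needed.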
You may find this helper function useful: https://github.com/allenai/allennlp/blob/7a5106d541006a5b7f544284aeedb03ba2b480d5/allennlp/common/file_utils.py#L1087
Hi, I'd like to try working on this. I'm relatively new to this, so are there any pointers I should keep in mind before opening a pull request?
Hi @Dbhasin1, check out https://github.com/allenai/allennlp/blob/main/CONTRIBUTING.md#making-a-pull-request
Hi @epwalsh! Is this issue still unresolved? I'm looking for issues to start contributing to AllenNLP. Can I take this up if it hasn't been resolved already?
Hi @spranjal25, we haven't heard from @Dbhasin1 for a while on their PR, so it's probably okay for you to take over at this point.
Hey, sorry, I'd been busy elsewhere for a while. I'd like to give it one more shot!
Is the issue still open?
Hi @danieldeutsch, @epwalsh, this issue seems like a good way to begin contributing to FOSS projects. Can you please assign it to me?
Hi @Akshat977, feel free to open a PR when you're ready