cmc-csci181
Zipped File
I am having an issue running BERT with the new dataset. Looking into it, I see that gzip refers to a zipped file; however, my file is showing up as unzipped. I think my file is unzipped, but in any case, what should I do?
Here is an image of my terminal.

The filename should be corona.multilang100.jsonl.gz, but your filename is corona.multilang100.jsonl. So it looks like you're right that you've unzipped the file. Running
$ gzip corona.multilang100.jsonl
will rezip the file. Or if that doesn't work, you can redownload the file.
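If you'd rather check from Python before launching the training script, here is a minimal sketch. It assumes only the standard library; `is_gzipped` and `ensure_gzipped` are hypothetical helper names, not part of the course code. It looks at the first two bytes of the file (gzip streams always start with `1f 8b`) and writes a compressed copy only when needed. Unlike the `gzip` command, this sketch leaves the original uncompressed file in place.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Return a path to a gzip-compressed version of `path` (hypothetical helper).

    If the file is already gzipped, its path is returned unchanged; otherwise
    a compressed copy is written next to it with a `.gz` suffix appended.
    """
    path = Path(path)
    if is_gzipped(path):
        return path
    gz_path = path.with_name(path.name + ".gz")
    # Stream the bytes through gzip so large files are not loaded into memory.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

For example, `ensure_gzipped("corona.multilang100.jsonl")` should produce corona.multilang100.jsonl.gz in the same directory, which is the filename the command expects.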