chat_corpus

twitter big corpus cat error

Open shini-tm opened this issue 6 years ago • 1 comment

After concatenating the parts with "cat twitter_en_big.txt.gz.part* > twitter_en_big.txt.gz", opening the resulting file gives the error: "The archive is either in unknown format or damaged"

shini-tm Mar 05 '18 14:03
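
For anyone reproducing this, here is a minimal sketch of how the join could be repeated and the result checked without an archive GUI. It assumes a POSIX shell with gzip installed and that the part suffixes sort in the intended order. The helper names join_parts and check_gzip are hypothetical and not part of the repository. It does not identify the cause of the error; it only shows whether the joined file passes gzip's own integrity test.

```shell
#!/bin/sh
# Hypothetical helpers; join_parts and check_gzip are illustrative names,
# not something shipped with chat_corpus.

# Concatenate <target>.part* into <target>.
# The shell expands the glob in lexicographic order, so this assumes the
# part suffixes sort in the order the archive was split.
join_parts() {
    target=$1
    cat "$target".part* > "$target"
}

# List each part with its size, then test the joined archive in place.
# gzip -t exits non-zero if the compressed stream is damaged or truncated.
check_gzip() {
    target=$1
    ls -l "$target".part*
    gzip -t "$target"
}

# Usage (run in the directory that holds the parts):
#   join_parts twitter_en_big.txt.gz && check_gzip twitter_en_big.txt.gz
```

If gzip -t also fails, comparing the part sizes printed by ls against the sizes shown on the download page may indicate whether one of the parts was only partially downloaded.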

I tried repairing the file with WinRAR. I get "Corrupt header is found", "Unexpected end of archive", and "No files repaired".

shini-tm Mar 05 '18 14:03