chat_corpus
twitter big corpus cat error
After concatenating the parts with: cat twitter_en_big.txt.gz.part* > twitter_en_big.txt.gz I try to open the resulting file and get the error: "The archive is either in unknown format or damaged"
I tried repairing the file with WinRAR. I get "Corrupt header is found", "Unexpected end of archive", "No files repaired"
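For anyone trying to reproduce this, here is a rough diagnostic sketch in Python. It joins the parts as raw bytes and then checks whether the result is a readable gzip file. It assumes the parts sort correctly by file name and that the corpus is a plain gzip stream; neither is confirmed in this report. The helper names `join_parts` and `check_gzip` are made up for illustration and are not part of the repository. A binary-safe join like this would at least rule out a shell that rewrites the bytes during redirection, which is one possible cause and not a confirmed one.

```python
import glob
import gzip
import shutil
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def join_parts(pattern, out_path):
    """Concatenate all files matching pattern into out_path as raw bytes.

    Hypothetical helper. Parts are joined in lexicographic file-name order,
    which is an assumption about how the parts were named.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def check_gzip(path, chunk_size=1 << 20):
    """Return True if path has a gzip header and decompresses to the end."""
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            return False
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass  # read through the whole stream to trigger any error
    except (OSError, EOFError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    used = join_parts("twitter_en_big.txt.gz.part*", "twitter_en_big.txt.gz")
    print("joined parts:", used)
    print("valid gzip:", check_gzip("twitter_en_big.txt.gz"))
```

If `check_gzip` returns False even after a binary-safe join, a missing or incompletely downloaded part would be the next thing to check, for example by comparing each part's size against the sizes listed in the repository.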