course-nlp
Lesson 10 notebooks: `bunzip` throws an error when unzipping `.bz2` files
On a Windows 10 64-bit machine, `bunzip` throws "EOFError: Compressed file ended before the end-of-stream marker was reached" when processing these files:
viwiki-latest-pages-articles.xml.bz2
trwiki-latest-pages-articles.xml.bz2
Attaching a screenshot:
The Windows version of 7-Zip throws a similar error.
Note 1: A valid `.xml` file is still saved.
Note 2: The problem was resolved when I downloaded the files directly from https://archive.org/details/wikipediadumps
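In case it helps anyone hitting the same thing: since a fresh download fixed it, my guess (an assumption, not something I confirmed) is that the original `.bz2` files were truncated. Below is a minimal sketch for checking a local file before running the notebook. It uses only Python's standard `bz2` module; the helper name `is_complete_bz2` and the chunk size are made up for illustration and are not part of the course code.

```python
import bz2


def is_complete_bz2(path, chunk_size=1024 * 1024):
    """Return True if the archive decompresses to the end, False if it is truncated.

    Hypothetical helper: streams the file in chunks so a multi-GB dump
    is never held in memory at once.
    """
    try:
        with bz2.open(path, "rb") as f:
            # Read and discard decompressed data until the stream ends.
            while f.read(chunk_size):
                pass
    except EOFError:
        # Raised when the compressed data stops before the end-of-stream marker.
        return False
    return True
```

For example, `is_complete_bz2("viwiki-latest-pages-articles.xml.bz2")` returning `False` would suggest the download needs to be repeated. A file that is not bzip2 data at all raises `OSError` instead, which this sketch deliberately does not catch.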
somehow same error :(