emnlp2017-bilstm-cnn-crf

EOF error

Open • ankur220693 opened this issue 6 years ago • 4 comments

```
python3.4 train_pos_mai.py
Using TensorFlow backend.
Generate new embeddings files for a dataset
Read file: komninos_english_embeddings.gz
Traceback (most recent call last):
  File "Train_POS.py", line 48, in <module>
    pickleFile = perpareDataset(embeddingsPath, datasets)
  File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 42, in perpareDataset
    embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
  File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 135, in readEmbeddings
    for line in embeddingsIn:
  File "/usr/lib64/python3.4/gzip.py", line 389, in read1
    while self.extrasize <= 0 and self._read():
  File "/usr/lib64/python3.4/gzip.py", line 449, in _read
    self._read_eof()
  File "/usr/lib64/python3.4/gzip.py", line 482, in _read_eof
    crc32, isize = struct.unpack("<II", self._read_exact(8))
  File "/usr/lib64/python3.4/gzip.py", line 286, in _read_exact
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
```

ankur220693 • Feb 13 '19 12:02

It looks like your embeddings file was only partially downloaded. Deleting it and downloading it again may solve the problem.
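One way to confirm a truncated download before rerunning training is to decompress the whole file once and check that it reaches the gzip end-of-stream marker. This is a minimal sketch using only the standard-library `gzip` module; `gzip_is_complete` is a hypothetical helper, not part of this repository.

```python
import gzip
import os
import tempfile
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file decompresses cleanly to its end.

    A truncated download raises EOFError ("Compressed file ended before the
    end-of-stream marker was reached"); other corruption (bad header or CRC)
    surfaces as OSError or zlib.error. A missing file is also an OSError.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Read in chunks so a large embeddings file never sits fully in memory.
            while f.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    # Build a small valid file plus a copy cut off halfway,
    # simulating an interrupted download.
    tmpdir = tempfile.mkdtemp()
    good = os.path.join(tmpdir, "good.gz")
    with gzip.open(good, "wt", encoding="utf-8") as f:
        f.writelines(f"tok{i} 0.1 0.2 0.3\n" for i in range(1000))
    with open(good, "rb") as f:
        data = f.read()
    bad = os.path.join(tmpdir, "truncated.gz")
    with open(bad, "wb") as f:
        f.write(data[: len(data) // 2])
    print(gzip_is_complete(good))  # True
    print(gzip_is_complete(bad))   # False
```

Pointed at `komninos_english_embeddings.gz`, a `False` result would suggest the file is incomplete and should be downloaded again.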

Also, please test it with Python 3.6; sadly, I can't help with older Python versions.

nreimers • Feb 13 '19 12:02

thanks

ankur220693 • Feb 13 '19 12:02

I can't figure out this error:

```
python3.6 train_pos_mai.py
Using TensorFlow backend.
Generate new embeddings files for a dataset
Read file: maiwiki-20180920-stub-articles.xml
Traceback (most recent call last):
  File "train_pos_mai.py", line 48, in <module>
    pickleFile = perpareDataset(embeddingsPath, datasets)
  File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 42, in perpareDataset
    embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
  File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 156, in readEmbeddings
    vector = np.array([float(num) for num in split[1:]])
  File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 156, in <listcomp>
    vector = np.array([float(num) for num in split[1:]])
ValueError: could not convert string to float: 'xmlns="http://www.mediawiki.org/xml/export-0.10/"'
```

ankur220693 • Feb 18 '19 13:02

What type of embeddings file do you use?

The system expects an input file in the same format as the GloVe embeddings.

Each line holds one token followed by its vector, e.g. 300 floats, all separated by spaces.
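A minimal sketch of that line format and of how non-embedding input fails on it, assuming plain Python lists instead of NumPy arrays. `parse_glove_line` and `find_first_bad_line` are hypothetical names; the float conversion mirrors the `float(num) for num in split[1:]` expression shown in the traceback.

```python
def parse_glove_line(line):
    """Split 'token v1 v2 ... vN' into (token, [v1, ..., vN])."""
    split = line.rstrip().split(" ")
    # Every field after the token must parse as a float, otherwise ValueError.
    vector = [float(num) for num in split[1:]]
    return split[0], vector


def find_first_bad_line(lines, expected_dim=None):
    """Return (line_number, reason) for the first malformed line, or None.

    If expected_dim is None, the length of the first valid vector is used.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # this check skips blank lines
        try:
            token, vector = parse_glove_line(line)
        except ValueError as err:
            return lineno, str(err)
        if not vector:
            return lineno, f"token {token!r} has no vector values"
        if expected_dim is None:
            expected_dim = len(vector)
        elif len(vector) != expected_dim:
            return lineno, f"expected {expected_dim} values, got {len(vector)}"
    return None


if __name__ == "__main__":
    # Toy lines for illustration only.
    glove_like = ["the 0.1 0.2 0.3", "of 0.4 0.5 0.6"]
    xml_dump = ['<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10">']
    print(find_first_bad_line(glove_like))  # None
    print(find_first_bad_line(xml_dump))
```

On the toy XML line, the second field `xmlns="..."` is the first value passed to `float()`, so the check reports line 1 with the same `could not convert string to float` message as in the traceback.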

It looks like your embeddings file is an XML file (the error mentions the MediaWiki XML export namespace). If so, you would need to convert it to a format like the GloVe embeddings.
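A MediaWiki XML dump contains markup rather than token vectors, so the vectors themselves have to come from elsewhere, for example an embedding model trained on the dump's text with a separate tool. Assuming you already have such a `{token: vector}` mapping, writing it in the GloVe-style layout could look like the sketch below. `write_glove_format` is a hypothetical helper, and gzip output for `.gz` paths is chosen only because the embeddings file in the first traceback was gzipped.

```python
import gzip
import os
import tempfile


def write_glove_format(vectors, path, precision=6):
    """Write a {token: sequence of floats} mapping as 'token v1 ... vN' lines.

    Hypothetical helper: it assumes the vectors already exist and only
    handles the file layout. A path ending in '.gz' is written gzip-compressed.
    """
    dims = {len(vector) for vector in vectors.values()}
    if len(dims) != 1:
        raise ValueError(f"expected one shared vector length, got {sorted(dims)}")
    # The format is space separated, so tokens must be non-empty and whitespace-free.
    # Validate before opening the file so no partial output is left behind.
    for token in vectors:
        if not token or any(ch.isspace() for ch in token):
            raise ValueError(f"invalid token {token!r}")
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as out:
        for token, vector in vectors.items():
            values = " ".join(f"{float(x):.{precision}f}" for x in vector)
            out.write(f"{token} {values}\n")


if __name__ == "__main__":
    # Toy vectors for illustration only.
    toy = {"घर": [0.1, -0.2, 0.3], "पानी": [0.05, 0.4, -0.1]}
    path = os.path.join(tempfile.mkdtemp(), "toy_embeddings.txt.gz")
    write_glove_format(toy, path)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        print(f.read(), end="")
```

Since the second traceback shows an uncompressed `.xml` file reaching the float-parsing step, a plain-text output path probably works too; that is an inference from the tracebacks, not something checked against `preprocessing.py`.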

nreimers • Feb 18 '19 13:02