lda2vec
IndexError: Error calculating span: Can't find end
Running on OS X 10.11.6

$ python --version
Python 2.7.11 :: Anaconda custom (x86_64)
$ python preprocess.py
Traceback (most recent call last):
  File "preprocess.py", line 47, in <module>
Related to: https://github.com/cemoody/lda2vec/issues/38
IndexError                                Traceback (most recent call last)
/Users/davidlaxer/anaconda/lib/python2.7/site-packages/lda2vec-0.1-py2.7.egg/lda2vec/preprocess.pyc in tokenize(texts, max_length, skip, attr, merge, nlp, **kwargs)
     76         for phrase in doc.noun_chunks:
     77             # Only keep adjectives and nouns, e.g. "good ideas"
---> 78             while len(phrase) > 1 and phrase[0].dep_ not in bad_deps:
     79                 phrase = phrase[1:]
     80             if len(phrase) > 1:

/Users/davidlaxer/anaconda/lib/python2.7/site-packages/spacy-1.7.3-py2.7-macosx-10.5-x86_64.egg/spacy/tokens/span.pyx in spacy.tokens.span.Span.__len__ (spacy/tokens/span.cpp:3955)()
     63
     64     def __len__(self):
---> 65         self._recalculate_indices()
     66         if self.end < self.start:
     67             return 0

/Users/davidlaxer/anaconda/lib/python2.7/site-packages/spacy-1.7.3-py2.7-macosx-10.5-x86_64.egg/spacy/tokens/span.pyx in spacy.tokens.span.Span._recalculate_indices (spacy/tokens/span.cpp:5105)()
    128         end = token_by_end(self.doc.c, self.doc.length, self.end_char)
    129         if end == -1:
--> 130             raise IndexError("Error calculating span: Can't find end")
    131
    132         self.start = start
IndexError: Error calculating span: Can't find end
Seems to work with merge=False (preprocess.py, line 46):

tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4, merge=False)
I've run into similar issues (or the same issue) where merge=False resolves things, but what impact does that have on the results besides squashing the error?
The merge option merges noun phrases (nouns plus their modifiers) into single tokens. I don't think it affects the shape of the topics much, since LDA should be able to handle the individual words on their own anyway.
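For anyone curious, here is a minimal sketch of what that merging amounts to. It is not lda2vec's actual code (which used spaCy 1.x's span.merge()); it uses the current spaCy retokenizer API instead and assumes the en_core_web_sm model is installed:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The New York Times wrote about deep learning.")

# Collapse each noun phrase into a single token, so e.g.
# "The New York Times" becomes one vocabulary entry.
with doc.retokenize() as retokenizer:
    for chunk in doc.noun_chunks:
        retokenizer.merge(chunk)

print([t.text for t in doc])
# Something like: ['The New York Times', 'wrote', 'about', 'deep learning', '.']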
I got the same issue. It can be solved by setting the merge option to False:

tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4,
                                    merge=False)  # change merge to False here
Hi, I am trying merge=False now. May I know roughly how long the tokenize function should take to run?

Cheers, Arav
Hi all,

After I changed merge to False, it gives me the following error:

OverflowError                             Traceback (most recent call last)
<ipython-input-…> in <module>()
     45 texts = features.pop('comment_text').values
     46 tokens, vocab = preprocess.tokenize(texts, max_length, n_threads=4,
---> 47                                     merge=False)
     48 del texts
     49

/usr/local/lib/python2.7/dist-packages/lda2vec-0.1-py2.7.egg/lda2vec/preprocess.pyc in tokenize(texts, max_length, skip, attr, merge, nlp, **kwargs)
    104         data[row, :length] = dat[:length, 0].ravel()
    105     uniques = np.unique(data)
--> 106     vocab = {v: nlp.vocab[v].lower_ for v in uniques if v != skip}
    107     vocab[skip] = '<SKIP>'
    108     return data, vocab

/usr/local/lib/python2.7/dist-packages/lda2vec-0.1-py2.7.egg/lda2vec/preprocess.pyc in <dictcomp>((v,))
    104         data[row, :length] = dat[:length, 0].ravel()
    105     uniques = np.unique(data)
--> 106     vocab = {v: nlp.vocab[v].lower_ for v in uniques if v != skip}
    107     vocab[skip] = '<SKIP>'
    108     return data, vocab

vocab.pyx in spacy.vocab.Vocab.__getitem__()

OverflowError: can't convert negative value to uint64_t

Any heads-up on this? Kindly help me out.

Cheers, Arav
You need to run 64-bit Python, and the libraries also need to be 64-bit builds. The negative values most likely come from spaCy's unsigned 64-bit lexeme IDs being stored in a signed integer array and wrapping around, so they can't be converted back to uint64_t.
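If you want to double-check which build you are on, here is a quick standard-library sketch:

import platform
import struct

# Pointer size in bits: 64 on a 64-bit Python build, 32 on a 32-bit one.
print(struct.calcsize("P") * 8)

# e.g. ('64bit', '') on a 64-bit macOS Python
print(platform.architecture())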
I'm getting this error too when I try to run preprocess.py. How do I fix it?