feat_vectorizer.py forgets to ignore the last column, 'private_tld', in 'features_norm.txt'.
This is the error after running ./run.sh:
Traceback (most recent call last):
  File "feat_vectorizer.py", line 20, in <module>
    if not f=='tld': feat=float(feat)
ValueError: could not convert string to float:
Fix for line 20 of feat_vectorizer.py:
    for f in feature_header:
        if f in ['','ip','class','private_tld']: continue
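For context, here is a minimal sketch of how that skip list might sit in the row-parsing loop. The helper name `row_to_measurement`, the two column sets, and the sample column 'length' are assumptions for illustration, not the actual code in feat_vectorizer.py; only the skip list and the `tld` check come from the lines above.

```python
# Hypothetical helper; the real loop in feat_vectorizer.py may look different.
SKIP_COLUMNS = {'', 'ip', 'class', 'private_tld'}  # columns that are not features
STRING_COLUMNS = {'tld'}                           # kept as strings, not converted


def row_to_measurement(feature_header, row):
    """Build one feature dict from a header list and a row of string values."""
    measurement = {}
    for f, feat in zip(feature_header, row):
        if f in SKIP_COLUMNS:
            continue  # skipping 'private_tld' avoids calling float() on a string
        if f not in STRING_COLUMNS:
            feat = float(feat)
        measurement[f] = feat
    return measurement
```

With 'private_tld' in the skip set, a value such as 'co.uk' never reaches float(), so the ValueError should no longer be raised for that column.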
Oh, thanks for pointing it out. Fixed it.
feat_vectorizer.py line 56
    feature_list = vec.fit_transform(measurements).toarray()
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 226, in fit_transform
    return self._transform(X, fitting=True)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 167, in _transform
    indices.append(vocab[f])
MemoryError
I am new to this. Can you tell me why this happens or how to solve it?
I am a student at a Chinese university. Could you contact me on QQ: 1026754977? This problem has troubled me for a long time. Thanks.
@phunterlau
@chenlihuang It seems you have too little memory to hold the vectorized features for your dataset. Please try dimension reduction for long features such as the TLD features, or just remove the TLD features: they contribute something, but not very much, so they are only the icing on the cake when you have plenty of memory.
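One possible way to do the dimension reduction is to hash each TLD string into a fixed number of buckets before vectorizing, so the TLD feature adds at most `n_buckets` columns instead of one column per distinct TLD. This is only a sketch under assumptions: the helper names `hash_tld` and `reduce_tld`, the bucket count, and the 'tld_bucket_N' key format are made up for illustration and are not part of the repository.

```python
import zlib


def hash_tld(tld, n_buckets=64):
    """Map a TLD string to one of n_buckets stable feature names."""
    # Mask to 32 bits so the value is non-negative on both Python 2 and 3.
    checksum = zlib.crc32(tld.encode('utf-8')) & 0xffffffff
    return 'tld_bucket_%d' % (checksum % n_buckets)


def reduce_tld(measurement, n_buckets=64):
    """Return a copy of the feature dict with 'tld' replaced by a hashed bucket."""
    reduced = dict(measurement)
    tld = reduced.pop('tld', None)
    if tld is not None:
        reduced[hash_tld(tld, n_buckets)] = 1.0
    return reduced


def drop_tld(measurement):
    """Return a copy of the feature dict without the 'tld' feature at all."""
    return {k: v for k, v in measurement.items() if k != 'tld'}
```

Applying `reduce_tld` (or `drop_tld`) to every dict in `measurements` before calling `vec.fit_transform` should shrink the vocabulary. Different TLDs can collide in the same bucket, so a smaller `n_buckets` saves more memory at the cost of some information.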