dga_classifier: forgot to ignore the last column 'private_tld' in 'features_norm.txt'

Running ./run.sh produces the following error:

    Traceback (most recent call last):
      File "feat_vectorizer.py", line 20, in <module>
        if not f=='tld': feat=float(feat)
    ValueError: could not convert string to float:

Fix for line 20 of feat_vectorizer.py (skip 'private_tld' along with the other skipped columns):

    for f in feature_header:
        if f in ['','ip','class','private_tld']: continue
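To see why the extra entry matters, here is a minimal sketch of the per-row conversion, assuming (from the traceback) that every kept column other than `tld` is passed to `float()`. The function name `row_to_measurement`, the `SKIP_COLUMNS` tuple, and the sample header and row are hypothetical, not taken from the repository. The code runs under both Python 2.7 (as in the second traceback in this thread) and Python 3.

```python
# Columns that are skipped and never converted to float.
# 'private_tld' is the extra column the fix adds to the skip list.
SKIP_COLUMNS = ("", "ip", "class", "private_tld")


def row_to_measurement(feature_header, row, skip_columns=SKIP_COLUMNS):
    """Map each kept column name to its value for one input row.

    Every kept column except 'tld' is parsed as a float; 'tld' stays a
    string. A kept column holding an empty or non-numeric value makes
    float() raise ValueError, as in the traceback.
    """
    measurement = {}
    for f, feat in zip(feature_header, row):
        if f in skip_columns:
            continue
        if f != "tld":
            feat = float(feat)
        measurement[f] = feat
    return measurement


# Hypothetical header and row; the real column layout may differ.
header = ["ip", "length", "entropy", "tld", "class", "private_tld"]
row = ["example", "7", "2.5", "com", "legit", ""]
print(row_to_measurement(header, row))
```

Calling it with `skip_columns=("", "ip", "class")` on the same row reaches `float("")` for the `private_tld` column and raises `ValueError`, which is the failure reported above.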

Jul 11 '15 16:07 h2coder

Oh, thanks for pointing it out. Fixed it.

Jul 16 '15 21:07 phunterlau

      feat_vectorizer.py line 56
        feature_list = vec.fit_transform(measurements).toarray()
      File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 226, in fit_transform
        return self._transform(X, fitting=True)
      File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 167, in _transform
        indices.append(vocab[f])
    MemoryError

I am new to this. Can you tell me why this happens or how to solve it?
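For a rough sense of scale, here is a back-of-the-envelope sketch, assuming `vec` is scikit-learn's `DictVectorizer` (as the traceback path suggests) with its default `float64` dtype, so each dense cell takes 8 bytes. One-hot encoding a string feature such as the TLD adds one column per distinct value, and `.toarray()` then materializes every row × column cell. The helper names and the row and column counts are hypothetical. Note that this traceback fails inside `fit_transform`, while the sparse matrix is still being built, so the dense copy from `.toarray()` would come on top of that.

```python
def dense_matrix_bytes(n_rows, n_columns, itemsize=8):
    """Bytes needed to store a dense n_rows x n_columns array.

    itemsize=8 corresponds to float64, DictVectorizer's default dtype.
    """
    return n_rows * n_columns * itemsize


def to_gib(n_bytes):
    """Convert a byte count to GiB (float division, also on Python 2)."""
    return float(n_bytes) / 2 ** 30


# Hypothetical dataset: 1,000,000 domains, 50 numeric features,
# and 1,000 distinct TLD values, each becoming its own one-hot column.
n_rows = 1000000
n_columns = 50 + 1000
needed = dense_matrix_bytes(n_rows, n_columns)
print("%d bytes = %.1f GiB" % (needed, to_gib(needed)))
```

With these made-up numbers the dense array alone needs 8,400,000,000 bytes, about 7.8 GiB, and every additional distinct TLD adds another full column of 8-byte cells.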

I am a student at a university in China. Could you contact me on QQ: 1026754977? This problem has troubled me for a long time. Thanks.

Sep 28 '16 07:09 chenlihuang

@phunterlau

Sep 28 '16 07:09 chenlihuang

@chenlihuang It seems you don't have enough memory to hold the vectorized features for your dataset. Try dimensionality reduction on long features such as the TLD features, or simply remove the TLD features: they contribute somewhat, but not much, and are icing on the cake when you have plenty of memory.
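Both suggestions can be sketched on the list of dicts that `DictVectorizer` receives, assuming each measurement stores the TLD as a string under the key `'tld'` (the name checked in the first traceback). `drop_features` removes the TLD entirely. `hash_tld` is a swapped-in dimension-reduction technique, the hashing trick: it maps every TLD to one of a fixed number of buckets, so the TLD yields at most `n_buckets` one-hot columns. Both function names and the default bucket count are hypothetical; scikit-learn's `FeatureHasher` applies the same idea to a whole feature dict.

```python
import zlib


def drop_features(measurements, names):
    """Return new measurement dicts with the named keys removed."""
    names = set(names)
    return [dict((k, v) for k, v in m.items() if k not in names)
            for m in measurements]


def hash_tld(measurements, n_buckets=64, key="tld"):
    """Replace each TLD string with one of n_buckets hashed labels.

    After one-hot encoding, the TLD then contributes at most n_buckets
    columns instead of one column per distinct TLD.
    """
    out = []
    for m in measurements:
        m = dict(m)  # copy so the caller's dicts are not mutated
        if key in m:
            # Mask to 32 bits: crc32 can be negative on Python 2.
            h = zlib.crc32(m[key].encode("utf-8")) & 0xffffffff
            m[key] = "bucket_%d" % (h % n_buckets)
        out.append(m)
    return out
```

Either returned list can be passed to `vec.fit_transform(...)` in place of `measurements`. Hashing trades occasional collisions between unrelated TLDs for a small, fixed column count.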

Sep 28 '16 16:09 phunterlau
