dga_classifier

forgot to ignore the last column 'private_tld' in 'features_norm.txt'

Open h2coder opened this issue 10 years ago • 4 comments

Running ./run.sh produces the following error:

Traceback (most recent call last):
  File "feat_vectorizer.py", line 20, in <module>
    if not f=='tld': feat=float(feat)
ValueError: could not convert string to float:

Fix for line 20 of feat_vectorizer.py:

    for f in feature_header:
        if f in ['','ip','class','private_tld']: continue
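
For context, here is a minimal sketch of how that check might sit inside the row-parsing loop. It is not the repository's actual code: `parse_row`, `SKIP_COLUMNS`, and the sample column names `length` and `entropy` are hypothetical, and it assumes each row has already been split into one value per header column.

```python
# Columns that hold no usable feature value and should be skipped.
# 'private_tld' is the trailing column that caused the ValueError.
SKIP_COLUMNS = {'', 'ip', 'class', 'private_tld'}


def parse_row(feature_header, values):
    """Build a feature dict from one row, skipping non-feature columns."""
    measurement = {}
    for f, feat in zip(feature_header, values):
        if f in SKIP_COLUMNS:
            continue
        # 'tld' stays a string (categorical); every other column must be numeric.
        if f != 'tld':
            feat = float(feat)
        measurement[f] = feat
    return measurement


# Hypothetical header and row, for illustration only.
header = ['ip', 'length', 'entropy', 'tld', 'class', 'private_tld']
row = ['1.2.3.4', '12', '3.5', 'com', 'dga', '']
parsed = parse_row(header, row)
print(parsed)
```

Without 'private_tld' in the skip list, the empty string in the last column reaches `float('')`, which raises the ValueError shown in the traceback.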

h2coder avatar Jul 11 '15 16:07 h2coder

oh, thanks for pointing it out. fixed it.

phunterlau avatar Jul 16 '15 21:07 phunterlau

feat_vectorizer.py line 56
    feature_list = vec.fit_transform(measurements).toarray()
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 226, in fit_transform
    return self._transform(X, fitting=True)
  File "C:\Python27\lib\site-packages\sklearn\feature_extraction\dict_vectorizer.py", line 167, in _transform
    indices.append(vocab[f])
MemoryError

I am new to this. Can you tell me why this happens or how to solve it?

I am a student at a university in China. Could you contact me on QQ: 1026754977? This problem has troubled me for a long time. Thanks.

chenlihuang avatar Sep 28 '16 07:09 chenlihuang

@phunterlau

chenlihuang avatar Sep 28 '16 07:09 chenlihuang

@chenlihuang It seems you have too little memory to hold the vectorized features for your dataset. Please try dimensionality reduction on the wide features such as the TLD features, or just remove the TLD features: they contribute something but not very much, and are the icing on the cake when you have plenty of memory.
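
A rough sketch of the "just remove TLD features" option, assuming `measurements` is a list of per-domain feature dicts as in the traceback above. The helper `drop_features` and the sample values are hypothetical, not part of the repository.

```python
def drop_features(measurements, names=('tld',)):
    """Return copies of the feature dicts without the given keys."""
    names = set(names)
    return [
        {key: value for key, value in m.items() if key not in names}
        for m in measurements
    ]


# Hypothetical sample data, for illustration only.
measurements = [
    {'length': 12.0, 'entropy': 3.5, 'tld': 'com'},
    {'length': 20.0, 'entropy': 4.1, 'tld': 'info'},
]
slim = drop_features(measurements)
print(slim)
```

The slimmed list could then be passed to `vec.fit_transform` in place of `measurements`. Separately, scikit-learn's `DictVectorizer` returns a SciPy sparse matrix by default, so omitting the `.toarray()` call may also reduce memory use, provided the downstream model accepts sparse input.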

phunterlau avatar Sep 28 '16 16:09 phunterlau