langid.py
If wordn is set in tokenize.py, is max_order in DFfeatureselect.py counted in words or in bytes?
If wordn is set to 3-grams in tokenize.py, is the unit of max_order in DFfeatureselect.py a word or a byte? I ask because in some languages a single character can take up several bytes.
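To make the distinction concrete, here is a minimal sketch of the two interpretations. It is not langid.py's code: `byte_ngrams` and `word_ngrams` are hypothetical helpers written only for this question, and the sample text and whitespace splitting are my own assumptions.

```python
def byte_ngrams(text, n):
    """Return every run of n consecutive bytes in the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample: each Chinese character here is 3 bytes in UTF-8.
sample = "自然 语言 处理 很 有趣"

byte_trigrams = byte_ngrams(sample, 3)
word_trigrams = word_ngrams(sample, 3)

print(len("自然".encode("utf-8")))  # one two-character word, counted in bytes
print(len(byte_trigrams))           # number of byte 3-grams
print(len(word_trigrams))           # number of word 3-grams
print(word_trigrams[0])
```

If max_order counts bytes, an order of 3 would not even cover one full word of this sample; if it counts words, it would cover three words. That is why I would like to know which unit applies.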