cnn-text-classification-tf
Support for multiclass, word embeddings, configuration file and new datasets
Hi,
I added the following functionality:
- multiclass classification
- pre-trained word embeddings using word2vec and GloVe
- a configuration file in YAML format
- a new dataset, 20 Newsgroups (loaded using sklearn.datasets)
- loading a multiclass text dataset from a local directory
The path to the movie rating dataset has also been moved to the configuration file. Thanks.
Hi @cahya-wirawan, thank you so much for adding multiclass classification. I still have issues loading my own local data. Here is what I did:
1. Saved the text files with the categories as subfolder names in /data/bbcdata. The bbcdata folder contains 5 folders with the corresponding txt files: "business", "entertainment", "politics", "sport", "tech".
2. Updated the config.yml file as follows:
line 16: default: localdata
line 52: container_path: "/data/bbcdata"
Did I miss something needed to run ./train.py? Could you help me with that? Thank you so much!
Aven
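The directory layout described above (one subfolder per category, text files inside each) can be sanity-checked before training. The following is a minimal stdlib-only sketch; the function name `summarize_categories` is hypothetical and not part of the repository.

```python
from pathlib import Path


def summarize_categories(container_path):
    """Return {category_name: file_count} for each subfolder of container_path."""
    root = Path(container_path)
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

Calling `summarize_categories("/data/bbcdata")` should list the five category folders with a non-zero file count each; an empty result suggests that container_path points at the wrong directory.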
@cahya-wirawan The following is the error I get when using local multi-class data. Could you help me with this? Thanks a lot!
Loading data...
Traceback (most recent call last):
File "./train.py", line 72, in <module>
random_state=cfg["datasets"][dataset_name]["random_state"])
File "/home/xxliu10/repo/cahya_cnn/cnn-text-classification-tf/data_helpers.py", line 93, in get_datasets_localdata
random_state=random_state)
File "/home/xxliu10/anaconda3/lib/python3.6/site-packages/sklearn/datasets/base.py", line 232, in load_files
data = [d.decode(encoding, decode_error) for d in data]
File "/home/xxliu10/anaconda3/lib/python3.6/site-packages/sklearn/datasets/base.py", line 232, in <listcomp>
data = [d.decode(encoding, decode_error) for d in data]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 257: invalid start byte
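The traceback shows that load_files decodes every file with a single `encoding` and `decode_error`, and that one file contains the byte 0xa3, which cannot start a UTF-8 sequence. In cp1252 and Latin-1 that byte is the pound sign, so the files are probably stored in a legacy single-byte encoding (an assumption, since the data itself is not shown here). One possible workaround is to convert the files to UTF-8 before training. The sketch below uses only the standard library; `decode_with_fallback` and `convert_tree_to_utf8` are hypothetical helper names, not part of the repository or of scikit-learn.

```python
from pathlib import Path


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that works, else Latin-1."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never raises.
    return raw.decode("latin-1")


def convert_tree_to_utf8(container_path, pattern="*.txt"):
    """Rewrite non-UTF-8 files under container_path as UTF-8, in place.

    Returns the number of files that were converted.
    """
    converted = 0
    for path in Path(container_path).rglob(pattern):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8, leave untouched
        except UnicodeDecodeError:
            pass
        path.write_bytes(decode_with_fallback(raw).encode("utf-8"))
        converted += 1
    return converted
```

Run it on a copy of the data, for example `convert_tree_to_utf8("/data/bbcdata")`, because it overwrites files. An alternative that leaves the files unchanged would be to pass a different `encoding`, or `decode_error="replace"`, to the load_files call in data_helpers.py; whether the configuration file exposes such an option is not stated in this thread.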
What is the expected training time, and how many steps are needed to get good accuracy?
Hi, were you able to fix this issue? I am facing the same issue.
Can anybody give a solution to this problem?