cnn-text-classification-tf

Try to adapt code to multi classification

Open mlankenau opened this issue 8 years ago • 5 comments

Hi,

I tried to adapt the code to another domain. We have about 300 classes we want to detect and (at least for now) a really low number of samples, about 800.

I adapted the code and started training, but the accuracy is going south. I tried changing the parameters of the net, but did not manage to improve it. Accuracy starts at about 0.3 and goes to zero.

Any Ideas?

BR

mlankenau avatar Jun 19 '17 19:06 mlankenau

Would it be possible to combine some of the classes to lower that number? It is best to have thousands of samples per class. With only 800 samples and 300 classes, there will be only about 2 to 3 samples per class, even if the samples are spread evenly across the classes. Even if the number of classes were reduced to 5, there would need to be many more samples to achieve good results. It would likely be better to look at other techniques, such as an SVM, for your task.

Starting with an accuracy of 0.3 doesn't sound correct. With random initialization, the network should have a 1/300=0.0033 chance of guessing correctly.
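Both estimates can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the original post (300 classes, 800 samples) and assuming perfectly balanced classes and a uniformly random guess:

```python
# Figures quoted in the original post.
num_classes = 300
num_samples = 800

# Average number of samples per class if the data were perfectly balanced.
samples_per_class = num_samples / num_classes

# Expected accuracy of a uniformly random guess over all classes.
chance_accuracy = 1 / num_classes

print(f"samples per class: {samples_per_class:.2f}")
print(f"chance accuracy:   {chance_accuracy:.4f}")
```

A starting accuracy far above the chance level, such as the reported 0.3, may point to a problem in how the labels are loaded or how accuracy is computed.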

snsie avatar Jun 20 '17 21:06 snsie

Thanks for the explanation. That is really helpful! We are currently creating the training data by hand, so this was an early shot at trying deep learning. I will try to detect only one class. Maybe your approach is too much for what we are trying to accomplish. We have product names and try to find the features/classes for them. So my idea is to build a dictionary of the words that occur in the product names and use each word as a single input neuron. That way I would ignore the position of a word in the sentence.
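The dictionary idea described above is essentially a bag-of-words encoding. A minimal sketch of it follows, assuming simple whitespace tokenization and lowercasing; the function names and the product names are made up for illustration and are not part of the repository:

```python
def build_vocabulary(product_names):
    """Map every distinct lowercase word to a fixed input index."""
    words = sorted({w for name in product_names for w in name.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_bag_of_words(name, vocabulary):
    """Return a 0/1 vector with one position per vocabulary word.

    Word order is ignored: only the presence of a word matters.
    """
    vector = [0] * len(vocabulary)
    for word in name.lower().split():
        index = vocabulary.get(word)
        if index is not None:  # words missing from the vocabulary are skipped
            vector[index] = 1
    return vector


# Hypothetical product names, purely for illustration.
names = ["red cotton shirt", "blue cotton jeans"]
vocab = build_vocabulary(names)

# The same words in a different order produce the same input vector.
print(to_bag_of_words("red cotton shirt", vocab))
print(to_bag_of_words("cotton shirt red", vocab))
```

Each vector position would then feed one input neuron of a small classifier.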

mlankenau avatar Jun 21 '17 06:06 mlankenau

Can someone help me adapt the code to classify 7 labels? I need help changing the code since I'm really new to this.

johnp2266 avatar Jul 10 '18 18:07 johnp2266

@ShinValor Have you solved this for 7 classes? I am trying to adapt the code to classify 4 labels.

Sunny-NEU avatar Mar 16 '19 08:03 Sunny-NEU

Here is our code, which we modified to handle more than two labels: https://github.com/jannenev/cnn-news-classification-tf

In the file data_helpers.py there is a function, load_data_and_labels(), that loads data with binary labels.

We added our own function to load data with multiple labels, in this case 5 labels: def load_newsdata_and_labels(): In this function you read in your own data. The labels need to be turned into one-hot form, i.e. if there are 5 labels in total (0-4) and an item's label is 2, it would be: 0 0 1 0 0 (bit 2 is 1 and the rest are 0). The function ends with return [x_text, y]

So in the data_helpers.py function you load your own data, turn the labels into one-hot form, and return both with return [x_text, y]
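The one-hot step could look roughly like the sketch below. This is not the code from the linked repository: load_own_data_and_labels(), its (text, label) input format, and the sample texts are assumptions for illustration. It uses plain Python lists; the training script may expect a NumPy array for y, in which case the result would need converting.

```python
def one_hot(label, num_classes):
    """Return a list of zeros with a single 1 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    row = [0] * num_classes
    row[label] = 1
    return row


def load_own_data_and_labels(samples, num_classes):
    """Hypothetical stand-in for a custom loader in data_helpers.py.

    `samples` is assumed to be a list of (text, integer_label) pairs.
    """
    x_text = [text.strip() for text, _ in samples]
    y = [one_hot(label, num_classes) for _, label in samples]
    return [x_text, y]


# Made-up samples with labels in the range 0-4.
samples = [("stock markets rally ", 2), ("local team wins final", 4)]
x_text, y = load_own_data_and_labels(samples, num_classes=5)
print(x_text)
print(y)
```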

Then, in the file train.py (train2.py in our version of the code), replace the call to the original data-loading function with your own version.

```
# Load data
# was original
# print("Loading data...")
# x_text, y = data_helpers.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)

### Load own data ################
print("Loading data...")
x_text, y = data_helpers.load_newsdata_and_labels()
```

jannenev avatar Mar 17 '19 09:03 jannenev