Predictions are horribly wrong

Open davidniki02 opened this issue 6 years ago • 4 comments

I have trained magpie on a news dataset. I have 9 labels for my data.

After training the model, I tested the following text using magpie.predict_from_text():

Más de 690 mil casos de inmigrantes esperan ser resueltos por tribunales de Inmigración WASHINGTON— La Administración Trump ha convertido las protecciones de menores en sinónimo de “lagunas legales” que el Congreso debe eliminar pero mientras tanto, sobre el terreno, tampoco ha mejorado el atasco de más de 692,000 casos pendientes en los tribunales de Inmigración, según expertos.

While I don't have ANY Spanish documents in my training samples, magpie returns a 90% chance that this text belongs to one of my labels! It even predicts similar results for 3 other categories, all of them irrelevant. I even tried to see if there are any words that are causing this, but could not find any.

What could be wrong here? I trained on 400-500 documents per category and tried both 30 and 50 epochs (no change in results).

davidniki02 avatar Jun 28 '18 23:06 davidniki02

Well, if you didn't feed it any Spanish text before, the network will return essentially random results. In order for the network to build representations for words (in any language), they need to appear in the training set at least N times (N=5 by default). Otherwise Magpie has no idea what is being fed into it and might be triggered by random noise like "Washington" or "Trump" in your case.

The rule is: you should test/predict on the same type of data you train on.
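A quick way to sanity-check this before trusting a prediction is to measure how much of the input text is actually covered by the training vocabulary. The sketch below is plain Python and does not use Magpie's API: `tokenize`, `build_vocab`, and `oov_ratio` are hypothetical helper names, and the regex tokenizer is an assumption that will not match Magpie's own preprocessing exactly. It only mirrors the "at least N occurrences" rule described above.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase word tokenizer, not Magpie's own tokenizer.
    return re.findall(r"\w+", text.lower())


def build_vocab(docs, min_count=5):
    # Keep only words that occur at least `min_count` times in the training docs,
    # mirroring the N=5 threshold mentioned above.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return {tok for tok, count in counts.items() if count >= min_count}


def oov_ratio(text, vocab):
    # Fraction of tokens in `text` that the model has no representation for.
    tokens = tokenize(text)
    if not tokens:
        return 1.0  # Treat empty input as fully unknown.
    unknown = sum(1 for tok in tokens if tok not in vocab)
    return unknown / len(tokens)
```

If `oov_ratio` comes back close to 1.0 for a document (as it likely would for Spanish text against an English-only corpus), the prediction rests on a handful of shared tokens and should probably be discarded, regardless of the confidence score.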

jstypka avatar Jul 02 '18 19:07 jstypka

The thing that worries me is the high confidence: 95% in some cases. If it does not recognize the words, shouldn't it at least be more cautious about its predictions?

davidniki02 avatar Jul 02 '18 20:07 davidniki02

I have the same issue, and I get these poor results even when I test on part of the training corpus.

shashi-netra avatar Jul 29 '18 13:07 shashi-netra

https://github.com/inspirehep/magpie/issues/149

shashi-netra avatar Aug 02 '18 15:08 shashi-netra