
Can we train Magpie models with just one word? #158

Open kaundinya5 opened this issue 6 years ago • 3 comments

I'm trying to use Magpie to classify alphanumeric identifiers as either policy numbers or account numbers. To support this, I changed the minimum word count in word2vec to 1. After training, the model always produces the same probability. Am I doing something wrong, or must Magpie models always be trained on sentences and paragraphs?

kaundinya5, Sep 26 '18 06:09

For binary classification you can use simpler models; search for “binary classification scikit-learn”. Can you give an example of your sample input?


--

Edan Krolewicz


Research Automation, DiscoverOrg


dorg-ekrolewicz, Sep 26 '18 06:09

Thank you for your response. I'm not trying to implement binary classification; I just started out with two labels and will add more once I get this working. I wanted to see whether Magpie could work with just two labels first.

Here are some example inputs:

Policy numbers: 594034, 02499357-3, 04187428-7, 04202703-0, 5572732, 06080677-0, 6498924502, 100014713, 100023672

Account numbers: AAAMOTO-01, ALDEINC-03, ALLAFOR-01, AMSMARK-01, AMERMOD-03

I have also created the corresponding .txt and .lab files.

kaundinya5, Sep 26 '18 06:09
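In the spirit of the earlier suggestion to try a simpler model, here is a minimal sketch of a classifier that looks at the characters inside each identifier rather than treating the whole identifier as one word. It is a small multinomial naive Bayes over character n-grams, written with the standard library only; the helper names `char_ngrams` and `CharNGramNB`, the label names `policy`/`account`, and the two "unseen" test identifiers are hypothetical. It is trained on the example identifiers listed above, which are far too few to say anything about real accuracy.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Overlapping character n-grams; ^ and $ mark the start and end of the string."""
    padded = f"^{text}$"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class CharNGramNB:
    """Hypothetical multinomial naive Bayes over character n-grams (add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of every n-gram in the input.
            score = math.log(count / n_docs)
            for g in char_ngrams(text):
                score += math.log((self.gram_counts[label][g] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Example identifiers from the thread; the label names are assumptions.
policy = ["594034", "02499357-3", "04187428-7", "04202703-0", "5572732",
          "06080677-0", "6498924502", "100014713", "100023672"]
account = ["AAAMOTO-01", "ALDEINC-03", "ALLAFOR-01", "AMSMARK-01", "AMERMOD-03"]

model = CharNGramNB().fit(policy + account,
                          ["policy"] * len(policy) + ["account"] * len(account))

for new_id in ["AMSMOTO-03", "05512345-6"]:  # hypothetical unseen identifiers
    print(new_id, model.predict(new_id))
```

Because digits, letters, and the position of the hyphen all become features, identifiers never seen during training can still be classified. With scikit-learn, a comparable setup would be `TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))` feeding a `LogisticRegression`.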

@kaundinya5 Magpie is designed to classify free-text documents that consist of many words. Its power comes from learning implicit relationships between those words and using them for classification. If each input is a single alphanumeric identifier, the model is unlikely to learn much.

jstypka, Sep 26 '18 07:09
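A simplified illustration of the point above, not Magpie's actual code: it assumes a word-level pipeline in which unknown words and padding both map to index 0, a common convention that we have not verified against Magpie's source. The helpers `build_vocab`, `to_token_ids`, and `skipgram_pairs` are hypothetical. When every document is a single identifier, any identifier not seen during training is out of vocabulary, so different new inputs can encode to the same row. Separately, word2vec learns from neighbouring words inside a context window, and a one-word document offers no neighbours, so even with a minimum count of 1 the identifier embeddings get no training signal.

```python
from collections import Counter


def build_vocab(docs, min_count=1):
    """Whitespace-tokenize documents and keep tokens seen at least min_count times."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    return {tok for tok, c in counts.items() if c >= min_count}


def to_token_ids(doc, vocab_index, max_len=5):
    """Map tokens to integer ids; unknown tokens and padding both become 0."""
    ids = [vocab_index.get(tok, 0) for tok in doc.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def skipgram_pairs(tokens, window=5):
    """(center, context) pairs a skip-gram word2vec model would train on."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Each training "document" is a single identifier from the thread.
train = ["594034", "02499357-3", "AAAMOTO-01", "ALDEINC-03"]
vocab = build_vocab(train, min_count=1)
vocab_index = {tok: i + 1 for i, tok in enumerate(sorted(vocab))}

# New identifiers are out of vocabulary, so both encode to the same all-zero row.
unseen = ["04187428-7", "ALLAFOR-01"]
encoded = [to_token_ids(doc, vocab_index) for doc in unseen]
print(encoded)  # [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

# A one-token document gives word2vec no context pairs to learn from.
print(skipgram_pairs(["594034"]))  # []
```

If a model receives identical inputs, it returns identical probabilities, which would be consistent with the behaviour described in the original question.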