bert-sklearn

Is there any plan to support multi-label classification task?

Open EvanMu96 opened this issue 5 years ago • 10 comments

As the title asks.

EvanMu96 avatar Oct 27 '19 05:10 EvanMu96

Hi. I was not thinking of adding a multi-label classification task. I can look into it though. Is there a particular open source NLP dataset you are thinking about?

charles9n avatar Oct 28 '19 01:10 charles9n

Hi charles9n. Your repo is great, so I want to use it on Kaggle's Toxic Comment Classification Challenge. I may try to add a multi-label feature to this repo if I have time soon.

EvanMu96 avatar Oct 29 '19 02:10 EvanMu96

Thank you. That sounds like a great idea that will be useful to others as well.

charles9n avatar Oct 29 '19 05:10 charles9n

Hello,

I am interested in this as well. I was wondering whether there has been any progress, or whether there are any workarounds.

mividalocas avatar Dec 10 '19 19:12 mividalocas

I haven't heard anything. Let's see what EvanMu96 thinks...

There was a great Medium blog post on this at the beginning of the year, though: https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d

It shouldn't be too big a change to add it here. I think it would mainly mean swapping out the loss function.
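To make the loss swap concrete, here is a minimal pure-Python sketch; it is not code from bert-sklearn. It contrasts softmax cross-entropy (exactly one correct class per example) with per-label sigmoid binary cross-entropy (any number of labels can be on). The function names are illustrative; in PyTorch the corresponding losses are torch.nn.CrossEntropyLoss and torch.nn.BCEWithLogitsLoss.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Single-label loss: -log softmax(logits)[target_index]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def sigmoid_bce(logits, targets):
    """Multi-label loss: mean binary cross-entropy over independent labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


logits = [2.0, -1.0, 0.5]
# Single-label: the target is one class index.
single_label_loss = softmax_cross_entropy(logits, 0)
# Multi-label: the target is a multi-hot vector, one entry per label.
multi_label_loss = sigmoid_bce(logits, [1.0, 0.0, 1.0])
```

The key difference is the target: a class index for the single-label loss, a vector of 0/1 values with the same length as the logits for the multi-label loss.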

But if you need it now, the author of the Medium post went on to create a really nice repo with multi-label classification. You can check it out at: https://github.com/kaushaltrivedi/fast-bert

charles9n avatar Dec 10 '19 22:12 charles9n

Thank you for the references. Those certainly help. I am experimenting with a OneVsRest classifier as a workaround for now.
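For readers after the same workaround, here is a rough sketch of the one-vs-rest idea: train one independent binary classifier per label column, then stack the per-label predictions. The helper names and the KeywordClassifier stand-in are hypothetical and exist only so the sketch runs without a GPU. In practice make_classifier would presumably return a fresh bert-sklearn BertClassifier, or the loop could be replaced by scikit-learn's OneVsRestClassifier.

```python
class KeywordClassifier:
    """Hypothetical stand-in for a binary text classifier with fit/predict."""

    def fit(self, texts, labels):
        positive_words, negative_words = set(), set()
        for text, label in zip(texts, labels):
            bucket = positive_words if label == 1 else negative_words
            bucket.update(text.lower().split())
        # Keep only words seen exclusively in positive examples.
        self.keywords_ = positive_words - negative_words
        return self

    def predict(self, texts):
        return [int(any(w in self.keywords_ for w in t.lower().split()))
                for t in texts]


def fit_one_vs_rest(make_classifier, texts, label_matrix):
    """Train one binary classifier per label column."""
    n_labels = len(label_matrix[0])
    models = []
    for j in range(n_labels):
        column = [row[j] for row in label_matrix]
        models.append(make_classifier().fit(texts, column))
    return models


def predict_one_vs_rest(models, texts):
    """Stack per-label predictions into one multi-hot row per text."""
    per_label = [model.predict(texts) for model in models]
    return [list(row) for row in zip(*per_label)]


# Toy data; label columns are (sports, finance).
texts = ["match ends in draw", "bank raises rates",
         "weather is mild", "bank sponsors match"]
labels = [[1, 0], [0, 1], [0, 0], [1, 1]]

models = fit_one_vs_rest(KeywordClassifier, texts, labels)
predictions = predict_one_vs_rest(models, ["late match", "bank holiday"])
```

The trade-off is cost: with a BERT-based estimator this means fine-tuning one full model per label, which is why a single model with a multi-label loss is the more attractive long-term fix.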

mividalocas avatar Dec 11 '19 14:12 mividalocas

This project is nice, and I want to see it have this feature! @charles9n @EvanMu96 @mividalocas. I'm attempting to fork it and add multi-label support. I got through configuration (model.multilabel = True) and added a toxic comments test, but I'm running into tensor shape issues and I'm not very good at torch.

See this explanation for changing the final activation layer to support multi-label: https://dejanbatanjac.github.io/2019/07/04/softmax-vs-sigmoid.html
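The gist of that explanation can be sketched in a few lines of plain Python. This is illustrative only: the 0.5 threshold and the function names are assumptions, not part of the fork. Softmax makes the scores compete so that they sum to 1, while sigmoid scores each label independently, so several labels can pass the threshold at once.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_single_label(logits):
    """Softmax + argmax: exactly one class wins."""
    probs = softmax(logits)
    return probs.index(max(probs))


def predict_multi_label(logits, threshold=0.5):
    """Sigmoid + threshold: each label is decided on its own."""
    return [int(sigmoid(z) >= threshold) for z in logits]


logits = [3.0, 2.5, -4.0]
best_class = predict_single_label(logits)     # index of the largest logit
active_labels = predict_multi_label(logits)   # multi-hot vector
```

With these logits the single-label path can only report the first class, while the multi-label path marks both of the first two labels as active.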

You can start the toxic comments test with: python -m pytest -sv tests/test_bert_sklearn_multilabel.py ... but I'm stuck at this error: Expected input batch_size (8) to match target batch_size (48)
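One plausible reading of that error, not verified against the fork: the Toxic Comment data has 6 labels per comment, and 8 * 6 = 48, so the multi-hot targets may be getting flattened into one long vector (as single-label code often does with a view(-1)-style reshape) before reaching a loss that expects one class index per example. The plain-Python sketch below only illustrates the shape arithmetic; the variable names are hypothetical.

```python
batch_size, num_labels = 8, 6

# Model output: one row of logits per example -> shape (8, 6).
logits = [[0.0] * num_labels for _ in range(batch_size)]

# Multi-hot targets: also one row per example -> shape (8, 6).
targets = [[0.0] * num_labels for _ in range(batch_size)]

# Flattening the targets, as single-label code typically does, yields 48 entries.
flat_targets = [y for row in targets for y in row]


def shape(nested):
    """Return (rows, cols) of a rectangular list of lists."""
    return (len(nested), len(nested[0]))


# A single-label loss compares the number of logit rows with the number of
# target entries: 8 vs 48, which matches the numbers in the error message.
mismatch = (len(logits), len(flat_targets))

# An element-wise multi-label loss needs logits and targets of the same shape.
shapes_agree = shape(logits) == shape(targets)
```

If this is the cause, keeping the targets as a float tensor of shape (batch_size, num_labels) and pairing them with a sigmoid-based loss such as torch.nn.BCEWithLogitsLoss should make the shapes line up.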

Just pushed my fork: https://github.com/Shane-Neeley/bert-sklearn

Shane-Neeley avatar Mar 14 '20 16:03 Shane-Neeley

Hey @Shane-Neeley, have you made any progress on this? If it's still broken, I can look at your fork.

brandomr avatar Mar 25 '20 19:03 brandomr

Hi @brandomr, I got sidetracked (making a COVID-19 project like everyone else). Yes, if you can, please take a look. I think it's almost there. I've changed 7 files here: https://github.com/Shane-Neeley/bert-sklearn/commit/0b8a3f642046991245b033501a40cc918a9118f2

And you can run the test with: python -m pytest -sv tests/test_bert_sklearn_multilabel.py

Shane-Neeley avatar Mar 30 '20 19:03 Shane-Neeley

Any update on this? I'm interested in this as well.

bruffridge avatar Dec 07 '21 04:12 bruffridge