Pytorch implementation
Hi , i have created a pytorch implementation for multi-label scenario. pls review and share your comment if this is an acceptable code implementation .
Thanks @yuvaramsingh94, we'll try to get to it ASAP, but just FYI we're short on cycles right now so it might take us some time.
You should subtract the max before applying softmax to avoid numerical instability:
https://stackoverflow.com/a/42606665
The authors did this in their implementation.
Thanks for this PR. I'm putting some time into modernizing this library lately, and I'd like to add pytorch support.
That said, there are many things missing from this implementation that makes me think that a rewrite would be easier than a back-and-forth via code review. (Mainly things at the API level, eg target dimension, initialization issues, etc etc.)
I have a local implementation that's a bit closer to what the keras/tf modules look like, but i'm holding off on merging it until we can hit feature parity. I'm actually a bit confused about what the pytorch idiomatic way to implement things like regularization and constraints would be - the documentation and forum posts are almost no help here, so any input would be appreciated.