Thilina Rajapakse

57 comments by Thilina Rajapakse

Should be pretty similar to adding custom losses. You can freeze all the layers by setting `requires_grad = False` on their parameters in your subclassed model. You can add...
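A minimal sketch of the freezing approach described above, using a stand-in encoder (the class name, sizes, and `nn.Linear` encoder are illustrative, not the actual model from the thread):

```python
import torch
import torch.nn as nn

# Hypothetical subclassed model: a frozen encoder plus a trainable head.
class FrozenEncoderClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = encoder
        # Freeze every encoder parameter so only the head is updated.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        # no_grad skips building the autograd graph for the frozen part.
        with torch.no_grad():
            features = self.encoder(x)
        return self.classifier(features)

model = FrozenEncoderClassifier(nn.Linear(16, 32), hidden_size=32, num_labels=2)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the classifier weights remain trainable
```

Remember to pass only the trainable parameters (or rely on the optimizer skipping `requires_grad=False` ones) when constructing the optimizer.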

If it doesn't work, you can always decouple BERT and the CNN and just feed the BERT outputs to the CNN. I'm no expert myself, but you seem to be...
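The decoupled setup above can be sketched as follows; the BERT hidden states are assumed to have been computed separately, and the CNN shape/kernel choices here are illustrative guesses, not the thread's actual architecture:

```python
import torch
import torch.nn as nn

# Stand-in for precomputed BERT outputs: the last hidden state,
# shape (batch, seq_len, hidden_size). 768 matches bert-base.
batch, seq_len, hidden_size = 4, 128, 768
bert_outputs = torch.randn(batch, seq_len, hidden_size)

# A small 1D CNN over the token dimension (filter count and kernel
# size are arbitrary for the sketch).
cnn = nn.Sequential(
    nn.Conv1d(hidden_size, 100, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),  # max-pool over the sequence dimension
)

# Conv1d expects (batch, channels, length), so swap the last two dims.
features = cnn(bert_outputs.transpose(1, 2)).squeeze(-1)
print(features.shape)  # torch.Size([4, 100])
```

Since the two parts are decoupled, the BERT features can even be cached to disk once and the CNN trained on its own.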

Great! I use the Apex version with C++ extensions. The pure python version is lacking a few features. I don't see any reason not to use the C++ version.

Odd. I never had issues with any Ubuntu-based distros. Welcome to PyTorch!

I don't think I changed batchnorm. Doesn't it get set when you change the opt level? I used opt level O1. O2 was giving me NaN losses.

Yeah, I just kept the defaults there.

You don't need to run `utils.py`. The `readme` tells you which notebooks to run.

No problem. The stuff in `utils` is used in the next notebook.

Those changes should be sufficient to enable multi-gpu training in my experience. Is there any other difference (e.g. batch size) between the two runs?

This is probably a silly question, but did you try this multiple times and receive the same results?