Stas Bekman
To update on where we are at with the test: 1. as I posted in the channel we need to clone `GPT2` on the HF side to something like `GPTMeg`...
Update: @sIncerass cloned GPT2 to GPTMeg at https://github.com/huggingface/transformers/pull/14084. I will now try to integrate it into this test. (I committed my WIP, which needs a lot of cleanup, which...
Just to crosslink: here is related work happening in PyTorch core regarding layer_norm under fp16: https://github.com/pytorch/pytorch/pull/66920
At some point we ought to complete this line of work. Otherwise we can't release the 13B model. The main question is whether we actually need 99.999% exactness and require...
We may have to wait until the experimental dust settles and see what new features we end up having in the models we trained at BigScience. If all these models...
I don't think there is anything that needs to be done at this moment.
Here is the current setup:
- GPU instances have no access to the Internet.
- There is no crontab or an option to run a daemon on JZ.
- Currently...
There is no internet on GPU instances, period. The only way to communicate from those to the world is via the shared filesystem, which can then be picked up by...
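To illustrate the shared-filesystem handoff described above, here is a minimal sketch, not the actual setup on JZ. It assumes a spool directory that is visible both to the GPU instance and to some node that does have network access. The names `write_message`, `collect_messages`, and the spool layout are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_message(spool_dir: Path, name: str, payload: dict) -> Path:
    """GPU-node side (hypothetical): drop a JSON message into the shared spool."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a reader never sees a partial message.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    final_path = spool_dir / f"{name}.json"
    # Rename within the same directory; assumed to be atomic on the shared filesystem.
    os.replace(tmp_path, final_path)
    return final_path


def collect_messages(spool_dir: Path) -> list[dict]:
    """Networked-node side (hypothetical): read and remove pending messages."""
    messages = []
    for path in sorted(spool_dir.glob("*.json")):
        with path.open() as f:
            messages.append(json.load(f))
        path.unlink()  # remove the message once it has been picked up
    return messages
```

The networked side would call `collect_messages` periodically and forward the results wherever they need to go. How that side gets scheduled is left open here, since the setup above has no crontab or daemon option.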
> Maybe we can get them to open up a single port specifically for this purpose? It's very unlikely, but there is no harm in asking. I will ask.
> > Maybe we can get them to open up a single port specifically for this purpose? > > It's very unlikely, but there is no harm in asking. I...