Stas Bekman comments

Results: 664 comments of Stas Bekman

Test: Add checkpoint conversion test code

To update on where we are with the test: 1. As I posted in the channel, we need to clone `GPT2` on the HF side to something like `GPTMeg`...

Test: Add checkpoint conversion test code

Update: @sIncerass cloned GPT2 to GPTMeg at https://github.com/huggingface/transformers/pull/14084. I will now try to integrate it into this test. (I committed my WIP, which needs a lot of cleanup, which...

Test: Add checkpoint conversion test code

Just to crosslink: here is related work happening in PyTorch core regarding layer_norm under fp16: https://github.com/pytorch/pytorch/pull/66920

Test: Add checkpoint conversion test code

At some point we ought to complete this line of work. Otherwise we can't release the 13B model. The main question is whether we actually need 99.999% exactness and require...

Test: Add checkpoint conversion test code

We may have to wait until the experimental dust settles and see what new features we end up having in the models we trained at BigScience. If all these models...

Test: Add checkpoint conversion test code

I don't think there is anything that needs to be done at this moment.

Implement the ML Flow experiment tracker

Here is the current setup:
- GPU instances have no access to the Internet.
- There is no crontab or an option to run a daemon on JZ.
- Currently...

Implement the ML Flow experiment tracker

There is no internet on GPU instances, period. The only way to communicate from those to the world is via the shared filesystem, which can then be picked up by...

Implement the ML Flow experiment tracker

> Maybe we can get them to open up a single port specifically for this purpose?

It's very unlikely, but there is no harm in asking. I will ask.

Implement the ML Flow experiment tracker

> > Maybe we can get them to open up a single port specifically for this purpose?
>
> It's very unlikely, but there is no harm in asking. I...