lopho
I will start working on this again within the next week, unless there is already someone working on it that I am not aware of.
Just some proposals: if you decide to change the name, better sooner than later, before it's been out in the wild too long. `vit-g-14-1280` (embed dim, vs ViT-g-14 with 1024, might...
Made two PRs, pick and choose :heart: bigG #251 GG #252
I've done all I could come up with to make the random initialization deterministic. See https://github.com/mlfoundations/open_clip/blob/bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b/tests/util_test.py#L11-L18 It reseeds for each test. My best guess is that the machines that are...
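The per-test reseeding mentioned above can be sketched roughly like this; `seed_all` is a hypothetical helper for illustration, not the actual code in `tests/util_test.py`, and it assumes `random`, NumPy, and torch are the RNGs in play:

```python
import random

import numpy as np
import torch


def seed_all(seed: int = 0) -> None:
    # Hypothetical helper: reseed every RNG a test might touch,
    # called at the start of each test case.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


seed_all(0)
a = torch.randn(4)
seed_all(0)
b = torch.randn(4)
# Same seed on the same machine -> bitwise identical draws;
# this does NOT guarantee identity across CPU families or torch releases.
assert torch.equal(a, b)
```

This makes each test independent of test ordering, but as the later comments note, it cannot fix drift between different hardware or torch versions.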
An alternative would be to use either pretrained weights, or to store the randomly initialized models used to generate the testing data as well, but this would explode the data size and require...
https://github.com/pytorch/pytorch/issues/15359 It seems determinism across different CPU families/arches/torch releases is impossible with torch. I'm suspecting that the tests only pass because the random models' output is mostly flat and...
Another alternative:
1. random initialization
2. getting the state_dict
3. setting all values using a custom deterministic initialization function / or static values (ex. 0.5)
4. reloading the state_dict

I'll...
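The four steps above could look something like this; `constant_init` is a hypothetical name for illustration, shown here on a plain `nn.Linear` rather than an actual open_clip model:

```python
import torch
from torch import nn


def constant_init(model: nn.Module, value: float = 0.5) -> nn.Module:
    # Steps 1-2: the model arrives randomly initialized; grab its state_dict.
    sd = model.state_dict()
    # Step 3: overwrite every tensor with a static value (ex. 0.5).
    sd = {k: torch.full_like(v, value) for k, v in sd.items()}
    # Step 4: reload the now-deterministic state_dict.
    model.load_state_dict(sd)
    return model


m = constant_init(nn.Linear(3, 3))
assert bool((m.weight == 0.5).all()) and bool((m.bias == 0.5).all())
```

This sidesteps RNG differences entirely, though as the next comment notes, the forward pass itself can still differ across processor families.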
Static initialization does not fix the problem. It helps to amplify it, and confirms my initial suspicion that torch is not behaving deterministically across processor families. (ex. AVX vs AVX512)...
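One common workaround for this kind of last-ulp drift (not necessarily what was adopted here) is to compare test outputs with a tolerance instead of bitwise; the drift below is simulated, not a measurement from real AVX vs AVX512 hardware:

```python
import torch

# Simulated: the "same" forward pass on two CPU arches,
# differing only in the last few ulps.
out_a = torch.tensor([0.1234567, 0.7654321])
out_b = out_a + 1e-7  # hypothetical cross-arch float drift

# Bitwise equality fails on the drifted values...
assert not torch.equal(out_a, out_b)
# ...while a tolerance-based comparison still passes.
assert torch.allclose(out_a, out_b, atol=1e-5)
```

The trade-off is picking a tolerance tight enough to still catch real regressions.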
It works and is not terribly slow, but the workflow file is getting a bit complicated now. The commit to test against would have to be set manually with each...
see #260