Rahul Somani

Results: 57 comments by Rahul Somani

@usuyama I was using `--local-loss` and `--gather-with-grad`, but wasn't aware of `grad-checkpointing`. That's pretty handy: I was able to increase the batch size from `24` to `224` per GPU! Thanks @rom1504...
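For context, the three flags combine like this in an open_clip training launch. This is a hedged sketch: the entry-point module name has varied across open_clip versions (`training.main` in older releases, `open_clip_train.main` in newer ones), and the model name and data path here are placeholders, not the actual run.

```shell
# Sketch of an open_clip launch combining the flags discussed above.
# --local-loss + --gather-with-grad: memory-efficient distributed contrastive loss
# --grad-checkpointing: trade recompute for activation memory (enables the
# larger per-GPU batch size mentioned above)
torchrun --nproc_per_node 4 -m open_clip_train.main \
    --model ViT-B-32 \
    --train-data "/path/to/shards/{00000..00999}.tar" \
    --local-loss \
    --gather-with-grad \
    --grad-checkpointing \
    --batch-size 224
```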

@OrangeSodahub the second link you provided seems to be loading the already exported `.onnx` model. Do you have a reference script that converts a PyTorch CLIP model -> ONNX model?...

Actually, I was able to solve the `torch.onnx.export` error by adding a line in the text encoder to explicitly cast the tokenized text input to `torch.float32` just before doing the global...
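The fix can be sketched as follows. This is a minimal toy stand-in, not the actual CLIP text encoder: the `TextPool` module and its pooling step are illustrative, and the one relevant line is the explicit `.to(torch.float32)` cast so the ONNX tracer doesn't carry an integer dtype into the pooling op.

```python
import torch
import torch.nn as nn

class TextPool(nn.Module):
    """Toy stand-in for the tail of a CLIP text tower (hypothetical)."""
    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Explicit cast to float32 just before global pooling -- the
        # one-line change that unblocked torch.onnx.export above.
        x = tokens.to(torch.float32)
        return x.mean(dim=1)  # global pooling over the sequence dimension

pool = TextPool()
out = pool(torch.randint(0, 100, (2, 7)))  # int64 token ids in, float32 out
print(out.dtype)  # torch.float32
```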

@mitchellnw thanks for your response. I'm training on image-text pairs. Thanks for the idea re. interpolation, I'll definitely give that a shot and report back my findings.

The weight interpolation suggestion was super helpful. In the graph below, `alpha=0.0` is the pre-trained model and `alpha=1.0` is the fully finetuned model. Turns out an alpha of 0.4 goes...
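The interpolation above is a straight convex blend of the two checkpoints, `theta(alpha) = (1 - alpha) * theta_pretrained + alpha * theta_finetuned`. A minimal sketch on plain dicts of floats (real usage would blend the two `state_dict`s tensor-by-tensor):

```python
def interpolate_weights(pretrained: dict, finetuned: dict, alpha: float) -> dict:
    """Blend two checkpoints: alpha=0.0 -> pretrained, alpha=1.0 -> finetuned."""
    assert pretrained.keys() == finetuned.keys()
    return {k: (1.0 - alpha) * pretrained[k] + alpha * finetuned[k]
            for k in pretrained}

pre = {"w": 0.0, "b": 1.0}   # toy "pretrained" weights
ft  = {"w": 1.0, "b": 3.0}   # toy "finetuned" weights
print(interpolate_weights(pre, ft, 0.5))  # {'w': 0.5, 'b': 2.0}
```

Sweeping `alpha` over a grid and evaluating each blend is how the 0.4 value above was found.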

I think this should be good to go now. @rohun-tripathi the updated code is a lot simpler, and should work functionally just as well. The LayerNorm freezing code I'd copied...

@rwightman I couldn't think of a more robust way to extract the LayerNorm layers inside the resblocks: https://github.com/mlfoundations/open_clip/blob/f065ee612d43ee654a35814b6c37ffdd89fac27c/src/open_clip/transformer.py#L830-L839 The assumption here is that the only norm layer being used is...
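A more robust alternative than matching resblock attribute names, assuming (as the comment above does) that `LayerNorm` is the only norm type in use, is to freeze by module type. This is a hedged sketch, not the PR's actual code:

```python
import torch.nn as nn

def freeze_layernorms(model: nn.Module) -> int:
    """Set requires_grad=False on every LayerNorm's params; return count frozen."""
    count = 0
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for p in module.parameters():
                p.requires_grad = False
            count += 1
    return count

net = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 2))
print(freeze_layernorms(net))  # 1
```

The type check catches LayerNorms anywhere in the module tree, so it doesn't depend on the transformer's attribute layout.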

@rwightman thanks for looking into that. That's really great to hear re. s1/s2, as those, in my eyes, sit in the perfect sweet spot of speed and accuracy. Given your observations,...

@rwightman Apple just released timm and OpenCLIP checkpoints: https://huggingface.co/collections/apple/mobileclip-models-datacompdr-data-665789776e1aa2b59f35f7c8

Awesome. Excited to use these! Thanks for helping out with that.