Sedigheh (Sarah) Eslami

3 issues opened by Sedigheh (Sarah) Eslami

The paper mentions that the text encoder is a Transformer with the architecture modifications from GPT-2. My question is: is the text encoder trained from scratch, or is it initialized...

Using the following snippet, the gradients of some CLIP parameters, such as `positional_embedding`, come back as `None`:

```python
for name, p in self.model.named_parameters():
    print(p.grad)
```

where

```python
myclip, _ = clip.load(args.clip_vision_encoder, jit=False)
checkpoint...
```
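In PyTorch generally (a minimal sketch, not this repository's code), `p.grad` is `None` for any parameter that has not yet accumulated a gradient: before the first `backward()` call, or when the parameter never contributes to the loss. A quick way to see this:

```python
import torch

# Any small module works for the demonstration; this is not the CLIP model.
model = torch.nn.Linear(4, 2)

# Before backward(), every parameter's .grad is None.
print(all(p.grad is None for p in model.parameters()))  # True

# After a forward pass and backward(), gradients are populated.
loss = model(torch.randn(3, 4)).sum()
loss.backward()
print(all(p.grad is not None for p in model.parameters()))  # True
```

If some CLIP parameters are still `None` after `backward()`, they likely do not participate in the computed loss (or have `requires_grad=False`).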

Thanks for publishing the code-base. I noticed that I get different results in different runs. This seems to be caused by PyTorch itself, and setting `torch.backends.cudnn.benchmark = False` gives deterministic...