Ross Wightman
no open_clip_config.json was pushed by whoever uploaded this model, so the hf-hub method won't work, as it sources the model config from the hub instead of open_clip's built-in configs...
So, been thinking about this one. I really don't like the `is_training` flag; it's not done this way elsewhere. The label shift is standard, but why do we need to truncate...
Yeah, I don't like `embed_cls` either. Truncating the text input first, outside of the forward, à la `self.encode_text(text[:, :-1])`, is the 'normal' approach, but I wasn't sure if that would impact the...
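For anyone following along, a minimal sketch of the standard setup being referred to, using plain Python lists in place of tensors (hypothetical example, not the actual CoCa code): the decoder input drops the last token and the labels drop the first, so no training flag is needed inside the forward.

```python
# Sketch of the standard label-shift setup for a captioning decoder,
# with plain lists standing in for token tensors (hypothetical example).

def shift_for_training(tokens):
    """Given a tokenized caption, return (decoder_input, labels).

    The input drops the last token (like text[:, :-1]) and the labels
    drop the first, so position i is trained to predict token i+1.
    """
    return tokens[:-1], tokens[1:]

caption = ["<start>", "a", "cat", "<end>"]
inp, labels = shift_for_training(caption)
# inp    -> ["<start>", "a", "cat"]
# labels -> ["a", "cat", "<end>"]
```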
merged through #877 with minor changes
@gpucce do you have any idea what might be causing it? What's the symptom, and by how much is it 'off'? There are numerical changes across versions of PyTorch, etc...
@gpucce have you run the same random inputs through the different towers and saved the results, to verify closeness within some float eps on the same env but with current main vs. the previous release?...
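A rough sketch of the kind of closeness check meant here, with hypothetical saved outputs flattened to plain Python lists (in practice you'd save tensors and compare with something like `torch.allclose`):

```python
# Compare saved tower outputs from two versions of the code, run on the
# same env with the same random inputs (values below are made up).

def max_abs_diff(a, b):
    """Element-wise max absolute difference between two flat output lists."""
    assert len(a) == len(b)
    return max(abs(x - y) for x, y in zip(a, b))

out_main = [0.1234, -0.5678, 0.9012]      # current main branch
out_release = [0.1234, -0.5679, 0.9012]   # previous release

eps = 1e-3  # tolerance for float noise across versions
assert max_abs_diff(out_main, out_release) < eps
```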
Also, not sure if this is a factor, but HF generate functionality might have changed slightly over transformers versions in a way that impacted how it was being used here... On...
FWIW, using your cat.jpg I get `'a cat sitting on its hind legs looking up . '` for both PT 2.1 w/ transformers 4.34 and latest main branch AND same...
@gpucce I'd avoid using the singleton tokenizer via `open_clip.tokenize()`, and use the factory to get one for your model. But yeah, the CoCa configs say context length is 76...
@kkjh0723 I think it might break with gradient checkpointing? Not sure there is a workaround; possibly using non-reentrant mode?
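For reference, a minimal sketch of the non-reentrant mode mentioned, using the standard `torch.utils.checkpoint` API (whether it actually resolves the incompatibility here is untested; `block` is just a stand-in):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # stand-in for a transformer block
    return torch.relu(x) * 2

x = torch.ones(4, requires_grad=True)

# Non-reentrant mode recomputes activations in the backward without the
# reentrant autograd machinery, which avoids some hook/graph issues that
# the default (reentrant) mode can hit.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```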