Raushan Turganbay
Hey! I think you can get identical logits without double precision if you disable caching with `use_cache=False` 🤔
@lowlypalace I don't think there's anything else to make them identical. As @younesbelkada said, there will always be some small numerical precision errors. Disabling cache and recalculating keys/values every time...
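For context, here is a minimal sketch of the comparison being discussed, assuming a generic causal LM (`gpt2` and the prompt are placeholders, not from the thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    # Greedy decoding once with the KV cache and once without it
    out_cached = model.generate(
        **inputs, max_new_tokens=20, do_sample=False,
        use_cache=True, output_scores=True, return_dict_in_generate=True,
    )
    out_uncached = model.generate(
        **inputs, max_new_tokens=20, do_sample=False,
        use_cache=False, output_scores=True, return_dict_in_generate=True,
    )

# Even with the cache disabled, tiny float32 rounding differences can remain
max_diff = max(
    (a - b).abs().max().item()
    for a, b in zip(out_cached.scores, out_uncached.scores)
)
print(f"max logit difference: {max_diff:.2e}")
```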
Hey! CogVLM uses custom code from the hub when you set `trust_remote_code=True` and the model is not yet added to transformers. There is an [open PR here](https://github.com/huggingface/transformers/pull/28196) to port the...
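A quick sketch of the loading path described above; the checkpoint id below is the public CogVLM hub repo, used here only as an illustration:

```python
from transformers import AutoModelForCausalLM

# trust_remote_code=True executes the custom modeling code bundled
# in the hub repo, since the model is not yet ported into transformers
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    trust_remote_code=True,
)
```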
The modeling/processing code is done and passes all the tests with dummy weights. I looked through transformers for similar models to replace the VQEncoder with a simple call to the vision backbone,...
Ready for review! The model conversion is fixed, thanks to Arthur for spotting the bug. Now we have to convert and upload the weights to the Meta org on the hub, so...
1. Yes, maybe we don't need the assertion then. It's a bit weird that the outputs are completely different though; I will check it out and change it. 2. Hmm, that's weird, I will...
The PR is ready. The only remaining step is uploading the weights to the hub (after we find out what the issue is with the 30b model's image module with...
Hmm, we probably need to manually move the residual to the same device as the hidden states after the attn module. Btw, I was running on a single A100 GPU and it fits perfectly...
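A minimal sketch of that fix, assuming the usual pattern where sharded inference (e.g. `device_map="auto"`) can leave the saved residual on a different GPU than the attention output; the helper name is illustrative, not the actual modeling code:

```python
import torch

def add_residual(residual: torch.Tensor, hidden_states: torch.Tensor) -> torch.Tensor:
    # Under sharded inference, the residual saved before the attention block
    # may live on another device than the attention output, so align it
    # explicitly before the addition.
    if residual.device != hidden_states.device:
        residual = residual.to(hidden_states.device)
    return residual + hidden_states
```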
@EwoutH almost there, just need to apply the changes for sharded inference in the 30b model. I was off for a week and will work on it tomorrow.
Pushed changes for qk layernorm and tested that it works for both checkpoints. Locally, all tests pass except the slow ones. So the last step now is to run...
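For reference, a hedged sketch of what qk layernorm does inside attention: normalize the projected queries and keys per head before computing attention scores. Names and shapes below are illustrative, not the PR's exact modeling code:

```python
import torch
from torch import nn

class QKNormAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)
        # LayerNorm applied per head over the head dimension
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.num_heads, self.head_dim)
        k = self.k_proj(x).view(b, s, self.num_heads, self.head_dim)
        v = self.v_proj(x).view(b, s, self.num_heads, self.head_dim)
        q, k = self.q_norm(q), self.k_norm(k)  # the qk-layernorm step
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, s, -1)
        return self.o_proj(out)
```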