NielsRogge

388 comments by NielsRogge

I have a PR here which aims to further verify equivalence: https://github.com/huggingface/transformers/pull/24854. The conversion script can be found [here](https://github.com/NielsRogge/transformers/blob/improve_blip2/src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py) and can be run as follows: ``` pip install -U git+https://github.com/nielsrogge/LAVIS.git@blip2_float32...

Thanks a lot for this fix! Would you be able to take my comment into account so that we can merge it? 🙏 Thanks! Btw the same fix could then...

Hi, it seems the CI didn't run properly. Could you push an empty commit to trigger it?

Thanks for raising this issue! VideoMAE indeed uses the same mask ratio (number of masked patches) per video to make batching possible. See [this class](https://github.com/MCG-NJU/VideoMAE/blob/main/masking_generator.py) which the authors use to...
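To illustrate the point about a fixed mask ratio, here is a minimal sketch of tube-style masking in plain Python. It is not the authors' code: the function name `tube_mask` and the sizes used (8 frames, 196 patches per frame, ratio 0.9) are assumptions chosen for illustration. The idea is that one per-frame mask is shuffled and then repeated across frames, so every video ends up with the same number of masked patches and the visible patches can be stacked into a batch.

```python
import random


def tube_mask(num_frames, patches_per_frame, mask_ratio, rng=None):
    """Return a flat 0/1 mask (1 = masked) shared across all frames.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # The masked count depends only on the ratio, never on the video content.
    num_masked = int(mask_ratio * patches_per_frame)
    frame_mask = [1] * num_masked + [0] * (patches_per_frame - num_masked)
    rng.shuffle(frame_mask)
    # Repeat the same spatial mask for every frame ("tube" masking).
    return frame_mask * num_frames


# Different random masks per video, but an identical masked count for each.
batch = [tube_mask(8, 196, 0.9, random.Random(seed)) for seed in range(4)]
masked_counts = [sum(mask) for mask in batch]
```

Because `masked_counts` is constant across the batch, the number of visible patches per video is constant too, which is what makes batching possible.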

cc @sgugger would it be possible to add this feature to `push_to_hub` as well? I'd like to use it for BLIP-2. For the moment it seems the only way to...

Yes, that's why the pipeline is called classification rather than regression. We would need an `ImageRegressionPipeline` for this use case ;)

Closing this issue as it seems resolved.

I'd recommend fine-tuning CLIP if you want to do image-text retrieval using this script: https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text. Fine-tuning BLIP might be harder as it involves some very specific loss functions.
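For context, the contrastive objective CLIP is trained with is comparatively simple: a symmetric cross-entropy over an image-text similarity matrix, where the matching pair sits on the diagonal. Below is a minimal sketch in plain Python, not the linked example script. The function names and the toy similarity matrices are assumptions for illustration, and a real setup would compute the similarities from model embeddings and a learned temperature.

```python
import math


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for a single row of logits."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(sim):
    """Symmetric contrastive loss for an N x N similarity matrix.

    sim[i][j] is the (already scaled) similarity of image i and text j;
    the correct text for image i is assumed to be text i.
    """
    n = len(sim)
    # Image-to-text: each row should put its mass on the diagonal entry.
    image_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image: the same, computed over columns.
    text_to_image = sum(
        cross_entropy([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


matched = clip_contrastive_loss([[10.0, 0.0], [0.0, 10.0]])
mismatched = clip_contrastive_loss([[0.0, 10.0], [10.0, 0.0]])
uninformative = clip_contrastive_loss([[0.0, 0.0], [0.0, 0.0]])
```

The loss is low when matching pairs score highest (`matched`) and high when they do not (`mismatched`).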