Alexander Visheratin
Hi. In the paper, you start training with the `ViT-B` model, which is the smallest available model. Did you run experiments with smaller models, like [ViT-S](https://arxiv.org/pdf/2204.07141.pdf)? If...
Hi! I want to extend BLIP2 capabilities to another language. I have a pre-trained LLM (T5 family) and a dataset with image captions. Could you please help me understand my...
Hi, Merve! I noticed that you are comparing multilingual SigLIP with NLLB-CLIP. But there actually is a newer version of NLLB-CLIP that uses the SigLIP vision encoder! It is integrated...
Hello @mbrunel! Thank you very much for the library; having fully functional transformers in the browser is very helpful. I'm wondering how I can use a WASM file that is not local, but...
Need to implement a top-k sampler for Seq2Seq models and expose generation options in the API.
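For reference, top-k sampling keeps only the k highest-scoring logits at each decoding step, renormalizes them with a softmax, and draws the next token from that reduced distribution. A minimal sketch, not tied to any particular Seq2Seq API; the function name `top_k_sample` and its signature are hypothetical:

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token id from the k highest-scoring logits.

    logits: list of unnormalized scores, one per vocabulary token.
    k: number of top candidates to keep.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (max-subtraction for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one token id proportionally to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]

# Example: with k=2 only the two highest-scoring tokens (ids 1 and 3) can be drawn.
logits = [0.1, 3.2, -1.0, 2.8]
token = top_k_sample(logits, k=2)
```

Exposing this in an API would then amount to accepting `k` (and similar options such as temperature) as generation parameters and threading them through to the sampling step.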