NielsRogge

388 comments by NielsRogge

Hi, CANINE doesn't support causal attention. It can only be used as an encoder.

You can leverage the decoder of [ByT5](https://huggingface.co/docs/transformers/model_doc/byt5), which is a byte-based model.
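To make "byte-based" concrete, here is a minimal sketch of how text can be turned into byte-level ids without a learned vocabulary. It assumes the convention used by the ByT5 tokenizer, where ids 0, 1 and 2 are reserved for padding, end-of-sequence and unknown, so each UTF-8 byte is shifted by 3. The function names `encode_bytes` and `decode_bytes` are hypothetical and not part of Transformers.

```python
# Assumption: three reserved special ids (pad=0, eos=1, unk=2),
# so every UTF-8 byte value is shifted by this offset.
SPECIAL_OFFSET = 3
EOS_ID = 1


def encode_bytes(text: str) -> list[int]:
    """Map a string to byte-level ids and append the end-of-sequence id."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode_bytes(ids: list[int]) -> str:
    """Invert encode_bytes, skipping anything outside the 256 byte ids."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


ids = encode_bytes("hello")
print(ids)
print(decode_bytes(ids))
```

Because every character is just a sequence of bytes, no tokenizer training is needed and any Unicode string can be represented, at the cost of longer sequences.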

Hi @khadiravana-belagavi, this is because T5/ByT5 is an encoder-decoder model. You would only need the decoder to combine it with a vision encoder. The vision encoder-decoder framework doesn't work out-of-the-box...

@khadiravana-belagavi BERT can be adapted to be used as a decoder (by simply using a causal attention mask rather than a bidirectional one). CANINE, on the other hand, cannot simply be...
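As a rough illustration of the difference between the two mask types, here is a minimal sketch in plain Python. It is not the Transformers implementation; the helper names `bidirectional_mask` and `causal_mask` are hypothetical. A 1 at row `i`, column `j` means that position `i` is allowed to attend to position `j`.

```python
def bidirectional_mask(seq_len: int) -> list[list[int]]:
    """Encoder-style mask: every position can attend to every other position."""
    return [[1] * seq_len for _ in range(seq_len)]


def causal_mask(seq_len: int) -> list[list[int]]:
    """Decoder-style mask: position i can only attend to positions j <= i."""
    return [
        [1 if j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


for row in causal_mask(4):
    print(row)
```

The causal mask is lower triangular, which is what prevents a token from seeing future tokens during generation. In Transformers, setting `is_decoder=True` in `BertConfig` is what switches BERT to this kind of masking.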

Hi, std scaling wasn't supported until #21020 was merged (the latest PyPI release only supports mean scaling). So if you install Transformers from source, you can use...
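For readers unfamiliar with the two terms, here is a minimal sketch of what mean scaling and std scaling of a time series usually mean. It uses the common definitions (divide by the mean absolute value, versus subtract the mean and divide by the standard deviation) and is an assumption about the general technique, not a copy of the code in Transformers. The function names and the `eps` guard are hypothetical.

```python
import math


def mean_scale(values: list[float], eps: float = 1e-10) -> list[float]:
    """Mean scaling: divide each value by the mean absolute value of the series."""
    scale = sum(abs(v) for v in values) / len(values)
    scale = max(scale, eps)  # guard against an all-zero series
    return [v / scale for v in values]


def std_scale(values: list[float], eps: float = 1e-10) -> list[float]:
    """Std scaling: subtract the mean, then divide by the (population) standard deviation."""
    loc = sum(values) / len(values)
    variance = sum((v - loc) ** 2 for v in values) / len(values)
    scale = max(math.sqrt(variance), eps)  # guard against a constant series
    return [(v - loc) / scale for v in values]


series = [1.0, 2.0, 3.0]
print(mean_scale(series))
print(std_scale(series))
```

Mean scaling keeps the sign and relative magnitude of the values, while std scaling also centers the series around zero.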

I'll ping @novice03 here as he's an expert on Nyströmformer

Yes, feel free to contribute :)

Hi, thanks for converting BLIP2 to HF :) I actually forked the LAVIS repo and made some tweaks to facilitate conversion (I removed a bunch of unnecessary requirements, etc.). See...

Thanks for reporting, that should not be the case! I extensively tested the greedy/beam search outputs of the original implementation against mine to make sure everything works as expected. But the...

Also, I'm not sure you can run both LAVIS and the Transformers main branch in the same environment to compare, because LAVIS relies on an older version of Transformers.