
Results 388 comments of NielsRogge

Hi, you can use ResNet with the vision encoder-decoder framework, although it might not work out of the box, as shown by your first message (for the moment that requires forking the library...
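For reference, a minimal sketch of what this pairing could look like (the checkpoint names are illustrative, and as said above, this combination may not work without forking the library):

```python
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

# Pair a ResNet vision encoder with a GPT-2 text decoder.
# Note: this pairing is not guaranteed to work out of the box.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/resnet-50",  # vision encoder (assumed checkpoint)
    "gpt2",                 # text decoder (assumed checkpoint)
)
image_processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
```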

I'll mark this request as a "good first issue" as I don't have the bandwidth for this atm. However, for this to work we would need to maintain a mapping...

Thanks, I'm currently checking out your branch; I'll open a PR on your fork with the things I'd like to see updated.

Hi @raghavanone, I went over your PR and it looks great already; however, there are still various things that need to be addressed, for which I opened a PR here: https://github.com/raghavanone/transformers/pull/1.

Gently pinging @amyeroberts to approve this PR.

Hi @gaceladri, the PR is actually totally ready; the only thing that perhaps still needs to be done is to make [this function](https://github.com/NielsRogge/transformers/blob/5199d3d3a08264f1b17442504559c28304ce619c/src/transformers/models/h3/modeling_h3.py#L139) more like the other Attention classes in the library (like...
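For context, the Attention classes in the library typically follow a pattern roughly like the sketch below (this is not the actual H3 code, and the class name is made up; it only illustrates the usual structure of separate query/key/value projections plus an output projection):

```python
import math
import torch
import torch.nn as nn

class H3StyleAttention(nn.Module):
    """Hypothetical sketch of the library's usual multi-head attention layout."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, hidden = hidden_states.shape

        # Project and split into heads: (batch, num_heads, seq_len, head_dim)
        def shape(x):
            return x.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = shape(self.q_proj(hidden_states))
        k = shape(self.k_proj(hidden_states))
        v = shape(self.v_proj(hidden_states))

        # Scaled dot-product attention
        scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(self.head_dim)
        attn = scores.softmax(dim=-1)

        # Merge heads back and project out
        context = torch.matmul(attn, v).transpose(1, 2).reshape(batch, seq_len, hidden)
        return self.out_proj(context)
```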

It's pretty hard for us to debug if there's no error message provided. :( Also, BLIP-2 should support all arguments of the `generate` method, and there's no need to...
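To illustrate, here's a small sketch of passing standard `generate` arguments to BLIP-2 (the image path is a placeholder, and the chosen generation settings are just an example):

```python
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("example.jpg")  # placeholder image
inputs = processor(images=image, return_tensors="pt")

# Standard `generate` arguments like num_beams and max_new_tokens are supported
out = model.generate(**inputs, num_beams=5, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True))
```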

Thanks for reporting; we are looking into why this is the case. cc @gante

Oh yes, one reason the results weren't the same could be that you used different generation settings. Note that if you do `model.generate(**inputs)`, greedy decoding is used by default...
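Concretely (assuming `model` and `inputs` are defined as in the snippet above), these calls can produce different outputs:

```python
# Greedy decoding is used by default when no generation arguments are passed
greedy_out = model.generate(**inputs)

# Results will differ if another codebase uses e.g. beam search or sampling
beam_out = model.generate(**inputs, num_beams=5)
sampled_out = model.generate(**inputs, do_sample=True, top_p=0.9)
```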

Hi @rodrigo-barraza, this is supported: just pass `num_return_sequences` as an argument to the `generate()` method.
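For example (again assuming `model` and `inputs` from above):

```python
# Return 3 candidate sequences per input. Note that num_return_sequences must be
# combined with sampling (or with num_beams >= num_return_sequences), since
# greedy decoding only yields a single sequence.
outputs = model.generate(**inputs, do_sample=True, num_return_sequences=3)
```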