Trevor Morris
> I initially ran SRResNet with VGG54 for 10^5 iterations. How do the results of this training look? If I recall correctly, in the paper they train SRResNet...
Sorry for the delay. 1. Yes, that is a good way to change the scaling to 4x. 2. That looks correct. How does the output look? 3. That is strange,...
srgan.SRGanGenerator and srgan.SRGanDiscriminator can be created with num_upsamples=3 for an 8x factor. There will also be some other parts in the code that need to be adjusted from 4x to...
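For reference, a minimal sketch of the 8x setup based on the sentence above. It assumes both constructors accept `num_upsamples` as a keyword argument as described; any other constructor arguments are omitted, and this is not meant as the exact API.

```python
# Hypothetical usage sketch; only `num_upsamples` comes from the thread above,
# everything else (or lack thereof) is an assumption.
import srgan

# Each upsample stage doubles the spatial resolution, so 3 stages -> 2**3 = 8x.
generator = srgan.SRGanGenerator(num_upsamples=3)
discriminator = srgan.SRGanDiscriminator(num_upsamples=3)
```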
@PatriosTheGreat
Thanks for filing this issue @AlessioNetti, I was able to reproduce the bug. Taking a look now.
I think I found the issue. We should be able to get the fix in soon.
Hi @AlessioNetti, it looks like we intentionally changed this so that the length of the logits matches the number of tokens. We will consider whether it should be added back.
@ezhulenev Could you please take a look when you have a chance?
> Is it possible to test those? Thanks for reviewing, I added a test to `xla_client_test`.
I am also encountering this issue when using dynamic RoPE scaling. Here is what's happening: 1. During `LlamaAttention.__init__()`, the `LlamaRotaryEmbedding` module is initialized. No `device` arg is provided: https://github.com/huggingface/transformers/blob/d6ba1ac041ac0b07bc589dd82a67cfb76f75d0f9/src/transformers/models/llama/modeling_llama.py#L304...
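To make step 1 concrete, here is a minimal, self-contained sketch of the pattern being described. This is not the transformers source; the classes below are stand-ins and their names and arguments are assumptions. The point it illustrates is that when the rotary embedding is constructed without a `device` argument, its `inv_freq` buffer is created on the default device (CPU), regardless of where the model is placed later.

```python
# Illustrative stand-ins only; not the transformers implementation.
import torch
import torch.nn as nn

class ToyRotaryEmbedding(nn.Module):
    """Stand-in for LlamaRotaryEmbedding; names and defaults are illustrative."""

    def __init__(self, dim, base=10000, device=None):
        super().__init__()
        # With device=None the tensor lands on the default device (CPU),
        # mirroring step 1 above where no `device` arg is passed.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

class ToyAttention(nn.Module):
    def __init__(self, head_dim=64):
        super().__init__()
        # No `device` argument is forwarded here, as in the linked __init__.
        self.rotary_emb = ToyRotaryEmbedding(head_dim)

attn = ToyAttention()
print(attn.rotary_emb.inv_freq.device)  # cpu, until the module is moved with .to(...)
```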