Rohan
Hey, doesn't the original TF implementation have only four convolution layers and two fully connected layers? This one has six and three... why the difference? How could the embeddings be identical then?
Hello, nice work! I was just wondering why you convert every label
Where exactly is equation (3) from the main paper implemented in the SAGA algorithm?
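For reference, here is a hedged sketch of the standard SAGA update (Defazio et al., 2014) to make the question concrete; the function names, step size, and structure below are assumptions and may not match equation (3) or the repo's actual implementation.

```python
import numpy as np

def saga(grad_fn, w0, n, lr=0.1, steps=1000, seed=0):
    """grad_fn(i, w) -> gradient of the i-th component function at w (same shape as w0)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    table = np.stack([grad_fn(i, w) for i in range(n)])  # stored per-sample gradients
    avg = table.mean(axis=0)                             # running average of the table
    for _ in range(steps):
        j = rng.integers(n)
        g = grad_fn(j, w)
        # variance-reduced, unbiased gradient estimate: g_j(w) - old_g_j + mean(table)
        w -= lr * (g - table[j] + avg)
        avg += (g - table[j]) / n                        # keep the average consistent
        table[j] = g
    return w

# Hypothetical usage: least squares with f_i(w) = 0.5 * (x_i @ w - y_i) ** 2
X = np.random.default_rng(1).normal(size=(50, 3))
y = np.zeros(50)
w_hat = saga(lambda i, w: X[i] * (X[i] @ w - y[i]), np.ones(3), n=50, lr=0.01)
```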
Hello, thanks for the great work! I was wondering about the reason behind using self.LARGE_NUMBER. I understand that it serves to suppress the logits due to self...
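A minimal sketch, assuming a SimCLR-style contrastive loss, of the pattern such a constant usually supports: subtracting a large number from the diagonal of the similarity matrix pushes the self-similarity logits toward minus infinity, so they contribute essentially zero probability after the softmax. The names and values below (LARGE_NUMBER, the temperature) are illustrative, not the repo's actual code.

```python
import torch
import torch.nn.functional as F

LARGE_NUMBER = 1e9  # illustrative value; the repo may use a different constant

def contrastive_logits(z: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z: (N, D) L2-normalized embeddings; returns self-masked similarity logits."""
    logits = z @ z.t() / temperature                    # (N, N) pairwise similarities
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(eye, -LARGE_NUMBER)     # suppress self-similarity
    return logits

# After masking, the softmax assigns ~0 probability to each embedding's own entry.
z = F.normalize(torch.randn(8, 128), dim=1)
print(contrastive_logits(z).softmax(dim=1)[0, 0])       # ≈ 0
```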
Hi, why are the multilevel attentions being used during encoding? According to the paper on Multimodal attention, they are used only during decoding.
Hi, the dataset isn't available at the links you mentioned earlier in a different issue. Could you kindly guide me?
Is there a chance that you might train a model with a larger context capacity, like Llama-2? Thanks!