Aritra Roy Gosthipaty

Results 105 comments of Aritra Roy Gosthipaty

@Bhavay-2001 you would also need to update the `norm_num_groups` parameter when changing `block_out_channels`. I am looking at something like this: ```py unet = UNet1DModel( block_out_channels=(8, 8, 16), norm_num_groups=8, extra_in_channels=16, sample_size=8,...
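
A likely reason the two parameters have to change together is that group normalization needs each channel count to be divisible by the number of groups. The sketch below is a plain-Python check under that assumption. The helper `incompatible_channels` is hypothetical and is not part of `diffusers`; the first call uses the values from the comment above, and the second uses a made-up group count to show a mismatch.

```python
def incompatible_channels(block_out_channels, norm_num_groups):
    """Return the channel counts that norm_num_groups does not divide evenly.

    Hypothetical helper for illustration only. It assumes the model applies
    group normalization with `norm_num_groups` groups to each block's channels.
    """
    return [c for c in block_out_channels if c % norm_num_groups != 0]


# Values from the comment: 8, 8 and 16 are all multiples of 8.
print(incompatible_channels((8, 8, 16), 8))   # []

# Made-up mismatch: none of the channel counts is a multiple of 32.
print(incompatible_channels((8, 8, 16), 32))  # [8, 8, 16]
```

An empty list means the pair of settings passes this divisibility check; it does not guarantee that the model accepts the rest of the configuration.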

Thanks for the ticket @elbaro :smile:

I am tagging @KyleGoyette here for a quicker solution. Thanks for the issue.

> Hi @a8nova, here's a [port of the Gemma modeling code](https://gist.github.com/Rocketknight1/77003f78147a9485a0f619e7202bb030)! Let me know if you need anything else. > > I did this port with Claude 3, and if...

@Rocketknight1 at this stage if I run the following ```python from transformers.models.mistral import TFMistralModel model = TFMistralModel.from_pretrained("mistralai/Mistral-7B-v0.1", from_pt=True) ``` I get the following stack trace. ``` Traceback (most recent call...

I was unable to test the port with the 7B model and reached out to @Rocketknight1 for a fix. He mentioned it would be wise to reduce the model size...

Running the following code: ```python import os os.environ["CUDA_VISIBLE_DEVICES"] = "-1" from transformers.models.mistral import TFMistralForCausalLM model = TFMistralForCausalLM.from_pretrained("ariG23498/tiny-random-mistral-for-causal-lm") ``` results in the following naming error: ```shell Some weights of the PyTorch...

@Rocketknight1 I was able to port all the weights from PyTorch to TensorFlow following your advice. I wanted your opinion on a small modification that I have made. ```diff class...

Hi @Rocketknight1, I think `TFMistralForCausalLM` currently needs some review. I am unable to get this part aligned with the PyTorch implementation. Could you highlight something that is very evident...