Sultan

Results: 13 comments of Sultan

> Sorry for the (very) slow reply, this is actually the first time someone pointed me at this issue!
>
> This command should set the hyperparameters from the original...

I have pre-trained T5 and BART, and it totally depends on the corpus and the masking portion you are using. A larger corpus means the loss function needs more time to...
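To make the "masking portion" concrete, here is a minimal sketch of T5-style token masking in plain Python. It is illustrative only: real span corruption samples span lengths and builds paired decoder targets, and the sentinel names mirror T5's `<extra_id_N>` convention, but the function itself is hypothetical.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Toy masking sketch: replace ~mask_ratio of the tokens with
    sentinel markers and collect the originals as targets.
    Shows why a larger corpus at a fixed ratio yields more masked
    positions for the loss to cover."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i, pos in enumerate(positions):
        masked[pos] = f"<extra_id_{i}>"
    targets = [tokens[pos] for pos in positions]
    return masked, targets

corpus = "the quick brown fox jumps over the lazy dog again".split()
masked, targets = mask_tokens(corpus, mask_ratio=0.2)
print(masked)
print(targets)
```

With `mask_ratio=0.2` on 10 tokens, 2 positions are masked; doubling the corpus doubles the masked positions the model must learn to reconstruct.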

Hi @gobbleturk, https://github.com/google/maxtext/pull/581 does not work with Gemma because Gemma 2 has local and global attention. I think each of the q, k, and v attention layers has a local...

@gobbleturk, any update on this? Adding this feature would support further research on the Gemma model and its use in academic work on low-resource languages.
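The local/global attention mentioned above refers to Gemma 2 alternating sliding-window (local) and full (global) attention across layers. A rough sketch of that layer pattern, assuming a simple even/odd alternation (the real MaxText/Gemma config names and ordering may differ):

```python
def attention_pattern(num_layers, local_window=4096):
    """Hypothetical sketch of a Gemma-2-style alternating attention
    schedule: even-indexed layers use sliding-window (local) attention,
    odd-indexed layers attend globally. Any feature that touches per-layer
    q/k/v weights has to account for both kinds."""
    return [
        {"layer": i,
         "kind": "local" if i % 2 == 0 else "global",
         "window": local_window if i % 2 == 0 else None}
        for i in range(num_layers)
    ]

for spec in attention_pattern(4):
    print(spec)
```

This is why a PR written for a uniform-attention model cannot simply be reused: the weight layout differs between the two layer kinds.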

Can you please show an example of how this code can work with an HF text dataset (not the multimodal dataset) without the Idefics2 processor? I mean using `tokenizer.apply_chat_template`? How right and...

> @salrowili it should be similar to Idefics with the only difference that instead of `processor.tokenizer` you have simply `tokenizer`. The main thing to note is that Trainer needs inputs...
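The reply above can be sketched with a toy stand-in. Since neither the tokenizer nor the dataset is shown here, the template tags and the whitespace "tokenizer" below are illustrative placeholders for the real `tokenizer.apply_chat_template` and tokenizer call, not the Hugging Face API; the point is only the shape Trainer needs (`input_ids` plus matching `labels` for causal LM):

```python
def apply_chat_template(messages):
    """Toy stand-in for tokenizer.apply_chat_template: flatten the
    chat messages into one training string."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(parts) + "\n<|end|>"

def tokenize(text, vocab):
    """Toy whitespace tokenizer; labels are copied from input_ids,
    which is the shape a causal-LM Trainer expects."""
    ids = [vocab.setdefault(tok, len(vocab)) for tok in text.split()]
    return {"input_ids": ids, "labels": list(ids)}

vocab = {}
messages = [
    {"role": "user", "content": "What is T5?"},
    {"role": "assistant", "content": "An encoder-decoder transformer."},
]
batch = tokenize(apply_chat_template(messages), vocab)
print(batch)
```

With a real tokenizer you would map this over the dataset (e.g. with `dataset.map`) so each row ends up with those columns before it reaches Trainer.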

Hi @zucchini-nlp. When I say that the prediction is slow, I am comparing it to the script at https://huggingface.co/docs/trl/en/sft_trainer, which is much faster. I think one possible way to solve...

Hi @Gopi-Uppari, thank you for taking care of this. Yes, I have used Keras to load my custom checkpoint using `keras_hub.models.Llama3CausalLM.from_preset("./local_folder")`, and it works pretty well. However, the main...

Hi @Gopi-Uppari, the issue has not been resolved because we still need a script that converts a Keras checkpoint to the HF format for Llama. The solution you have...
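A conversion script like the one requested above essentially renames Keras weight paths to the HF Llama layout. The sketch below shows that renaming step only; the Keras-side names and the mapping table are hypothetical and incomplete (the real layouts have to be read from both codebases), though the HF-side names follow the usual `model.layers.{i}.self_attn.q_proj.weight` pattern:

```python
import re

# Hypothetical, incomplete mapping from Keras-style weight paths to
# HF Llama state-dict keys; "{i}" is a placeholder for the layer index.
RENAMES = {
    "token_embedding/embeddings": "model.embed_tokens.weight",
    "transformer_layer_{i}/self_attention/query/kernel":
        "model.layers.{i}.self_attn.q_proj.weight",
    "transformer_layer_{i}/feedforward_output/kernel":
        "model.layers.{i}.mlp.down_proj.weight",
}

def convert_name(keras_name):
    """Replace a concrete layer index with the '{i}' placeholder,
    look up the HF template, and re-insert the index."""
    m = re.search(r"transformer_layer_(\d+)", keras_name)
    if m:
        template = keras_name.replace(m.group(0), "transformer_layer_{i}")
        return RENAMES[template].format(i=m.group(1))
    return RENAMES[keras_name]

print(convert_name("transformer_layer_3/self_attention/query/kernel"))
```

A full converter would also transpose kernels where the two layouts disagree and save the renamed tensors with `safetensors`, but the name mapping is the core of it.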