Sultan

Results: 13 comments of Sultan

> Sorry for the (very) slow reply, this is actually the first time someone pointed me at this issue!
>
> This command should set the hyperparameters from the original...

I have pre-trained T5 and BART, and it totally depends on the corpus and the masking portion you are using. A larger corpus means the loss function needs more time to...
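To make the "masking portion" concrete, here is a minimal sketch of T5-style token masking in plain Python. It is illustrative only: real span corruption samples span lengths and builds paired decoder targets, and the sentinel names mirror T5's `<extra_id_N>` convention, but the function itself is hypothetical.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Toy masking sketch: replace ~mask_ratio of the tokens with
    sentinel markers and collect the originals as targets.
    Shows why a larger corpus at a fixed ratio yields more masked
    positions for the loss to cover."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i, pos in enumerate(positions):
        masked[pos] = f"<extra_id_{i}>"
    targets = [tokens[pos] for pos in positions]
    return masked, targets

corpus = "the quick brown fox jumps over the lazy dog again".split()
masked, targets = mask_tokens(corpus, mask_ratio=0.2)
print(masked)
print(targets)
```

With `mask_ratio=0.2` on 10 tokens, 2 positions are masked; doubling the corpus doubles the masked positions the model must learn to reconstruct.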

Hi @gobbleturk, https://github.com/google/maxtext/pull/581 does not work with Gemma because Gemma 2 has local and global attention. I think each of the q, k, and v attention layers has a local...

@gobbleturk, any update on this? Adding this feature would support further research on the Gemma model and its use in academic work on low-resource languages.
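The local/global attention mentioned above refers to Gemma 2 alternating sliding-window (local) and full (global) attention across layers. A rough sketch of that layer pattern, assuming a simple even/odd alternation (the real MaxText/Gemma config names and ordering may differ):

```python
def attention_pattern(num_layers, local_window=4096):
    """Hypothetical sketch of a Gemma-2-style alternating attention
    schedule: even-indexed layers use sliding-window (local) attention,
    odd-indexed layers attend globally. Any feature that touches per-layer
    q/k/v weights has to account for both kinds."""
    return [
        {"layer": i,
         "kind": "local" if i % 2 == 0 else "global",
         "window": local_window if i % 2 == 0 else None}
        for i in range(num_layers)
    ]

for spec in attention_pattern(4):
    print(spec)
```

This is why a PR written for a uniform-attention model cannot simply be reused: the weight layout differs between the two layer kinds.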

Can you please show an example of how this code can work with an HF text dataset (not the multimodal dataset) without the Idefics2 processor? I mean using `tokenizer.apply_chat_template`? How right and...

> @salrowili it should be similar to Idefics with the only difference that instead of `processor.tokenizer` you have simply `tokenizer`. The main thing to note is that Trainer needs inputs...
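The reply above can be sketched with a toy stand-in. Since neither the tokenizer nor the dataset is shown here, the template tags and the whitespace "tokenizer" below are illustrative placeholders for the real `tokenizer.apply_chat_template` and tokenizer call, not the Hugging Face API; the point is only the shape Trainer needs (`input_ids` plus matching `labels` for causal LM):

```python
def apply_chat_template(messages):
    """Toy stand-in for tokenizer.apply_chat_template: flatten the
    chat messages into one training string."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(parts) + "\n<|end|>"

def tokenize(text, vocab):
    """Toy whitespace tokenizer; labels are copied from input_ids,
    which is the shape a causal-LM Trainer expects."""
    ids = [vocab.setdefault(tok, len(vocab)) for tok in text.split()]
    return {"input_ids": ids, "labels": list(ids)}

vocab = {}
messages = [
    {"role": "user", "content": "What is T5?"},
    {"role": "assistant", "content": "An encoder-decoder transformer."},
]
batch = tokenize(apply_chat_template(messages), vocab)
print(batch)
```

With a real tokenizer you would map this over the dataset (e.g. with `dataset.map`) so each row ends up with those columns before it reaches Trainer.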

Hi @zucchini-nlp. When I say that the prediction is slow, I am comparing it to the script at https://huggingface.co/docs/trl/en/sft_trainer, which is much faster. I think one possible way to solve...

Hi @Gopi-Uppari, thank you for taking care of this. Yes, I have used Keras to load my custom checkpoint using `keras_hub.models.Llama3CausalLM.from_preset("./local_folder")`, and it works pretty well. However, the main...

Hi @Gopi-Uppari, the issue has not been resolved because we still need a script that converts a Keras checkpoint to the HF format for Llama. The solution you have...
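A conversion script like the one requested above essentially renames Keras weight paths to the HF Llama layout. The sketch below shows that renaming step only; the Keras-side names and the mapping table are hypothetical and incomplete (the real layouts have to be read from both codebases), though the HF-side names follow the usual `model.layers.{i}.self_attn.q_proj.weight` pattern:

```python
import re

# Hypothetical, incomplete mapping from Keras-style weight paths to
# HF Llama state-dict keys; "{i}" is a placeholder for the layer index.
RENAMES = {
    "token_embedding/embeddings": "model.embed_tokens.weight",
    "transformer_layer_{i}/self_attention/query/kernel":
        "model.layers.{i}.self_attn.q_proj.weight",
    "transformer_layer_{i}/feedforward_output/kernel":
        "model.layers.{i}.mlp.down_proj.weight",
}

def convert_name(keras_name):
    """Replace a concrete layer index with the '{i}' placeholder,
    look up the HF template, and re-insert the index."""
    m = re.search(r"transformer_layer_(\d+)", keras_name)
    if m:
        template = keras_name.replace(m.group(0), "transformer_layer_{i}")
        return RENAMES[template].format(i=m.group(1))
    return RENAMES[keras_name]

print(convert_name("transformer_layer_3/self_attention/query/kernel"))
```

A full converter would also transpose kernels where the two layouts disagree and save the renamed tensors with `safetensors`, but the name mapping is the core of it.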