
Spanish version

Open · pablo14 opened this issue 4 years ago · 4 comments

Hi Boris! I really enjoyed your Google Colab "app", thanks for sharing it!

Recently a GPT-2 model in Spanish was released, so I tried to play around with it, but I'm getting errors:

It's not a big change, just starting from your code:

...
global trainer

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")
model = AutoModelWithLMHead.from_pretrained("mrm8488/spanish-gpt2")

# tokenizer = AutoTokenizer.from_pretrained('gpt2')
# model = AutoModelForCausalLM.from_pretrained('gpt2', cache_dir=pathlib.Path('cache').resolve())
...

But I get:

An error occurred...

num_samples should be a positive integer value, but got num_samples=0

If you can point me in the right direction, I can try more modifications :) Thanks in advance!

pablo14 avatar Jul 19 '21 00:07 pablo14

In addition, how do I delete the tests I did from W&B and HF? I didn't know they were uploaded. Thanks again!

pablo14 avatar Jul 19 '21 00:07 pablo14

For the error, you also need to set block_size = 512.

Otherwise the loaded value is wrong (I'm not sure why).
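
For reference, here is a minimal sketch of where block_size comes into play, assuming the dataset is built with transformers' TextDataset as in HF's run_language_modeling script; the file path and the explicit cap are illustrative, not the exact notebook code:

from transformers import AutoTokenizer, TextDataset

tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")

# Cap the block size explicitly instead of trusting tokenizer.model_max_length,
# which some hub checkpoints report as a huge placeholder value.
block_size = min(512, tokenizer.model_max_length)

# "data/train.txt" is a placeholder path for the downloaded tweets.
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="data/train.txt",
    block_size=block_size,
)

# An empty dataset here is what produces "num_samples should be a positive
# integer value, but got num_samples=0" once the DataLoader builds its sampler.
assert len(train_dataset) > 0, "no training examples were produced"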

There's still an error that I don't understand:

CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-8da941310b70> in finetune()
    481                     args=training_args,
    482                     data_collator=data_collator,
--> 483                     train_dataset=train_dataset)
    484 
    485                 # Update lr scheduler

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
    365 
    366         if self.place_model_on_device:
--> 367             self._move_model_to_device(model, args.device)
    368 
    369         # Force n_gpu to 1 to avoid DataParallel as MP will manage the GPUs

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in _move_model_to_device(self, model, device)
    507 
    508     def _move_model_to_device(self, model, device):
--> 509         model = model.to(device)
    510         # Moving a model to an XLA device disconnects the tied weights, so we have to retie them.
    511         if self.args.parallel_mode == ParallelMode.TPU and hasattr(model, "tie_weights"):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    850             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    851 
--> 852         return self._apply(convert)
    853 
    854     def register_backward_hook(

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    550                 # `with torch.no_grad():`
    551                 with torch.no_grad():
--> 552                     param_applied = fn(param)
    553                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    554                 if should_use_set_data:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in convert(t)
    848                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    849                             non_blocking, memory_format=convert_to_format)
--> 850             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    851 
    852         return self._apply(convert)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Do you know if the model works by itself (without finetuning)?
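
For example, a quick standalone check along these lines (a minimal sketch; the prompt and generation settings are arbitrary) should produce coherent Spanish text if the checkpoint itself is fine:

from transformers import pipeline

# Load the Spanish checkpoint directly, with no fine-tuning involved.
generator = pipeline(
    "text-generation",
    model="mrm8488/spanish-gpt2",
    tokenizer="mrm8488/spanish-gpt2",
)

# Arbitrary prompt just to confirm the model generates sensible Spanish.
print(generator("Mi perro es", max_length=50, num_return_sequences=1))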

borisdayma avatar Aug 24 '21 03:08 borisdayma

I've seen other requests for more languages and I'd love to support them.

It would be cool if someone could:

  • test a model
  • post the error (if it's different from the one above)
  • show that their custom model works by itself (load the tokenizer/model and generate samples)

Also kindly pinging @mrm8488, since his model is referenced for the Spanish version. I've seen that I may need to update block_size (small samples), but I'm not sure whether anything else is needed for generation or fine-tuning (versus English GPT-2). As a note, training relies entirely on HF's run_language_modeling script. Here is how I train: dev notebook.

borisdayma avatar Dec 03 '21 15:12 borisdayma

Isn't block_size inferred from the tokenizer?
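
For what it's worth, a minimal sketch of how one might check what the tokenizer and config actually report for the checkpoint mentioned above:

from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")
config = AutoConfig.from_pretrained("mrm8488/spanish-gpt2")

# If the checkpoint doesn't declare a maximum length, model_max_length comes
# back as a huge sentinel value (~1e30), which is not a usable block size.
print(tokenizer.model_max_length)

# The model config's context window (n_positions for GPT-2-style models) is
# usually the safer reference point.
print(getattr(config, "n_positions", None))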

tcapelle avatar Jan 07 '22 22:01 tcapelle