Spanish version
Hi Boris! Really enjoyed your Google Colab "app", thanks for sharing it!
Recently a Spanish GPT-2 was released, so I tried to play around with it, but I'm getting errors:
It's not a big change from your code:
```python
...
global trainer
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")
model = AutoModelWithLMHead.from_pretrained("mrm8488/spanish-gpt2")
#tokenizer = AutoTokenizer.from_pretrained('gpt2')
#model = AutoModelForCausalLM.from_pretrained('gpt2', cache_dir=pathlib.Path('cache').resolve())
...
```
But I get:
An error occured...
num_samples should be a positive integer value, but got num_samples=0
If you can point me in the right direction, I can try more modifications :) Thanks in advance!
In addition, how do I delete the tests I did from W&B and HF? I didn't know they were uploaded. Thanks again!
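For the cleanup question, both platforms expose deletion through their Python clients. A minimal sketch, assuming the public `wandb` and recent `huggingface_hub` APIs (the run path and repo id below are placeholders):

```python
import wandb
from huggingface_hub import delete_repo

# Remove a W&B run through the public API (path is a placeholder:
# entity/project/run_id, as shown on the run page).
api = wandb.Api()
api.run("my-entity/my-project/abc123").delete()

# Remove a test model repo from the Hugging Face Hub (placeholder id).
delete_repo(repo_id="my-user/my-test-model")
```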
For the error, you also need to set block_size = 512.
Otherwise the loaded value is wrong (not sure why).
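One plausible explanation for the num_samples=0 error: if the data pipeline follows HF's TextDataset approach, the tokenized corpus is chunked into blocks of block_size tokens and the trailing partial block is dropped, so a block_size larger than the corpus yields zero training samples. A rough illustration of that chunking logic (not the notebook's exact code):

```python
def chunk_into_blocks(token_ids, block_size):
    """Split a flat list of token ids into full blocks of block_size.
    The trailing partial block is dropped, so if len(token_ids) < block_size
    the result is empty -- hence num_samples=0."""
    return [
        token_ids[i : i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, block_size)
    ]

print(len(chunk_into_blocks(list(range(10_000)), block_size=512)))    # 19 blocks
print(len(chunk_into_blocks(list(range(10_000)), block_size=10**6)))  # 0 -> empty dataset
```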
There's still an error that I don't understand:
```
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-11-8da941310b70> in finetune()
481 args=training_args,
482 data_collator=data_collator,
--> 483 train_dataset=train_dataset)
484
485 # Update lr scheduler
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
365
366 if self.place_model_on_device:
--> 367 self._move_model_to_device(model, args.device)
368
369 # Force n_gpu to 1 to avoid DataParallel as MP will manage the GPUs
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in _move_model_to_device(self, model, device)
507
508 def _move_model_to_device(self, model, device):
--> 509 model = model.to(device)
510 # Moving a model to an XLA device disconnects the tied weights, so we have to retie them.
511 if self.args.parallel_mode == ParallelMode.TPU and hasattr(model, "tie_weights"):
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
850 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
851
--> 852 return self._apply(convert)
853
854 def register_backward_hook(
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
528 def _apply(self, fn):
529 for module in self.children():
--> 530 module._apply(fn)
531
532 def compute_should_use_set_data(tensor, tensor_applied):
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
528 def _apply(self, fn):
529 for module in self.children():
--> 530 module._apply(fn)
531
532 def compute_should_use_set_data(tensor, tensor_applied):
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
550 # `with torch.no_grad():`
551 with torch.no_grad():
--> 552 param_applied = fn(param)
553 should_use_set_data = compute_should_use_set_data(param, param_applied)
554 if should_use_set_data:
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in convert(t)
848 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
849 non_blocking, memory_format=convert_to_format)
--> 850 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
851
852 return self._apply(convert)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
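For what it's worth, a device-side assert is sticky: the failure surfaces here at model.to(device), but the offending kernel usually ran earlier in the session. With swapped-in models a common cause is a token id at or above the embedding size. A quick sanity check (hypothetical snippet, not from the notebook):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")
model = AutoModelForCausalLM.from_pretrained("mrm8488/spanish-gpt2")

# Every id the tokenizer can emit must fit in the embedding matrix;
# an out-of-range id triggers exactly this kind of device-side assert.
print("tokenizer vocab size :", len(tokenizer))
print("model embedding rows :", model.get_input_embeddings().num_embeddings)

# If the tokenizer is larger (e.g. after adding special tokens),
# resizing the embeddings is the usual fix:
if len(tokenizer) > model.get_input_embeddings().num_embeddings:
    model.resize_token_embeddings(len(tokenizer))
```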
Do you know if the model works by itself (without fine-tuning)?
I've seen other requests for more languages and I'd love to support them.
It would be cool if someone could:
- test a model
- post the error (if it's different from the one above)
- show that their custom model works by itself, i.e. load the tokenizer/model and generate samples (a sketch follows this list)
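For that last point, a minimal standalone check could look like this (the prompt and generation settings are just examples):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanish-gpt2")
model = AutoModelForCausalLM.from_pretrained("mrm8488/spanish-gpt2")

# Encode a short Spanish prompt and sample a continuation.
inputs = tokenizer("Érase una vez", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```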
Also kindly pinging @mrm8488 as his model is referenced for the Spanish version.
I've seen that I may need to update block_size (for small samples), but I'm not sure if anything else is needed for generation or fine-tuning (vs. the English GPT-2).
As a note, training relies entirely on the HF run_language_modeling script. Here is how I train: dev notebook.
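For context, that script essentially builds a TextDataset and a causal-LM collator and hands them to Trainer; a condensed sketch of the equivalent steps (file paths and hyperparameters are placeholders):

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# TextDataset tokenizes the file and chunks it into block_size-token blocks.
train_dataset = TextDataset(
    tokenizer=tokenizer, file_path="train.txt", block_size=512
)
# mlm=False selects plain causal language modeling, as for GPT-2.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output", num_train_epochs=1),
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
```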
Isn't block_size inferred from the tokenizer?
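Possibly relevant: if it is inferred from tokenizer.model_max_length, note that a repo which never set that field reports a huge sentinel value (int(1e30) in transformers), so the inferred block_size can be absurdly large, which would also explain the empty dataset above. A quick check (hypothetical snippet):

```python
from transformers import AutoTokenizer

for name in ("gpt2", "mrm8488/spanish-gpt2"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    # An unset max length shows up as the ~1e30 sentinel; any block_size
    # inferred from it would produce zero full blocks of text.
    print(name, "->", tokenizer.model_max_length)
```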