amazon-sagemaker-examples

[Bug Report] RuntimeError when running instruction fine-tuning on Mistral 7B, SageMaker JumpStart

Open louishourcade opened this issue 9 months ago • 2 comments

Link to the notebook https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/mistral-7b-instruction-domain-adaptation-finetuning.ipynb

Describe the bug I get an error when I run the training step for instruction fine-tuning in this notebook. The training job starts properly, but after about 10 minutes it fails with the following ErrorMessage: "raise RuntimeError( RuntimeError: Could not find response key [1, 32002] in token IDs tensor([ 1, 20811, 349, ..., 302, 15637, 266])"
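The wording of the error suggests that a search for the response key inside the tokenized example came up empty. Below is a minimal sketch of such a check, assuming the training script looks for the response-key token IDs as a contiguous run inside each example. The function name `find_response_key` and the token lists are hypothetical and made up for illustration; they are not taken from the JumpStart training code or from the failing job.

```python
from typing import Sequence


def find_response_key(token_ids: Sequence[int], response_key: Sequence[int]) -> int:
    """Return the index at which response_key starts inside token_ids.

    Hypothetical helper: raises RuntimeError when the key is not present
    as a contiguous sub-sequence, mirroring the wording of the reported error.
    """
    n, k = len(token_ids), len(response_key)
    for start in range(n - k + 1):
        if list(token_ids[start:start + k]) == list(response_key):
            return start
    raise RuntimeError(
        f"Could not find response key {list(response_key)} "
        f"in token IDs {list(token_ids)}"
    )


# Made-up token IDs for illustration only.
response_key = [1, 32002]
with_key = [1, 20811, 349, 1, 32002, 302, 15637]
without_key = [1, 20811, 349, 32002, 302]  # 32002 is present, but not preceded by 1

print(find_response_key(with_key, response_key))
try:
    find_response_key(without_key, response_key)
except RuntimeError as err:
    print(err)
```

If the real check works along these lines, two things worth ruling out are an example being truncated before the response key appears, and the key being tokenized differently on its own than inside the full prompt. Both are guesses, not confirmed causes.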

To reproduce

  • Upload the notebook to a SageMaker notebook instance
  • Run every cell. The error appears when running the instruction fine-tuning training job (section 1.3, Starting Training)

Logs Attaching some screenshots of the logs

Screenshot 2024-05-03 at 16 45 43

Screenshot 2024-05-03 at 16 47 10

Any idea how to fix this?

louishourcade, May 03 '24 14:05

@louishourcade: I'm facing the same issue while running the example notebook from AWS. Did you find a solution?

prakash5801, May 23 '24 03:05

Hi @prakash5801, no, I haven't found time to investigate further. But I checked yesterday and the error is still there.

louishourcade, Jun 13 '24 16:06