UnboundLocalError: local variable 'state' referenced before assignment

When I run this demo, the following error occurs:

INFO:__main__:***** Running training *****
INFO:__main__:  Num examples = 117750
INFO:__main__:  Num Epochs = 8
INFO:__main__:  Batch size per device (w. accumulation) = 20
INFO:__main__:  Global train batch size (w. parallel & distributed) = 80
INFO:__main__:  Total optimization steps = 11768
Initial compilation. This might take some minutes...
Epoch ... :   0%|                                                                                                               | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/", line 1219, in <module>
  File "/home/", line 1085, in main
    state, train_metric = p_train_step(state, batch)
UnboundLocalError: local variable 'state' referenced before assignment
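
For reference, here is a minimal sketch of how this class of error arises in plain Python, independent of the demo script. The function names `run` and `describe` and the `resume` flag are hypothetical. I am assuming, without having confirmed it, that `state` in the script is assigned only on a code path that my flags do not reach.

```python
def run(resume: bool) -> str:
    # `state` is assigned only inside this branch, but Python still treats
    # it as a local name everywhere in the function.
    if resume:
        state = "restored"
    # When resume is False, nothing has been assigned yet and this read fails.
    return state


def describe(resume: bool) -> str:
    # Report the outcome instead of crashing, so both paths can be compared.
    try:
        return run(resume)
    except UnboundLocalError as err:
        return f"UnboundLocalError: {err}"
```

Calling `describe(True)` returns the assigned value, while `describe(False)` reports the same kind of error as in the traceback above.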


  • Python 3.9
  • CUDA 11.3


PYTHONPATH=/home/ python3 \
    --output_dir="/home/" \
    --model_name_or_path="facebook/opt-2.7b" \
    --dataset_name="wikitext" \
    --dataset_config_name="wikitext-2-raw-v1" \
    --do_train --do_eval \
    --block_size="1024" \
    --per_device_train_batch_size="20" \
    --per_device_eval_batch_size="20" \
    --num_micro_batches 4 \
    --operator_parallel 2 \
    --pipeline_parallel 1 \
    --dtype="float16" \
    --learning_rate="5e-4" --warmup_steps="2000" \
    --adam_beta1="0.9" --adam_beta2="0.98" --weight_decay="0.01" \
    --overwrite_output_dir \
    --num_train_epochs="8" \
    --logging_steps="16" \
    --save_steps="2500" \


What should I do? Thanks.

When I use pipeline parallelism, a different error occurs:

2023-02-28 11:24:36,070 ERROR -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MeshHostWorker.load_opt_params_worker_func() (pid=12770, ip=10.xx.2.46, repr=<alpa.device_mesh.MeshHostWorker object at 0x7fe178157b20>)
  File "/home/", line 147, in load_opt_params_worker_func
  File "/home/", line 121, in load_array
    return np.load(os.path.join(path, key))
  File "/home/", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/data/nfs/'
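
The traceback shows the worker calling `np.load(os.path.join(path, key))`, so a quick way to see which parameter files are absent is to check each joined path before loading. This is only a sketch: the helper name `missing_param_files` is hypothetical, and the actual directory and key names used by the script are not shown here.

```python
import os
from typing import Iterable, List


def missing_param_files(path: str, keys: Iterable[str]) -> List[str]:
    """Return the keys whose files are absent under `path`."""
    # Mirrors the os.path.join(path, key) lookup seen in the traceback.
    return [key for key in keys if not os.path.isfile(os.path.join(path, key))]
```

An empty result would mean every listed key has a file on disk; any returned key is one that `np.load` would fail on.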


PYTHONPATH=/home/ python3 \
    --output_dir="/home/" \
    --config_name="./config_30b.json" \
    --tokenizer_name="facebook/opt-30b" \
    --alpa_init \
    --use_manual_layer \
    --dataset_name="wikitext" \
    --dataset_config_name="wikitext-2-raw-v1" \
    --do_train \
    --block_size="1024" \
    --per_device_train_batch_size="1024" \
    --per_device_eval_batch_size="64" \
    --num_micro_batches 256 \
    --operator_parallel 1 \
    --pipeline_parallel 8 \
    --dtype="float16" \
    --learning_rate="5e-4" --warmup_steps="2000" \
    --adam_beta1="0.9" --adam_beta2="0.98" --weight_decay="0.01" \
    --overwrite_output_dir \
    --num_train_epochs="10" \
    --logging_steps="1" \
    --save_steps="888" \

When I checked the configuration file, I found that this file is missing.


Where should I download it? @zhisbug

Best wishes!

