DeepSpeed
[BUG]
export CUDA_VISIBLE_DEVICES=2,3
task=medqa_usmle_hf
datadir=data/$task
outdir=runs/$task/GPT2
mkdir -p $outdir
seed=42
deepspeed --num_gpus 2 --num_nodes 1 run_multiple_choice.py \
  --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer \
  --model_name_or_path "stanford-crfm/BioMedLM" \
  --train_file ../../../SLMReason/data/MedQA/BertMC/train.json \
  --validation_file ../../../SLMReason/data/MedQA/BertMC/validation.json \
  --test_file ../../../SLMReason/data/MedQA/BertMC/test.json \
  --do_train --do_eval --do_predict \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 32 \
  --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 10 \
  --max_seq_length 512 --seed $seed --data_seed $seed \
  --logging_first_step --logging_steps 20 \
  --save_strategy no --evaluation_strategy steps --eval_steps 500 \
  --run_name debug \
  --output_dir trash/ \
  --overwrite_output_dir \
  --deepspeed ds_config_zero3.json \
  --fp16
What is the error that you're seeing?
Are you trying to specify which CUDA devices to use?
FYI, CUDA_VISIBLE_DEVICES does not work with the deepspeed launcher. Device selection is done with the launcher's own options, described here:
https://www.deepspeed.ai/getting-started/
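For reference, here is a minimal sketch of the launcher-side way to pick GPUs 2 and 3, assuming a single node whose hostname entry is localhost. It uses the launcher's --include option in place of CUDA_VISIBLE_DEVICES. As far as I know, --include cannot be combined with --num_gpus/--num_nodes, so those are dropped here. The variable names gpus and include are only illustrative, and the script arguments are abbreviated from the command above.

```shell
# Leave CUDA_VISIBLE_DEVICES unset so it does not conflict with the launcher.
unset CUDA_VISIBLE_DEVICES

# Hypothetical helper variables: the GPU ids to use on the local node.
gpus=2,3
include="localhost:${gpus}"

# Dry run: print the command first. Remove the leading `echo` to launch.
# Append the remaining training arguments from the original command as-is.
echo deepspeed --include "$include" run_multiple_choice.py \
  --deepspeed ds_config_zero3.json \
  --fp16
```

Printing the command before launching makes it easy to confirm that the intended devices are listed after localhost:.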
This looks to be a duplicate of #3070, answered there as well.