torchdistill get an error

Open cxchen100 opened this issue 6 months ago • 3 comments

When I run the command, I get the error below. How can I solve this? Thank you.

Traceback (most recent call last):
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 301, in <module>
    main(argparser.parse_args())
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 269, in main
    train(teacher_model, student_model, dataset_dict, is_regression, dst_ckpt_dir_path, metric,
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 160, in train
    train_one_epoch(training_box, epoch, log_freq)
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 119, in train_one_epoch
    loss = training_box.forward_process(sample_batch, targets=None, supp_dict=None)
  File "/data/llm/torchdistill/torchdistill/core/distillation.py", line 424, in forward_process
    total_loss = self.criterion(io_dict, model_loss_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/high_level.py", line 82, in forward
    loss_dict[loss_name] = factor * criterion(student_io_dict, teacher_io_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/mid_level.py", line 175, in forward
    student_logits = student_io_dict[self.student_module_path][self.student_module_io]
KeyError: '.classifier'

Traceback (most recent call last):
  File "/data/llm/miniconda3/envs/python_for_torchdistill/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/llm/miniconda3/envs/python_for_torchdistill/bin/python', 'examples/hf_transformers/text_classification.py', '--config', 'configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml', '--task', 'cola', '--run_log', 'log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt', '--private_output', 'leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/']' returned non-zero exit status 1.
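For anyone reading the traceback: the failing frame in mid_level.py is a nested dictionary lookup, student_io_dict[self.student_module_path][self.student_module_io], and the KeyError says the outer key '.classifier' is missing. A minimal sketch of that lookup pattern follows. The helper name lookup_io and the dictionary contents are hypothetical and are not part of torchdistill; the sketch only shows how listing the keys that are present might help when comparing them with the module path set in the config. Because the keys here are invented, it does not identify the actual cause.

```python
# Hypothetical debugging aid; not part of torchdistill.
# It mirrors the nested lookup from the last traceback frame:
#   student_io_dict[self.student_module_path][self.student_module_io]


def lookup_io(io_dict, module_path, io_type):
    """Return io_dict[module_path][io_type], listing the known paths on failure."""
    if module_path not in io_dict:
        available = sorted(io_dict)
        raise KeyError(
            f"module path {module_path!r} not found; available paths: {available}"
        )
    return io_dict[module_path][io_type]


# Made-up contents for illustration only; the real dict comes from the training code.
student_io_dict = {"classifier": {"output": [0.1, 0.9]}}

# A path that is present resolves normally.
logits = lookup_io(student_io_dict, "classifier", "output")

# A path that is absent fails with a message that shows which paths were recorded.
try:
    lookup_io(student_io_dict, ".classifier", "output")
except KeyError as err:
    print(err.args[0])
```

Printing the recorded keys right before the failing line would serve the same purpose without a helper.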

My environment is as follows:

accelerate 0.33.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
annotated-types 0.7.0
async-timeout 4.0.3
attrs 24.2.0
certifi 2024.7.4
charset-normalizer 3.3.2
Cython 3.0.11
datasets 2.21.0
deepspeed 0.15.0
dill 0.3.8
evaluate 0.4.2
filelock 3.15.4
frozenlist 1.4.1
fsspec 2024.6.1
hjson 3.1.0
huggingface-hub 0.24.6
idna 3.8
Jinja2 3.1.4
joblib 1.4.2
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.20
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
pillow 10.4.0
pip 24.2
protobuf 5.27.4
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pydantic 2.8.2
pydantic_core 2.20.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
regex 2024.7.24
requests 2.32.3
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.1
sentencepiece 0.2.0
setuptools 72.1.0
six 1.16.0
sympy 1.13.2
threadpoolctl 3.5.0
tokenizers 0.19.1
torch 2.4.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.44.2
triton 3.0.0
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
wheel 0.43.0
xxhash 3.5.0
yarl 1.9.4

Aug 29 '24 03:08 cxchen100
