Megatron-LM
[BUG] ModuleNotFoundError: No module named 'megatron.training.tokenizer'; 'megatron.training' is not a package
Describe the bug A clear and concise description of what the bug is.
Strange issue.
(/aml2/ds) root@A100:/aml2/Megatron-LM# from megatron.training.tokenizer import build_tokenizer
from: can't read /var/mail/megatron.training.tokenizer
(/aml2/ds) root@A100:/aml2/Megatron-LM# python tools/preprocess_data.py \
    --input /aml2/traindata/oscar-1GB.jsonl \
    --output-prefix /aml2/traindata \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model /aml2/llama2/tokenizer.model \
    --workers 16 \
    --append-eod
[2024-04-02 08:03:42,280] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/aml2/Megatron-LM/tools/preprocess_data.py", line 23, in
To Reproduce Steps to reproduce the behavior. The easier it is to reproduce the faster it will get maintainer attention.
Expected behavior A clear and concise description of what you expected to happen.
Stack trace/logs If applicable, add the stack trace or logs from the time of the error.
Environment (please complete the following information):
- Megatron-LM commit ID the latest
- PyTorch version 2.2.1
- CUDA version 12.1
Proposed fix If you have a proposal for how to fix the issue state it here or link to a PR.
Additional context Add any other context about the problem here.
Have you tried again with the most recent version of main? There was a fix regarding this.
Marking as stale. No activity in 60 days.
I used the latest release version v0.7.0 and the error still happened.
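For anyone hitting the same message: "'megatron.training' is not a package" means Python resolved `megatron.training` to a single module file rather than a directory with submodules. One possible cause (an assumption, not confirmed in this thread) is that a different copy of `megatron` is found earlier on `sys.path` than the checkout being run. A minimal diagnostic sketch using only the standard library; the helper name `describe_module` is made up for illustration:

```python
import importlib.util


def describe_module(name):
    """Return (origin, is_package) for `name`, or None if it cannot be resolved."""
    try:
        # For a dotted name, find_spec imports the parent modules first.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when a parent exists but is a plain module, not a package.
        return None
    if spec is None:
        return None
    # Packages have submodule_search_locations set; plain modules have None.
    return spec.origin, spec.submodule_search_locations is not None


if __name__ == "__main__":
    for name in ("megatron", "megatron.training", "megatron.training.tokenizer"):
        info = describe_module(name)
        if info is None:
            print(f"{name}: not resolvable")
        else:
            origin, is_package = info
            print(f"{name}: {origin} ({'package' if is_package else 'plain module'})")
```

Run it with the same interpreter and from the same directory as the failing command. If the printed path for `megatron` points outside the Megatron-LM checkout, or `megatron.training` is reported as a plain module, the import is likely being shadowed by another installation.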
Marking as stale. No activity in 60 days.