Can't load NLLB MoE model with torch.load
🐛 Bug
I'm trying to open and inspect the NLLB MoE model checkpoint (405 GB), but I can't load it with torch.load. The smaller dense models load fine, and I can access their checkpoint parameters, etc.
To Reproduce
- Run the following commands:
python3
>>> import torch
>>> checkpoint = torch.load("nllb200moe54bmodel", map_location=torch.device('cpu'))
- See error
File "/data/user/model_info.py", line 18, in main
checkpoint = torch.load(args.model, map_location=torch.device('cpu'))
File "/home/user/.conda/envs/nllb/lib/python3.9/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/user/.conda/envs/nllb/lib/python3.9/site-packages/torch/serialization.py", line 762, in _legacy_load
return legacy_load(f)
File "/home/user/.conda/envs/nllb/lib/python3.9/site-packages/torch/serialization.py", line 687, in legacy_load
tar.extract('storages', path=tmpdir)
File "/home/user/.conda/envs/nllb/lib/python3.9/tarfile.py", line 2077, in extract
tarinfo = self.getmember(member)
File "/home/user/.conda/envs/nllb/lib/python3.9/tarfile.py", line 1799, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"
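From the traceback it looks like torch.load did not recognize the file as a current zip-format checkpoint and fell back to the legacy tar loader, which then failed to find the 'storages' member. A quick check I can run (a sketch; whether the 54B MoE checkpoint is a single file or a directory of sharded expert files is exactly what this would tell me):

import os
import zipfile

path = "nllb200moe54bmodel"  # same path as above

# MoE checkpoints may be sharded; a directory cannot be passed to torch.load directly.
print("is directory:", os.path.isdir(path))

if os.path.isfile(path):
    # Recent PyTorch checkpoints are zip archives; if this prints False,
    # torch.load falls back to the legacy loader seen in the traceback.
    print("is zip archive:", zipfile.is_zipfile(path))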
Code sample
import torch
checkpoint = torch.load("nllb200moe54bmodel", map_location=torch.device('cpu'))
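As an alternative, loading through fairseq's own helper might work where a raw torch.load does not; this is only a sketch (checkpoint_utils.load_checkpoint_to_cpu exists in fairseq, but whether it handles this sharded MoE checkpoint on the nllb branch is an assumption I haven't verified):

from fairseq import checkpoint_utils

# Assumed alternative: let fairseq's helper wrap torch.load and apply its
# own checkpoint handling, then inspect the returned state dict.
state = checkpoint_utils.load_checkpoint_to_cpu("nllb200moe54bmodel")
print(state.keys())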
Expected behavior
The checkpoint should load without error and its parameters should be accessible.
Environment
- fairseq Version: nllb branch
- PyTorch Version: '1.10.1+cu113'
- OS (e.g., Linux): Ubuntu 22.04.1 LTS
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source):
git clone https://github.com/facebookresearch/fairseq.git
cd fairseq
git checkout nllb
pip install -e .
python setup.py build_ext --inplace
- Python version: 3.9.13
- CUDA/cuDNN version: 11.6
- GPU models and configuration: CPU only (just loading the model into memory)
- Any other relevant information: