TF: Can't create sharded XGLM model
System Info
- transformers version: 4.22.0.dev0
- Platform: Linux-5.15.0-33-generic-x86_64-with-glibc2.35
- Python version: 3.8.13
- Huggingface_hub version: 0.9.0
- PyTorch version (GPU?): 1.12.0+cu116 (True)
- Tensorflow version (GPU?): 2.9.1 (True)
- Flax version (CPU?/GPU?/TPU?): 0.5.0 (gpu)
- Jax version: 0.3.5
- JaxLib version: 0.3.5
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
@ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Running this CLI command
CUDA_VISIBLE_DEVICES="" TOKENIZERS_PARALLELISM=false NVIDIA_TF32_OVERRIDE=0 transformers-cli pt-to-tf --model-name facebook/xglm-2.9B --new-weights --max-error 3e-3
gets you the following exception (in the sharding code):
Traceback (most recent call last):
File "/home/joao/hf/bin/transformers-cli", line 8, in <module>
sys.exit(main())
File "/home/joao/transformers/src/transformers/commands/transformers_cli.py", line 55, in main
service.run()
File "/home/joao/transformers/src/transformers/commands/pt_to_tf.py", line 309, in run
tf_from_pt_model.save_pretrained(self._local_dir)
File "/home/joao/transformers/src/transformers/modeling_tf_utils.py", line 2020, in save_pretrained
param_dset = shard_file.create_dataset(
File "/home/joao/hf/lib/python3.8/site-packages/h5py/_hl/group.py", line 161, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/home/joao/hf/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 156, in make_new_dset
dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 84, in h5py.h5d.create
TypeError: expected bytes, str found
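For reference, the crash happens inside `save_pretrained`'s sharding path (`modeling_tf_utils.py` line 2020 above), so the CLI is not strictly needed to reproduce it. A minimal Python-only sketch of the same conversion, assuming `TFXGLMForCausalLM` is an appropriate head class and that the ~11GB of fp32 weights exceed the default 10GB shard size:

```python
from transformers import TFXGLMForCausalLM

# Cross-load the PyTorch checkpoint into the TF architecture, which is what
# `transformers-cli pt-to-tf` does internally before saving.
tf_model = TFXGLMForCausalLM.from_pretrained("facebook/xglm-2.9B", from_pt=True)

# The checkpoint is larger than the default `max_shard_size`, so this goes
# through the sharded h5 writer and should hit the same TypeError.
tf_model.save_pretrained("xglm-2.9B-tf")
```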
Expected behavior
Successful sharding :D
cc @ArthurZucker
Hey! Little update on this: the problem comes from the previously introduced "hack":
return tf.Variable(emb, trainable=False, name="model.embed_positions.weights")
This appears here. The same hack can also be seen in BART.
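For anyone following along, my reading of the sharded save code is that it drops the first `/`-scoped component of each variable name before creating the h5 dataset. The hack variable's name contains no `/` at all, so the computed dataset name presumably comes out empty, and an empty name would explain the `TypeError: expected bytes, str found` above. A small sketch of that assumption (exact h5py behaviour may vary between versions):

```python
import h5py

layer_name = "model.embed_positions.weights"  # name set by the hack above

# What the current sharding code appears to compute: drop the first "/"-separated part.
dataset_name = "/".join(layer_name.split("/")[1:])
print(repr(dataset_name))  # '' -- the name has no "/" to split on

with h5py.File("debug-shard.h5", "w") as f:
    # With an empty name, h5py appears to skip its str -> bytes encoding, so the
    # raw str reaches the low-level create(), matching the TypeError in the traceback.
    f.create_dataset(dataset_name, shape=(2, 2), dtype="float32")
```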
In order to have as few breaking changes as possible, I think we can add the following:
if "model." in layer.name : # potentially all models that have the hack will have model. something"
param_dset = shard_file.create_dataset(
".".join(layer.name.split(".")[1:]), layer.numpy().shape, dtype=layer.numpy().dtype
)
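For reference, integrated into the existing save loop this would look roughly like the sketch below (the `else` branch is my reading of the current code, not a verbatim copy):

```python
for layer in sorted(shard, key=lambda x: x.name):
    if "model." in layer.name:
        # Hack variables: keep the "."-separated name, minus the leading "model".
        layer_name = ".".join(layer.name.split(".")[1:])
    else:
        # Regular TF variables: drop the first "/"-scoped name component.
        layer_name = "/".join(layer.name.split("/")[1:])
    param_dset = shard_file.create_dataset(
        layer_name, layer.numpy().shape, dtype=layer.numpy().dtype
    )
    param_dset[:] = layer.numpy()
```

For the XGLM hack this yields "embed_positions.weights" instead of an empty string.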
I think we have to keep the "." separation for consistency. Will see if I can open a PR on that soon.