TF: Can't create sharded XGLM model
System Info
- transformers version: 4.22.0.dev0
- Platform: Linux-5.15.0-33-generic-x86_64-with-glibc2.35
- Python version: 3.8.13
- Huggingface_hub version: 0.9.0
- PyTorch version (GPU?): 1.12.0+cu116 (True)
- Tensorflow version (GPU?): 2.9.1 (True)
- Flax version (CPU?/GPU?/TPU?): 0.5.0 (gpu)
- Jax version: 0.3.5
- JaxLib version: 0.3.5
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
@ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Running this CLI command
CUDA_VISIBLE_DEVICES="" TOKENIZERS_PARALLELISM=false NVIDIA_TF32_OVERRIDE=0 transformers-cli pt-to-tf --model-name facebook/xglm-2.9B --new-weights --max-error 3e-3
gets you the following exception (in the sharding code):
Traceback (most recent call last):
File "/home/joao/hf/bin/transformers-cli", line 8, in <module>
sys.exit(main())
File "/home/joao/transformers/src/transformers/commands/transformers_cli.py", line 55, in main
service.run()
File "/home/joao/transformers/src/transformers/commands/pt_to_tf.py", line 309, in run
tf_from_pt_model.save_pretrained(self._local_dir)
File "/home/joao/transformers/src/transformers/modeling_tf_utils.py", line 2020, in save_pretrained
param_dset = shard_file.create_dataset(
File "/home/joao/hf/lib/python3.8/site-packages/h5py/_hl/group.py", line 161, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/home/joao/hf/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 156, in make_new_dset
dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 84, in h5py.h5d.create
TypeError: expected bytes, str found
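For reference, the crash happens inside `save_pretrained`'s sharding path (`modeling_tf_utils.py` line 2020 above), so the CLI is not strictly needed to reproduce it. A minimal Python-only sketch of the same conversion, assuming `TFXGLMForCausalLM` is an appropriate head class and that the ~11GB of fp32 weights exceed the default 10GB shard size:

```python
from transformers import TFXGLMForCausalLM

# Cross-load the PyTorch checkpoint into the TF architecture, which is what
# `transformers-cli pt-to-tf` does internally before saving.
tf_model = TFXGLMForCausalLM.from_pretrained("facebook/xglm-2.9B", from_pt=True)

# The checkpoint is larger than the default `max_shard_size`, so this goes
# through the sharded h5 writer and should hit the same TypeError.
tf_model.save_pretrained("xglm-2.9B-tf")
```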
Expected behavior
Successful sharding :D
cc @ArthurZucker
Hey! Little update on this: the problem comes from the previously introduced "hack":
return tf.Variable(emb, trainable=False, name="model.embed_positions.weights")
This appears here. The same hack can also be seen in BART.
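For anyone following along, my reading of the sharded save code is that it drops the first `/`-scoped component of each variable name before creating the h5 dataset. The hack variable's name contains no `/` at all, so the computed dataset name presumably comes out empty, and an empty name would explain the `TypeError: expected bytes, str found` above. A small sketch of that assumption (exact h5py behaviour may vary between versions):

```python
import h5py

layer_name = "model.embed_positions.weights"  # name set by the hack above

# What the current sharding code appears to compute: drop the first "/"-separated part.
dataset_name = "/".join(layer_name.split("/")[1:])
print(repr(dataset_name))  # '' -- the name has no "/" to split on

with h5py.File("debug-shard.h5", "w") as f:
    # With an empty name, h5py appears to skip its str -> bytes encoding, so the
    # raw str reaches the low-level create(), matching the TypeError in the traceback.
    f.create_dataset(dataset_name, shape=(2, 2), dtype="float32")
```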
In order to have as few breaking changes as possible, I think we can add the following:
if "model." in layer.name : # potentially all models that have the hack will have model. something"
param_dset = shard_file.create_dataset(
".".join(layer.name.split(".")[1:]), layer.numpy().shape, dtype=layer.numpy().dtype
)
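For reference, integrated into the existing save loop this would look roughly like the sketch below (the `else` branch is my reading of the current code, not a verbatim copy):

```python
for layer in sorted(shard, key=lambda x: x.name):
    if "model." in layer.name:
        # Hack variables: keep the "."-separated name, minus the leading "model".
        layer_name = ".".join(layer.name.split(".")[1:])
    else:
        # Regular TF variables: drop the first "/"-scoped name component.
        layer_name = "/".join(layer.name.split("/")[1:])
    param_dset = shard_file.create_dataset(
        layer_name, layer.numpy().shape, dtype=layer.numpy().dtype
    )
    param_dset[:] = layer.numpy()
```

For the XGLM hack this yields "embed_positions.weights" instead of an empty string.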
I think we have to keep the "." separation for consistency. Will see if I can open a PR on that soon.