Convert our checkpoint colabs into runnable scripts
The colabs we currently have in tools/checkpoint_conversion are useful in that we don't lose the code for converting checkpoints, but they are fairly unwieldy. They must be pointed at a specific branch used during model development, they run to a ton of lines of code, and we need one for each model variant.
Instead, we could try to write one script per model that handles checkpoint conversion (perhaps with a flag to control the model variant?). A potential file structure:
```
tools
└── checkpoint_conversion
    ├── README.md
    ├── convert_bert_weights.py
    ├── convert_gpt2_weights.py
    └── requirements.txt
```
This will make it much easier to re-run and test checkpoint conversion code in the future.
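For concreteness, here is a rough sketch of what one such script could look like (the function names and the flag are illustrative, not a settled design):

```python
import argparse


def download_model(preset):
    # Fetch the original checkpoint for the given preset.
    ...


def convert_checkpoints(preset):
    # Map the original weights onto the equivalent KerasNLP model.
    ...


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Convert BERT checkpoints to KerasNLP."
    )
    parser.add_argument(
        "--preset",
        default="bert_base_en_uncased",
        help="The model variant to convert.",
    )
    args = parser.parse_args()
    download_model(args.preset)
    convert_checkpoints(args.preset)
```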
I will take it
@vulkomilev, please go ahead with writing the conversion script for BERT! You can follow the same template as RoBERTa's script: https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/convert_roberta_checkpoints.py.
okay
Where can I find BertBase (keras_nlp.models.BertBase)?
@vulkomilev, KerasNLP does not have a separate class for BertBase. There is a BertBackbone model class: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_backbone.py#L35. If you want the base variant of BERT, you can do this:
```python
# Without loading the pretrained weights.
bert_base = keras_nlp.models.BertBackbone.from_preset(
    "bert_base_en_uncased", load_weights=False
)

# Loading the model with the pretrained weights.
bert_base = keras_nlp.models.BertBackbone.from_preset(
    "bert_base_en_uncased", load_weights=True
)
```
These "presets" are drawn from here: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_presets.py.
Regarding checkpoint conversion for BERT, follow the same format as RoBERTa. Use the conversion notebooks mentioned in this directory as reference: https://github.com/keras-team/keras-nlp/tree/master/tools/checkpoint_conversion.
So, for example, the contents of this cell:
```python
# Model Garden BERT paths.
zip_path = f"https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/{TOKEN_TYPE}_L-12_H-768_A-12.tar.gz"
zip_file = keras.utils.get_file(
    f"/content/{MODEL_NAME}",
    zip_path,
    extract=True,
    archive_format="tar",
)
```
can go in the download_model() function.
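That is, roughly the following (a sketch; the URL pattern comes from the cell above, but the preset-to-tarball mapping is a hypothetical placeholder):

```python
import keras

# Hypothetical mapping from preset name to Model Garden tarball name.
PRESET_TO_GARDEN_NAME = {
    "bert_base_en_uncased": "uncased_L-12_H-768_A-12",
    "bert_base_en": "cased_L-12_H-768_A-12",
}


def download_model(preset):
    garden_name = PRESET_TO_GARDEN_NAME[preset]
    zip_path = f"https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/{garden_name}.tar.gz"
    # Download and extract the tarball, returning the local path.
    return keras.utils.get_file(
        preset,
        zip_path,
        extract=True,
        archive_format="tar",
    )
```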
Contents of this cell:
model.get_layer("token_embedding").embeddings.assign(
weights["encoder/layer_with_weights-0/embeddings/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer("position_embedding").position_embeddings.assign(
weights["encoder/layer_with_weights-1/embeddings/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer("segment_embedding").embeddings.assign(
weights["encoder/layer_with_weights-2/embeddings/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer("embeddings_layer_norm").gamma.assign(
weights["encoder/layer_with_weights-3/gamma/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer("embeddings_layer_norm").beta.assign(
weights["encoder/layer_with_weights-3/beta/.ATTRIBUTES/VARIABLE_VALUE"]
)
for i in range(model.num_layers):
model.get_layer(f"transformer_layer_{i}")._self_attention_layer._key_dense.kernel.assign(
weights[f"encoder/layer_with_weights-{i + 4}/_attention_layer/_key_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer(f"transformer_layer_{i}")._self_attention_layer._key_dense.bias.assign(
weights[f"encoder/layer_with_weights-{i + 4}/_attention_layer/_key_dense/bias/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer(f"transformer_layer_{i}")._self_attention_layer._query_dense.kernel.assign(
weights[f"encoder/layer_with_weights-{i + 4}/_attention_layer/_query_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE"]
)
model.get_layer(f"transformer_layer_{i}")._self_attention_layer._query_dense.bias.assign(
weights[f"encoder/layer_with_weights-{i + 4}/_attention_layer/_query_dense/bias/.ATTRIBUTES/VARIABLE_VALUE"]
)
...
can go in convert_checkpoints().
etc., etc.
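Putting it together, convert_checkpoints() could be shaped like this (a sketch; the weights dict is built with TF's checkpoint reader, mirroring the notebooks, and the per-preset assignment details are elided):

```python
import tensorflow as tf

import keras_nlp


def convert_checkpoints(preset, checkpoint_path):
    # Read every variable in the original TF checkpoint into a dict.
    reader = tf.train.load_checkpoint(checkpoint_path)
    weights = {
        name: reader.get_tensor(name)
        for name, _ in tf.train.list_variables(checkpoint_path)
    }
    # Build the KerasNLP model with randomly initialized weights.
    model = keras_nlp.models.BertBackbone.from_preset(preset, load_weights=False)
    # Assign the embedding, layer norm, and per-layer weights exactly as in
    # the cells above.
    ...
    return model
```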
The conversion script should work for all BERT presets (passed as an arg to the script).
Hey, @vulkomilev! Are you working on this?
Yes, this week I will provide the code.
Hi, I have uploaded the code at https://github.com/vulkomilev/keras-nlp/blob/master/tools/checkpoint_conversion/convert_bert.py, but it needs more work. For example, I can't find the correct keys in the 'weights' array, and I was wondering if the giant "if" in convert_checkpoints can be optimized.
@abheesht17, I forgot to tag you; please check the above message. @mattdangerw
Hey, @vulkomilev! Taking a look, will get back to you ASAP.
@vulkomilev, let's reason through this together! :)
First, there are two sources of BERT checkpoints: TF Model Garden and the official BERT repository. Let's make a table of which preset is obtained from which source.
| Preset | Source | Notebook URL |
|---|---|---|
| bert_tiny_en_uncased | BERT Official Repo | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_tiny_uncased_en.ipynb |
| bert_small_en_uncased | BERT Official Repo | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_small_uncased_en.ipynb |
| bert_medium_en_uncased | BERT Official Repo | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_medium_uncased_en.ipynb |
| bert_base_en_uncased | TF Model Garden | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_base_uncased.ipynb |
| bert_base_en | TF Model Garden | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_base_cased.ipynb |
| bert_base_zh | TF Model Garden | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_base_zh.ipynb |
| bert_base_multi | TF Model Garden | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_base_multi_cased.ipynb |
| bert_large_en_uncased | BERT Official Repo | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_large_uncased_en.ipynb |
| bert_large_en | BERT Official Repo | https://github.com/keras-team/keras-nlp/blob/master/tools/checkpoint_conversion/bert_large_cased_en.ipynb |
Now that we have the table ready, let's get to work. The expectation is that the conversion snippet (the giant if...elif...else blocks you are talking about) can be simplified based on the source; i.e., ideally, it should be the same for all presets derived from "BERT Official Repo". Likewise for "TF Model Garden". Let's test this theory!
Let's run diffchecker between bert_tiny_en_uncased and bert_small_en_uncased: https://www.diffchecker.com/yDWyvec0/. The files are identical, and both are derived from "BERT Official Repo".
Let's test any two presets derived from "TF Model Garden", i.e., the difference between bert_base_en_uncased and bert_base_en: https://www.diffchecker.com/Mi4aI5Is/. Ah, shoot! There is a minor difference of two lines, but this isn't a major worry.
So, the conclusion is that you can have an outer if...else to decide between BERT Official Repo and TF Model Garden. Inside this outer if...else block, you can have smaller if...else blocks for the 1-2 lines of variation between conversion scripts (if these differences exist, of course). Hope this clears things up!
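Concretely, the control flow might look like this (a sketch; the groupings come straight from the table above):

```python
# Presets grouped by checkpoint source, per the table above.
BERT_OFFICIAL_REPO_PRESETS = [
    "bert_tiny_en_uncased",
    "bert_small_en_uncased",
    "bert_medium_en_uncased",
    "bert_large_en_uncased",
    "bert_large_en",
]
TF_MODEL_GARDEN_PRESETS = [
    "bert_base_en_uncased",
    "bert_base_en",
    "bert_base_zh",
    "bert_base_multi",
]

if preset in BERT_OFFICIAL_REPO_PRESETS:
    # One shared conversion path for all official-repo presets.
    ...
else:
    # One shared conversion path for all Model Garden presets, with a small
    # inner if...else for the 1-2 line differences between presets.
    ...
```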
Okay, thanks for the information. I am on it.
Made a new commit. Can you please check it out, @abheesht17?
bump @abheesht17
@vulkomilev, could you please open a PR? That way, everyone can take a look and leave comments on your code. Thanks!
will do
done
bump @abheesht17
Hey, @vulkomilev! I left a few comments on your PR a couple of days ago.
Oh, I didn't notice. Sorry!
Hey @vulkomilev, are you still working on this issue?
Yes, I need to connect with the other members.
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.