Katherine Wu
Hi @Feynman27, could you give us more information about the pretrained model? Is there an assert defined in the `model` layer?
That might be it. Does the model behave as expected without the dist strat scope (are the losses and metrics as expected)?
The reason this fails is that the MHA layer runs extra instructions when deserialized with `from_config`, which aren't executed when it is initialized directly with `num_heads` and `key_dim`: https://github.com/keras-team/keras/blob/v2.9.0/keras/layers/attention/multi_head_attention.py#L303 Without this...
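A framework-free sketch of that pitfall, with all names hypothetical stand-ins for the real Keras layer: `from_config` performs extra build steps that a plain `__init__(num_heads, key_dim)` call never runs, so the two construction paths leave the layer in different states.

```python
# Hypothetical stand-in for keras MultiHeadAttention: `from_config`
# runs `_build_from_signature`, which __init__ alone does not.
class FakeMHA:
    def __init__(self, num_heads, key_dim):
        self.num_heads = num_heads
        self.key_dim = key_dim
        self.built_from_signature = False  # extra state __init__ never sets

    def get_config(self):
        # In the real layer, the query shape is captured at build time.
        return {"num_heads": self.num_heads, "key_dim": self.key_dim,
                "query_shape": (None, 8, 16)}

    @classmethod
    def from_config(cls, config):
        query_shape = config.pop("query_shape")
        layer = cls(**config)
        if query_shape is not None:
            layer._build_from_signature(query_shape)  # the "extra instructions"
        return layer

    def _build_from_signature(self, query_shape):
        self.built_from_signature = True

fresh = FakeMHA(num_heads=2, key_dim=16)
restored = FakeMHA.from_config(fresh.get_config())
print(fresh.built_from_signature, restored.built_from_signature)  # False True
```

The asymmetry between the two paths is exactly what bites subclassed layers that only forward `num_heads`/`key_dim` to `super().__init__`.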
+@rchao This kind of issue is somewhat common: anyone who tries to create a subclassed MHA layer will run into it. The new idempotent saving format will also see it,...
@SirDavidLudwig In the code snippet in my previous comment, you can pass either `embed_dim` and `num_heads`, or the MHA layer itself, into the constructor. The `mha` argument is needed only for `from_config()`.
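To illustrate that constructor pattern with a hedged, framework-free sketch (class and attribute names hypothetical): the block is normally built from hyperparameters, and the `mha` argument exists only so `from_config()` can hand back the already-restored layer.

```python
# Hypothetical sketch: build from hyperparameters OR accept a restored layer.
class TransformerBlock:
    def __init__(self, embed_dim=None, num_heads=None, mha=None):
        if mha is not None:
            self.mha = mha  # restored layer supplied by from_config()
        else:
            # Normal path: build a fresh attention layer from hyperparameters.
            # (A plain dict stands in for the real MHA layer here.)
            self.mha = {"num_heads": num_heads, "key_dim": embed_dim // num_heads}

    def get_config(self):
        return {"mha": self.mha}

    @classmethod
    def from_config(cls, config):
        return cls(mha=config["mha"])

block = TransformerBlock(embed_dim=32, num_heads=4)
restored = TransformerBlock.from_config(block.get_config())
print(restored.mha == block.mha)  # True
```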
Tagging @rchao
The best way to avoid this issue is to disable layer tracing when creating the SavedModel, but then you'll have to manually define the `serving_default` function (this is the default...
@gcunhase Are you getting the same error even with `save_traces=False`?
@gcunhase can you paste the error trace?
Thanks for the PR! I think this counts as a shallow copy, since the values aren't being copied before being added to the new dict. You could replace the loop...
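To make the shallow-vs-deep distinction concrete (a generic sketch, not the PR's actual code): a key-by-key loop and `dict()` both produce a new mapping whose values are the same objects as the original's.

```python
import copy

original = {"a": [1, 2], "b": [3]}

# Loop-based copy: new dict, but the same value objects.
looped = {}
for key, value in original.items():
    looped[key] = value

# Equivalent one-liner shallow copy.
concise = dict(original)

assert looped["a"] is original["a"]   # values are shared
assert concise["a"] is original["a"]

# A deep copy duplicates the values as well.
deep = copy.deepcopy(original)
assert deep["a"] is not original["a"]
```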