Albert Zeyer
There are a couple of layers in `segmental_model.py` which are probably not used by anyone, almost always obsolete because of more generic base layers, and probably broken in some cases...
I think this layer is not used by anyone, and obsolete because of the more generic `GenericAttentionLayer` (or just `DotLayer`). Remove?
I just want to raise the issue that equality is not really well defined in all cases. We implement `__eq__` and related functions, which are somewhat restrictive currently, but...
We already automatically infer it during template construction (subnet or rec subnet) via recursive `get_layer` calls in `transform_config_dict`. We can do the same logic all the time. That simplifies the...
> > > Note that it's really important to never mask away all keys for a single query, even if the query itself is inside padding and might be masked...
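
To illustrate the quoted point, here is a minimal sketch in plain Python, not code from the repository. It assumes the usual approach of masking attention energies with `-inf` before the softmax; `masked_softmax` and its arguments are hypothetical names. If every key of a query is masked, the normalization has nothing left to normalize over and the weights become NaN, which can then spread into the loss and gradients.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over the keys of one query; mask[i] is False for masked keys.

    Hypothetical helper for illustration only.
    """
    # Masked keys get an energy of -inf, so exp() maps them to weight 0.
    masked = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    # Subtract the maximum for numerical stability.
    m = max(masked)
    # If all keys are masked, m is -inf and (-inf) - (-inf) is NaN.
    exps = [math.exp(s - m) for s in masked]
    z = sum(exps)
    return [e / z for e in exps]


# At least one key left: a proper distribution over the unmasked keys.
partly_masked = masked_softmax([0.0, 0.0, 5.0], [True, True, False])
print(partly_masked)  # [0.5, 0.5, 0.0]

# All keys masked: every weight is NaN.
fully_masked = masked_softmax([1.0, 2.0], [False, False])
print(all(math.isnan(w) for w in fully_masked))  # True
```

Keeping at least one key unmasked for padded queries avoids the NaN; the output for such a query is meaningless, but it can be masked out later.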
When `axes` includes some dynamic axis with sequence length. Instead of using `tf.reduce_mean`, we could share some code with `ReduceLayer` (which can also calculate the mean, but correctly respects padded...
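
The difference can be sketched in plain Python. This is not the `ReduceLayer` implementation; `naive_mean`, `masked_mean`, and the zero-padded toy batch are assumptions made for illustration. A mean over the full padded axis divides by the padded length, while a length-aware mean only counts the valid frames of each sequence.

```python
def naive_mean(batch):
    """Mean over the time axis of each sequence, padding included."""
    return [sum(seq) / len(seq) for seq in batch]


def masked_mean(batch, seq_lens):
    """Mean over the first seq_lens[i] frames of each sequence.

    Hypothetical helper; assumes every sequence length is at least 1.
    """
    return [sum(seq[:n]) / n for seq, n in zip(batch, seq_lens)]


# Two sequences padded with zeros to a common length of 4.
batch = [[2.0, 4.0, 6.0, 0.0], [1.0, 3.0, 0.0, 0.0]]
seq_lens = [3, 2]

print(naive_mean(batch))             # [3.0, 1.0]
print(masked_mean(batch, seq_lens))  # [4.0, 2.0]
```

The naive result depends on how much padding the batch happens to contain, so the same sequence can produce different means in different batches.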
When `axis` refers to a dynamic axis with sequence lengths, it should mask unused (padded) frames beforehand.
Fixes #556.