Albert Zeyer
> This logic is used when a job is (actually) runnable: `STATE_UNKNOWN` in the engine means "This job is not known to the engine". If that is...
> > then it means, `task_state()` is wrong, and we should fix that.
>
> how would you propose to fix that?

Maybe introduce a new state `STATE_QUEUE_ERROR` or so,...
> I've added `STATE_QUEUE_ERROR` just like in the rest of the engines.

But you still have this wrong code there?

```python
if qs == []:
    return STATE_UNKNOWN
```

Sorry, see...
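To make the point concrete, here is a hypothetical, self-contained sketch of the intended behavior: an empty queue-query result should map to the new `STATE_QUEUE_ERROR`, not to `STATE_UNKNOWN`. The constant values, the `queue_entries` parameter, and the function body are illustrative stand-ins, not the actual engine code.

```python
# Illustrative stand-ins for the engine's state constants.
STATE_UNKNOWN = "unknown"          # job not known to the engine at all
STATE_QUEUE_ERROR = "queue_error"  # queue query failed or gave no entry
STATE_QUEUED = "queued"


def task_state(queue_entries):
    """Map a raw queue-query result to an engine state (sketch).

    An empty result means the queue did not report the job at all,
    which is an error condition here, not "unknown".
    """
    if queue_entries is None:
        # The query itself failed.
        return STATE_QUEUE_ERROR
    if not queue_entries:
        # Job missing from the queue output.
        # Previously this (wrongly) returned STATE_UNKNOWN.
        return STATE_QUEUE_ERROR
    return STATE_QUEUED
```

The key change is that the `qs == []` branch no longer claims the job is merely unknown; the empty result is surfaced as an explicit queue error, consistent with the other engines.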
We can simply use a fallback implementation when this is done for ONNX. I think we already have that in some other places, e.g. in `TorchBackend.full`:

```python
if torch.onnx.is_in_onnx_export():
    ...
```
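The dispatch pattern itself can be sketched without torch. Below, `is_in_onnx_export()` is a stub standing in for the real `torch.onnx.is_in_onnx_export()`, and `fast_op`/`fallback_op` are hypothetical placeholder operations; only the branching structure mirrors what code like `TorchBackend.full` does.

```python
_IN_ONNX_EXPORT = False  # in real torch, the exporter flips this state


def is_in_onnx_export() -> bool:
    """Stub for torch.onnx.is_in_onnx_export()."""
    return _IN_ONNX_EXPORT


def fast_op(x):
    # Stands in for the efficient implementation that does not
    # trace well for ONNX export in this scenario.
    return [v * 2 for v in x]


def fallback_op(x):
    # A simpler, export-friendly implementation with the same result.
    return [v + v for v in x]


def op(x):
    # Use the simple fallback only while exporting to ONNX;
    # otherwise take the fast path.
    if is_in_onnx_export():
        return fallback_op(x)
    return fast_op(x)
```

Both branches must compute the same result; the fallback just trades speed for traceability during export.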
I'm currently limited in time and cannot follow the full argument. But if the existing API of `rf.dot_attention` does not really fit `scaled_dot_product_attention`, I don't think this is a problem:...
> The way causal self-attention is currently implemented is somewhat problematic, as the spatial dimension of the key and value matrices depends on the spatial dimension of the query matrix...
> Another issue is that some tests, and some code in i6_experiments for analyzing attention weights, rely on the pre-softmax energies variable being present, as those code snippets use the...
> Before this is executed, `axis` is the (shared) spatial dimension of both query and key/value.
> But for the key and value matrices, this spatial dimension is then replaced...
> But what we actually want to do is use the `is_causal` parameter of the torch `scaled_dot_product_attention`, which reduces the computation time by about 2x, because all those unnecessary computations that...
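What `is_causal=True` corresponds to can be shown with a small pure-Python sketch of causal attention weights: query position `i` may only attend to key positions `j <= i`. This only models the masking semantics; the actual torch kernel additionally skips computing the masked half entirely, which is where the roughly 2x saving comes from.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with a causal mask.

    scores[i][j] is the raw energy of query position i attending to
    key position j; positions j > i (the "future") are masked out,
    as with scaled_dot_product_attention(..., is_causal=True).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Exponentiate only the allowed (past and current) positions.
        exps = [math.exp(scores[i][j]) if j <= i else 0.0 for j in range(n)]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


w = causal_attention_weights([
    [0.0, 1.0, 2.0],
    [0.5, 0.5, 3.0],
    [1.0, 2.0, 3.0],
])
```

Since the mask zeroes out the upper triangle, roughly half of the score matrix never contributes, which is exactly the work a fused causal kernel can avoid doing in the first place.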
> The key matrix will have dimensions `[batch, ..., hist_dim, embed_dim]`. The `hist_dim` dimension has a size tensor with dimensions `[axis]`. But `axis` is not a dimension of the key...