Albert Zeyer
> This logic is used when a job is (actually) runnable: `STATE_UNKNOWN` in the engine means "This job is not known to the engine". If that is...
> > then it means, `task_state()` is wrong, and we should fix that.
>
> how would you propose to fix that?

Maybe introduce a new state `STATE_QUEUE_ERROR` or so,...
> I've added `STATE_QUEUE_ERROR` just like in the rest of the engines.

But you still have this wrong code there?

```python
if qs == []:
    return STATE_UNKNOWN
```

Sorry, see...
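To make the point concrete, here is a hypothetical, self-contained sketch of the intended behavior: an empty queue-query result should map to the new `STATE_QUEUE_ERROR`, not to `STATE_UNKNOWN`. The constant values, the `queue_entries` parameter, and the function body are illustrative stand-ins, not the actual engine code.

```python
# Illustrative stand-ins for the engine's state constants.
STATE_UNKNOWN = "unknown"          # job not known to the engine at all
STATE_QUEUE_ERROR = "queue_error"  # queue query failed or gave no entry
STATE_QUEUED = "queued"


def task_state(queue_entries):
    """Map a raw queue-query result to an engine state (sketch).

    An empty result means the queue did not report the job at all,
    which is an error condition here, not "unknown".
    """
    if queue_entries is None:
        # The query itself failed.
        return STATE_QUEUE_ERROR
    if not queue_entries:
        # Job missing from the queue output.
        # Previously this (wrongly) returned STATE_UNKNOWN.
        return STATE_QUEUE_ERROR
    return STATE_QUEUED
```

The key change is that the `qs == []` branch no longer claims the job is merely unknown; the empty result is surfaced as an explicit queue error, consistent with the other engines.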
We can simply use a fallback implementation when this is done for ONNX. I think we already have that in some other places, e.g. in `TorchBackend.full`:

```python
if torch.onnx.is_in_onnx_export():
    ...
```
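The dispatch pattern itself can be sketched without torch. Below, `is_in_onnx_export()` is a stub standing in for the real `torch.onnx.is_in_onnx_export()`, and `fast_op`/`fallback_op` are hypothetical placeholder operations; only the branching structure mirrors what code like `TorchBackend.full` does.

```python
_IN_ONNX_EXPORT = False  # in real torch, the exporter flips this state


def is_in_onnx_export() -> bool:
    """Stub for torch.onnx.is_in_onnx_export()."""
    return _IN_ONNX_EXPORT


def fast_op(x):
    # Stands in for the efficient implementation that does not
    # trace well for ONNX export in this scenario.
    return [v * 2 for v in x]


def fallback_op(x):
    # A simpler, export-friendly implementation with the same result.
    return [v + v for v in x]


def op(x):
    # Use the simple fallback only while exporting to ONNX;
    # otherwise take the fast path.
    if is_in_onnx_export():
        return fallback_op(x)
    return fast_op(x)
```

Both branches must compute the same result; the fallback just trades speed for traceability during export.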
I'm currently limited in time and cannot follow the full argument. But if the existing API of `rf.dot_attention` does not really fit `scaled_dot_product_attention`, I don't think this is a problem:...
> The way causal self-attention is currently implemented is somewhat problematic, as the spatial dimension of the key and value matrices depends on the spatial dimension of the query matrix...
> Another issue is that some tests, and some code in i6_experiments for analyzing attention weights, rely on the pre-softmax energies variable being present, as those code snippets use the...
> Before this is executed, `axis` is the (shared) spatial dimension of both query and key/value.
> But for the key and value matrices, this spatial dimension is then replaced...
> But what we actually want to do is use the `is_causal` parameter of the torch `scaled_dot_product_attention`, which reduces the computation time by about 2x, because all those unnecessary computations that...
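What `is_causal=True` corresponds to can be shown with a small pure-Python sketch of causal attention weights: query position `i` may only attend to key positions `j <= i`. This only models the masking semantics; the actual torch kernel additionally skips computing the masked half entirely, which is where the roughly 2x saving comes from.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with a causal mask.

    scores[i][j] is the raw energy of query position i attending to
    key position j; positions j > i (the "future") are masked out,
    as with scaled_dot_product_attention(..., is_causal=True).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Exponentiate only the allowed (past and current) positions.
        exps = [math.exp(scores[i][j]) if j <= i else 0.0 for j in range(n)]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


w = causal_attention_weights([
    [0.0, 1.0, 2.0],
    [0.5, 0.5, 3.0],
    [1.0, 2.0, 3.0],
])
```

Since the mask zeroes out the upper triangle, roughly half of the score matrix never contributes, which is exactly the work a fused causal kernel can avoid doing in the first place.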
> The key matrix will have dimensions `[batch, ..., hist_dim, embed_dim]`. The `hist_dim` dimension has a size tensor with dimensions `[axis]`. But `axis` is not a dimension of the key...