Albert Zeyer

Results: 963 comments by Albert Zeyer

Note, for the TF layers backend, we had some partial support for this, but it was also quite problematic. It was only intended for the batch dim, i.e. a batch...

Note, FlashAttention has `flash_attn_varlen_qkvpacked_func` ([API](https://github.com/Dao-AILab/flash-attention/blob/641db759ab7168e472909bc9ff1eda4a329de34f/flash_attn/flash_attn_interface.py#L1178C5-L1178C37)), with args:
```
qkv: (total, 3, nheads, headdim), where total = total number of tokens in the batch.
cu_seqlens: (batch_size + 1,), dtype torch.int32. The...
```
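To illustrate how such a packed varlen call would look (a minimal sketch, not from the original comment; the sequence lengths, head dims and `causal=True` here are just made-up example values):
```python
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

nheads, headdim = 8, 64
seq_lens = [5, 3, 7]  # three sequences of different lengths, packed into one "batch"
total = sum(seq_lens)  # total number of tokens across the batch

# Packed QKV: (total, 3, nheads, headdim), no padding between sequences.
qkv = torch.randn(total, 3, nheads, headdim, device="cuda", dtype=torch.float16)

# Cumulative sequence lengths: (batch_size + 1,), int32, starting at 0.
cu_seqlens = torch.tensor([0, 5, 8, 15], device="cuda", dtype=torch.int32)
max_seqlen = max(seq_lens)

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, max_seqlen, causal=True)
# out: (total, nheads, headdim), still packed; split via cu_seqlens to get per-sequence outputs.
```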

Also note, [FlexAttention](https://pytorch.org/blog/flexattention/) also seems to support this use case (check for "Document Masking"), and in a way that doesn't need recompilation when the seq lengths change (as they would...
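To illustrate the Document Masking pattern mentioned above (a hedged sketch following the FlexAttention blog post; the packed lengths and tensor shapes are invented for illustration, and the API is `torch.nn.attention.flex_attention` from recent PyTorch):
```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# All sequences packed into one long sequence; document_id marks which packed
# position belongs to which original sequence.
seq_lens = [512, 256, 256]
total = sum(seq_lens)
document_id = torch.repeat_interleave(
    torch.arange(len(seq_lens), device="cuda"),
    torch.tensor(seq_lens, device="cuda"),
)

def document_mask(b, h, q_idx, kv_idx):
    # Only allow attention within the same document/sequence.
    return document_id[q_idx] == document_id[kv_idx]

block_mask = create_block_mask(document_mask, B=None, H=None, Q_LEN=total, KV_LEN=total)

B, H, HEAD_DIM = 1, 8, 64
q = torch.randn(B, H, total, HEAD_DIM, device="cuda")
k = torch.randn(B, H, total, HEAD_DIM, device="cuda")
v = torch.randn(B, H, total, HEAD_DIM, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)  # (B, H, total, HEAD_DIM)
# When the seq lengths change, only document_id / block_mask need to be rebuilt;
# per the blog post, the attention kernel itself does not need to be recompiled.
```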

I did a quick search for this error, but it's not really clear. E.g.: https://github.com/ipython/ipython/issues/14643 But this seems fixed/outdated, or only relevant for Python 3.9 and older IPython 8? Maybe...

Note, Gemini was helpful for a suggestion on a workaround:
```shell
pip install nest_asyncio
```
And then in IPython:
```python
import nest_asyncio
nest_asyncio.apply()

import sisyphus  # Or import i6_experiments
```
...

> So do we still need to take action here? I think it should be possible to import `sisyphus` within IPython (and maybe other places which would have similar conditions...

Btw, I think I saw some similar problems before, where the native RF helpers would always behave like `allow_broadcast_all_sources=True`, but the pure Python logic does not. I thought I filed...

Note, the reason g++ did not work here was that Python was not found. One solution was `module load Python/3.12.3`, or probably also putting the right Python into the `$PATH`. But the...

One small problem might be the `rng` arg to `map_seq`, and how this should behave. I don't see a good way that this would be consistent with how it behaved...

Just to clarify: Which is the inner and which is the outer dataset? (Just edit your post.)