Abdul Fatir

119 comments by Abdul Fatir

@normster @tridao @albertfgu I believe this feature would be very nice to have in a stable release. Can we work towards merging this into main and having it in the...

No, it does not, so this only tests the padding aspect.

@matteoguarrera Unfortunately, despite trying for several weeks, I wasn't able to reproduce a number anywhere close to what's reported in the paper. Finally, for these datasets, I just copied the...

In the KL expression you have `-1` and you take a sum over the latent dimension. This would just sum to `-z_dim`, which is what is written.
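For reference, a generic illustration of where such a constant comes from, assuming the usual diagonal-Gaussian VAE KL (the exact expression from the thread is not shown in this snippet):

```latex
% Standard KL between a diagonal-Gaussian posterior and a standard-normal prior
% (assumed form; not necessarily the exact expression being discussed).
D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,\mathcal{N}(0, I)\big)
  = \frac{1}{2}\sum_{i=1}^{z_{\mathrm{dim}}}\left(\mu_i^2 + \sigma_i^2 - \log\sigma_i^2 - 1\right)
```

Summing the `-1` term over the `z_dim` latent dimensions contributes `-z_dim` to the sum (before any outer constant factor), which is the constant being pointed out.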

@ArthurZucker Thank you for this amazing addition. Are there any plans to add something equivalent to `attention_mask` for Mamba?

- For batched inference with inputs of different lengths.
- For pretraining with masking schemes other than a causal mask.
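For context, this is roughly what the requested behavior looks like for attention-based models in `transformers` (the model name and prompts below are purely illustrative): shorter sequences are padded and `attention_mask` tells the model which positions are real tokens.

```python
# Minimal sketch (illustrative, not from the thread): batched generation with
# padded inputs for an attention-based causal LM. The attention_mask produced
# by the tokenizer is the piece that has no direct equivalent for Mamba here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
tokenizer.padding_side = "left"             # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["The capital of France is", "Hello"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")

with torch.no_grad():
    # `inputs` contains both input_ids and attention_mask.
    out = model.generate(
        **inputs,
        max_new_tokens=10,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```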

@ArthurZucker for the T5 family of models, attention bias is required, so flash-attention won't work for now, but torch SDPA can still use the memory-efficient kernel from xformers, right?...
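A rough sketch of the mechanism being referred to (shapes and the bias tensor below are made up for illustration): PyTorch's `scaled_dot_product_attention` accepts an additive `attn_mask`, which is how a T5-style relative position bias could be folded into a fused attention kernel.

```python
# Illustrative sketch only: passing an additive attention bias (as T5 uses for
# relative positions) to torch SDPA via attn_mask. Shapes are arbitrary.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Additive bias broadcast over the batch, analogous to T5's relative position bias.
position_bias = torch.randn(1, heads, seq_len, seq_len)

# SDPA adds attn_mask to the attention scores before the softmax; which backend
# (flash, memory-efficient, or math) is used depends on the PyTorch version and inputs.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=position_bias)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```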

I can open a PR for T5 with SDPA then. Are there specific things that I should know of, or a reference that I can look at?

@sayakpaul sorry, I was on vacation. Will look into this now and maybe open a PR in a couple of days. I didn't know that there were diffusion models using...

@fxmarty @ArthurZucker @sayakpaul I have opened a PR #30375 for T5. I still have a couple of questions due to some tests failing. Let's discuss those on the PR.