robotcator
Actually, I mean you need to check that your torch build is the CUDA version, not the CPU version.
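For reference, a quick check along these lines shows which build is installed (exact version strings will differ per install):

```python
import torch

# A "+cpu" suffix in the version string means the CPU-only wheel is installed.
print(torch.__version__)          # e.g. "1.12.1+cu113" for a CUDA build, "1.12.1+cpu" for CPU-only
print(torch.version.cuda)         # CUDA toolkit version the wheel was built against, or None
print(torch.cuda.is_available())  # True only with a CUDA build and a visible GPU
```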
Which Linux distribution are you using? Can you provide more details? Alternatively, you can install from source with pip: `pip install git+https://github.com/dptech-corp/Uni-Core.git`
Currently, we have implemented the following cases for the attention bias/mask:
```
Support the shape of q/k/v as follows:
q's shape [total_size * head, seq_q, head_dim]
k's shape [total_size * head, seq_k, ...
```
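As a rough sketch of that layout (the bias shape, dtype, and example sizes below are assumptions on my part, since the excerpt above is cut off):

```python
import torch

total_size, head, seq_q, seq_k, head_dim = 2, 8, 128, 128, 64

# q/k/v are laid out with the batch and head dimensions folded together.
q = torch.randn(total_size * head, seq_q, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(total_size * head, seq_k, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(total_size * head, seq_k, head_dim, dtype=torch.float16, device="cuda")

# An additive bias/mask over the (seq_q, seq_k) attention logits; shape assumed here.
attn_bias = torch.randn(total_size * head, seq_q, seq_k, dtype=torch.float16, device="cuda")
```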
> Thanks so much for the great work, and congrats on the speedup on Uni-Fold!
>
> I'll have more time this weekend to review carefully.

Great, any suggestions are...
> Not worked if mask or bias have odd sequence length. `CUDA error (/tmp/pip-req-build-k5fpgkes/csrc/flash_attn/src/fmha_fprop_fp16_kernel.sm80.cu:140): misaligned address`

Thank you for your advice. Currently, `Adding the odd length of mask/bias in the...
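As an interim workaround (my own assumption, not necessarily the fix referenced above), one could pad the mask/bias, together with the corresponding k/v tensors, along the key dimension up to an even length:

```python
import torch
import torch.nn.functional as F

def pad_mask_to_even(mask: torch.Tensor, pad_value: float = float("-inf")) -> torch.Tensor:
    """Hypothetical helper: pad the last (key) dimension of a mask/bias to an even length.

    k/v must be padded by the same amount so shapes stay consistent;
    -inf in the padded column keeps it out of the softmax.
    """
    if mask.shape[-1] % 2 == 0:
        return mask
    return F.pad(mask, (0, 1), value=pad_value)
```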
> Another way to phrase this question: is the mask for each sequence always of the form [0, 0, ..., 0, -inf, -inf ...]? Or could they have the form...
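For concreteness, the two mask forms being contrasted look roughly like this (values purely illustrative):

```python
import torch

# Key-padding style: zeros followed by a contiguous -inf tail.
padding_mask = torch.tensor([0., 0., 0., 0., float("-inf"), float("-inf")])

# Arbitrary additive mask: -inf positions may appear anywhere along the sequence.
arbitrary_mask = torch.zeros(6)
arbitrary_mask[[1, 4]] = float("-inf")
```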
> @robotcator I encounter gradient overflow when attn_mask is not None or attn_bias is not None. Could you give me some advice?

Do you mean overflow or nan? And can...
> > > @robotcator I encounter gradient overflow when attn_mask is not None or attn_bias is not None. Could you give me some advice?
> > >
> > > ...
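To tell overflow from nan concretely, a generic check like the following can be run right after `loss.backward()` (a sketch, not tied to any particular model in this thread):

```python
import torch

def report_bad_grads(model: torch.nn.Module) -> None:
    """Print every parameter whose gradient contains inf or nan."""
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if torch.isinf(p.grad).any():
            print(f"inf gradient in {name}")
        if torch.isnan(p.grad).any():
            print(f"nan gradient in {name}")
```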
> Hi, thanks everyone for bringing up this enhancement! Is this PR a way to support custom attention masks? Is this the best workaround so far, given it is not...
> @robotcator I have a question about `attn_bias`: if my `attn_bias` is trainable, does flash attn compute the grad of `attn_bias` automatically?

I don't know whether it's too late...
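One way to check this empirically (the `flash_attention` name and its `attn_bias` keyword below are placeholders, not the confirmed API):

```python
import torch

def check_bias_grad(flash_attention, q, k, v):
    # Trainable additive bias over the (seq_q, seq_k) logits; shape assumed.
    attn_bias = torch.zeros(q.shape[0], q.shape[1], k.shape[1],
                            dtype=q.dtype, device=q.device, requires_grad=True)
    out = flash_attention(q, k, v, attn_bias=attn_bias)
    out.sum().backward()
    # A real tensor here means the kernel implements the bias backward pass;
    # None means no gradient flowed to the bias.
    print(attn_bias.grad)
```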