Void Main
@narendasan I totally agree that we should produce a NoneTensor in the graph. Could you point out how to create a NoneTensor? I'll update my code.
Got it, I'll try to make these changes later this week.
I've changed the compiler back to clang, and here's what I've got:
```
python setup.py build
running build
running build_py
running build_ext
building 'audiotools.cdio' extension
clang -fno-strict-aliasing -fno-common -dynamic -g...
```
Thanks tuffy, I tried `$ export C_INCLUDE_PATH=/usr/local/include && make install`, but I still got the same error. Something must be wrong with my cdio installation.
> bfloat16 calculation is supported in most models in the latest release. Because we don't have a good way to save the bfloat16 weights now, you still need to store the...
> Because we don't have a good way to save the bfloat16 weights now

Hi @byshiue, another quick question: since there are numpy extensions for bfloat16 (such as [bfloat16](https://github.com/GreenWaves-Technologies/bfloat16))...
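In case it helps, here's a minimal sketch of a workaround that needs no numpy extension at all: bitcast the bfloat16 payload to int16 on the PyTorch side before handing it to numpy. The tensor and file names are just illustrative:

```
import numpy as np
import torch

# Illustrative stand-in for a real model weight.
w = torch.randn(4, 4).to(torch.bfloat16)

# numpy has no native bfloat16 dtype, so bitcast the 16-bit payload
# to int16 (same width) before converting to a numpy array.
np.save("w_bf16.npy", w.view(torch.int16).numpy())

# Load and bitcast back to bfloat16; the round trip is exact.
w_back = torch.from_numpy(np.load("w_bf16.npy")).view(torch.bfloat16)
assert torch.equal(w, w_back)
```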
I'm just running the exact code sample from the tutorial:
```
import pytest
import torch
import triton
import triton.language as tl


@triton.jit
def _fwd_kernel(
    Q, K, V, sm_scale, TMP, L,...
```
With Triton installed via `pip install -U --pre triton`, the exact version is `2.0.0.dev20221120`. @ptillet
> @void-main Are you synchronizing (`torch.cuda.synchronize`) when you measure the time? The measurement for FlashAttention CUDA barely changes when you increase the sequence length, which seems wrong...
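For reference, a minimal sketch of synchronized GPU timing along those lines, assuming a CUDA device; the shapes and the attention call are illustrative stand-ins for the actual benchmark:

```
import time
import torch
import torch.nn.functional as F

# Illustrative inputs: (batch, heads, seq_len, head_dim) on the GPU.
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k, v = q.clone(), q.clone()

# Warm up so one-time compilation/caching doesn't skew the numbers.
for _ in range(3):
    F.scaled_dot_product_attention(q, k, v)

torch.cuda.synchronize()  # drain pending kernels before starting the clock
start = time.perf_counter()
for _ in range(100):
    F.scaled_dot_product_attention(q, k, v)
torch.cuda.synchronize()  # CUDA launches are async; sync again before stopping
print(f"{(time.perf_counter() - start) / 100 * 1e3:.3f} ms per call")
```

Without the second synchronize, the timer stops while kernels are still in flight, which is exactly how a benchmark can appear flat as the sequence length grows.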
@wpeebles @s9xie Could you please take a look? Thank you very much!