Void Main
@narendasan I totally agree that we should produce a NoneTensor in the graph. Could you point out how to create a NoneTensor? I'll update my code.
Got it, I'll try to make these changes later this week.
I've changed the compiler back to clang, and here's what I've got:
```
python setup.py build
running build
running build_py
running build_ext
building 'audiotools.cdio' extension
clang -fno-strict-aliasing -fno-common -dynamic -g...
```
Thanks tuffy, I tried `$ export C_INCLUDE_PATH=/usr/local/include && make install`, but I still got the same error. Something must be wrong with my cdio installation.
> bfloat16 calculation is supported in most models in the latest release. Because we don't have a good way to save the bfloat16 weights now, you still need to store the...
> Because we don't have a good way to save the bfloat16 weights now

Hi @byshiue, another quick question: since there are numpy extensions for bfloat16 (such as [bfloat16](https://github.com/GreenWaves-Technologies/bfloat16))...
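In case it helps, here's a minimal sketch of a workaround that needs no numpy extension at all: bitcast the bfloat16 payload to int16 on the PyTorch side before handing it to numpy. The tensor and file names are just illustrative:

```
import numpy as np
import torch

# Illustrative stand-in for a real model weight.
w = torch.randn(4, 4).to(torch.bfloat16)

# numpy has no native bfloat16 dtype, so bitcast the 16-bit payload
# to int16 (same width) before converting to a numpy array.
np.save("w_bf16.npy", w.view(torch.int16).numpy())

# Load and bitcast back to bfloat16; the round trip is exact.
w_back = torch.from_numpy(np.load("w_bf16.npy")).view(torch.bfloat16)
assert torch.equal(w, w_back)
```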
I'm just running the exact code sample from the tutorial:
```
import pytest
import torch
import triton
import triton.language as tl


@triton.jit
def _fwd_kernel(
    Q, K, V, sm_scale, TMP, L,...
```
With Triton installed via `pip install -U --pre triton`, the exact version is `2.0.0.dev20221120`. @ptillet
> @void-main Are you synchronizing (`torch.cuda.synchronize`) when you measure the time? The measurement for FlashAttention CUDA barely changes when you increase the sequence length, which seems wrong...
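For reference, a minimal sketch of synchronized GPU timing along those lines, assuming a CUDA device; the shapes and the attention call are illustrative stand-ins for the actual benchmark:

```
import time
import torch
import torch.nn.functional as F

# Illustrative inputs: (batch, heads, seq_len, head_dim) on the GPU.
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k, v = q.clone(), q.clone()

# Warm up so one-time compilation/caching doesn't skew the numbers.
for _ in range(3):
    F.scaled_dot_product_attention(q, k, v)

torch.cuda.synchronize()  # drain pending kernels before starting the clock
start = time.perf_counter()
for _ in range(100):
    F.scaled_dot_product_attention(q, k, v)
torch.cuda.synchronize()  # CUDA launches are async; sync again before stopping
print(f"{(time.perf_counter() - start) / 100 * 1e3:.3f} ms per call")
```

Without the second synchronize, the timer stops while kernels are still in flight, which is exactly how a benchmark can appear flat as the sequence length grows.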
@wpeebles @s9xie Could you please take a look? Thank you very much!