Ying Zhang

Results 35 comments of Ying Zhang

For 1: the output tensor is not a constant, so it's not necessary to call SetConstant. For 2: Yes, you'll need to convert the data from fp32 to fp16 on your end.
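A minimal sketch of the fp32-to-fp16 conversion mentioned above, assuming the data is available as a raw little-endian fp32 byte buffer. The helper name `fp32_bytes_to_fp16_bytes` is hypothetical and not part of AITemplate. If the data already lives in a torch tensor, calling `.half()` on it is the more usual route.

```python
import struct


def fp32_bytes_to_fp16_bytes(buf: bytes) -> bytes:
    """Re-encode a little-endian fp32 buffer as little-endian fp16.

    Hypothetical helper for illustration only; not an AITemplate API.
    """
    if len(buf) % 4 != 0:
        raise ValueError("fp32 buffer length must be a multiple of 4 bytes")
    count = len(buf) // 4
    # "f" is IEEE 754 binary32, "e" is IEEE 754 binary16 (half precision).
    values = struct.unpack(f"<{count}f", buf)
    return struct.pack(f"<{count}e", *values)


if __name__ == "__main__":
    fp32 = struct.pack("<3f", 1.0, 0.5, -2.0)
    fp16 = fp32_bytes_to_fp16_bytes(fp32)
    print(len(fp32), len(fp16))
```

The fp16 buffer is half the size of the fp32 one, and values that are not exactly representable in half precision are rounded, so some loss of precision is expected.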

Hi @kisisjrlly, what's your GPU arch? You'll need SM80 (or at least SM70) to run AITemplate. Thanks.
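A rough sketch of how the arch requirement in this comment could be checked, assuming the compute capability is already known as a "major.minor" string (with PyTorch installed, `torch.cuda.get_device_capability()` returns the same information as a tuple). The function names and the labels "recommended", "minimum" and "unsupported" are illustrative assumptions. The thresholds come only from the comment above, not from AITemplate's documentation.

```python
from typing import Tuple


def parse_compute_capability(text: str) -> Tuple[int, int]:
    """Parse a 'major.minor' string such as '8.0' into (8, 0)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def classify_arch(capability: Tuple[int, int]) -> str:
    """Classify a compute capability against the SM80 / SM70 thresholds.

    Hypothetical helper; the labels are made up for this example.
    """
    sm = capability[0] * 10 + capability[1]
    if sm >= 80:
        return "recommended"  # SM80 or newer
    if sm >= 70:
        return "minimum"      # SM70 up to, but not including, SM80
    return "unsupported"      # older than SM70


if __name__ == "__main__":
    for cap in ("8.0", "7.5", "6.1"):
        print(cap, classify_arch(parse_compute_capability(cap)))
```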

Sorry for the late reply. For the bmm_rrr_permute op, the last dimension of each input tensor needs to be divisible by 2. We'll update the error message to make it more accurate....
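Until the error message is improved, the constraint described above can be checked before building the graph. This is a minimal sketch that works on plain shape tuples. `check_last_dim_even` is a hypothetical helper, not an AITemplate function, and it encodes only the divisible-by-2 rule stated in this comment.

```python
from typing import Sequence


def check_last_dim_even(name: str, shape: Sequence[int]) -> None:
    """Raise ValueError if the last dimension of `shape` is not divisible by 2.

    Hypothetical pre-check for illustration; not part of AITemplate.
    """
    if len(shape) == 0:
        raise ValueError(f"{name}: shape is empty")
    last = shape[-1]
    if last % 2 != 0:
        raise ValueError(
            f"{name}: last dimension {last} is not divisible by 2 "
            f"(shape={tuple(shape)})"
        )


if __name__ == "__main__":
    check_last_dim_even("a", (4, 16, 64))   # passes silently
    try:
        check_last_dim_even("b", (4, 64, 33))
    except ValueError as err:
        print(err)
```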

Hi @asynclee, we don't have opt-66b in our examples, but feel free to try fx2ait or write your own AIT frontend for the opt-66b model.

Hi @mindbeast, if you have torch input / output tensors, you could use AITModel directly: https://github.com/facebookincubator/AITemplate/blob/main/fx2ait/fx2ait/csrc/AITModel.h. It takes the generated .so file path and other metadata to initialize an...

cc fx2ait owners @wushirong @frank-wei. FYI, we have plans to use aten2ait to support the StableDiffusion model by the end of this half, but you're very welcome to contribute the...

This is weird; more debugging is needed. You could try dumping some GPU traces with Nsight Systems.

Hi @isouf, the supported SD version is a bit old. Please refer to https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion to check the supported version.

Some AIT kernels (e.g. mem-efficient-attention kernel) may lack SM75 specifications. SM80+ is needed. (Also check https://github.com/facebookincubator/AITemplate#installation).