AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Results 169 AITemplate issues

Summary: Should check whether the key is present in the dict, not whether the dict is empty. Reviewed By: muchulee8 Differential Revision: D45759517

CLA Signed
fb-exported
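The distinction in the summary above (key presence versus dict emptiness) can be illustrated with a minimal sketch. The function names `lookup_buggy` and `lookup_fixed` and the example keys are hypothetical and are not taken from the AITemplate diff; the snippet only shows the general pattern.

```python
# Hypothetical helpers, not AITemplate code: they only illustrate the
# difference between "dict is non-empty" and "key is present".

def lookup_buggy(cache, key, default=None):
    # Only checks that the dict has *some* entry, then indexes it.
    # Raises KeyError when the dict is non-empty but lacks `key`.
    if cache:
        return cache[key]
    return default


def lookup_fixed(cache, key, default=None):
    # Checks for the specific key, so a non-empty dict without `key`
    # falls through to the default instead of raising.
    if key in cache:
        return cache[key]
    return default
```

With a dict such as `{"a": 1}`, looking up `"b"` raises `KeyError` in the first version and returns the default in the second. `cache.get(key, default)` would be an equivalent shorthand for the fixed version.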

Summary: As titled. The removal details are in D45632164. Reviewed By: jiaqizhai Differential Revision: D45644413 Privacy Context Container: L1138451

CLA Signed
fb-exported

Hi there, I tried to benchmark the performance of `nn.Linear` in AI Template on MI250 GPU and compared with rocBLAS. I expected AI Template should achieve a much higher throughput,...

Summary: Now we can set LowerPrecision=BF16 during ads publish pipeline. However, this setting won't change the packaged sample_input's dtype, thus AIT lower pipeline would hit this error during lowering: ```...

CLA Signed
fb-exported

Summary: 1. When bf16 is enabled, `torch.ops.fbgemm.generic_histogram_binning_calibration_by_feature` in submod1 does not accept bf16, so its input needs to be cast to fp32. 2. `nan_to_num` can handle bf16 now. Differential Revision: D45421503

CLA Signed
fb-exported

ake: Leaving directory '/data/bml/tool/AITemplate/examples/05_stable_diffusion/tmp/profiler' 2023-04-24 12:55:25,551 INFO make stderr: /usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’: 435 | function(_Functor&& __f) | ^ /usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’ /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs...

Summary: Currently, `fuse_split_linear_add` only supports cases when the split op kwarg uses a `slice`. This diff extends the fusion to support cases when the split op kwarg uses `int`s. The...

CLA Signed
fb-exported

Summary: If the upper bound of the `total_length` dimension is set to a larger value than B * N (N being the logical max. sequence length), this would not change...

CLA Signed
fb-exported

I realize that you probably require the make tool (https://github.com/facebookincubator/AITemplate/issues/83#issuecomment-1312794318), which is only properly available in WSL, but on AMD platforms we do not support WSL with ROCm on Windows,...

Differential Revision: D45186695

CLA Signed
fb-exported