AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Results 166 AITemplate issues

We keep a wishlist of examples that **may** appear in v0.2 or a later release. Any contributions are welcome. - [ ] Distributed Inference: OPT-175B - [ ] Wav2Vec - [ ]...

wishlist

Hi! There's a neat hack to create seamless images in stable diffusion - by replacing padding_mode in all conv2d to "circular", for vae and for unet: https://github.com/sd-webui/stable-diffusion-webui/discussions/224#discussioncomment-3602679 Is it possible...

I have a question about the computation of the latency in the cpp benchmark function. Below, `accumulate` applies a `max` over each thread's output: https://github.com/facebookincubator/AITemplate/blob/44026ba7e7f5376a80cf0f2b333a0f25c0eeda6c/static/csrc/model_container.cpp#L224 ```cpp auto max_time = std::accumulate(...

Thanks for your project! When we want to deploy our model in a C++ project, is there a C++ API provided for this? We can't find any C++ API to...

### summary After loading the model with from_pretrained, memory usage increases as the pipeline is used repeatedly. Each execution consumes 50 MB to 100 MB of memory. Eventually, the process...

Hi! Has anyone worked on, or planned to work on, integrating diffusion models from [CompVis/stable-diffusion/ldm](https://github.com/CompVis/stable-diffusion/tree/main/ldm/models/diffusion)? It would be of great help, as in my experience that implementation works quite well on the characters category....

This change will upgrade the AMD ROCm compiler from a custom build of the amd-stg-open branch to the official ROCm 5.3 release. Also, removed some Python packages that are not needed...

CLA Signed
module: rocm

## description When we run ```python benchmark_ait.py --batch-size=1``` for the first time, it raises an exception with the message 'OSError: ./tmp/resnet50_1/test.so: cannot open shared object file: No such file or directory'. This...

The `README.md` says `NVIDIA: AIT is only tested on SM80+ GPUs (Ampere etc). Not all kernels work with old SM75/SM70 (T4/V100) GPUs.` Which I interpreted as it may work but...

See title, would be nice to pip install the package