Terry Chen
We tested 512x768 for the img2img model, and it works well. For high-res inputs like 1024x1024, we suggest building the UNet with batch size 1 and updating the pipeline accordingly.
For the UNet you can split the batch of size 2 and run inference twice with batch size 1; that should resolve the OOM issue. BTW, the xFormers-based attention will come soon.
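The batch-splitting workaround above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `unet`, `latents`, `t`, and `cond` are placeholder names for whatever the real pipeline passes to its UNet.

```python
import torch

def unet_infer_split(unet, latents, t, cond):
    """Run a batch-2 UNet input as two batch-1 forward passes.

    Peak memory is roughly halved because only one sample's activations
    are live at a time. `unet` is any callable taking
    (latents, timestep, conditioning) and returning a tensor with the
    same batch dimension as its input.
    """
    outs = []
    for i in range(latents.shape[0]):
        # Slice with i:i+1 so the batch dimension is kept (shape [1, ...]).
        outs.append(unet(latents[i : i + 1], t, cond[i : i + 1]))
    # Stitch the per-sample outputs back into one batch.
    return torch.cat(outs, dim=0)
```

In a classifier-free-guidance pipeline the batch of 2 is typically the unconditional/conditional pair, so the two batch-1 calls replace the single batched call at each denoising step.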
You can directly call these samplers from PyTorch: https://github.com/facebookincubator/AITemplate/blob/main/examples/05_stable_diffusion/pipeline_stable_diffusion_ait.py#L74
@longcw Thank you for sharing the code. I have tested the converted Darknet model, which got ~72 mAP. Then I trained on the VOC07 trainval set for 160 epochs (totally using your GitHub...
We don't have plans to support T4/A10 GPUs, but we will support H100.
What kind of system do you use, Windows or Linux? This FCN branch only supports Windows.
Please do not use "tools/train_net.cpp" and "tools/test_net.cpp", because "caffe.cpp" contains the train and test functions. Also, "fcn.cpp" should work for this FCN branch. I didn't check "convert_mnist_data.cpp" for this branch,...
How did you set cu_length for seqlens [512, 1024, 4k]?
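Assuming cu_length refers to the cumulative sequence-length offsets (often called cu_seqlens in FlashAttention-style variable-length kernels), a minimal sketch of how such an array is typically built:

```python
import torch

def make_cu_seqlens(seqlens):
    """Build cumulative sequence-length offsets for packed varlen attention.

    Follows the common FlashAttention-style convention: an int32 prefix sum
    starting at 0, so entry i gives the start offset of sequence i in the
    packed token buffer and the last entry is the total token count.
    """
    cu = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
    cu[1:] = torch.cumsum(torch.tensor(seqlens, dtype=torch.int32), dim=0)
    return cu

# For the seqlens asked about (512, 1024, 4096):
# make_cu_seqlens([512, 1024, 4096]) -> tensor([0, 512, 1536, 5632], dtype=torch.int32)
```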
We are still working on it; it will be ready in a future release. If you don't want to wait, you can check out the CUTLASS int8 examples https://github.com/NVIDIA/cutlass/blob/7c04f954151f606e60608061e891785fba229ae2/test/unit/gemm/device/gemm_s8t_s8n_f16t_tensor_op_s32_sm80.cu and add it...
Are you sure PT used padding during inference? I think for sizes 512/768, padding is not needed.