tungdq212 issues

Results: 5 issues of tungdq212

Bottleneck while training with multiple GPUs

While training with multiple GPUs, one GPU sometimes runs at 100% utilization while the others wait at 0%. ![image](https://user-images.githubusercontent.com/104067740/230300254-7eb2990e-a7b4-4383-92cf-8943eb08fc52.png) Here is my training command: ``` #!/bin/bash CUDA_VISIBLE_DEVICES='0,1,2' \ torchrun...

Thank you for the excellent work. How about TRT batch inference?

Thank you for the excellent work. > Detection models now can be exported to TRT engine with batch size > 1 - **inference code doesn't support it yet**, though now they...

Processes get stuck when pressing Generate multiple times

Hi, I built a basic Streamlit app like yours, where clicking a GENERATE button starts generating images. But when I click this button multiple times, a number of processes will...
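One common way to keep repeated clicks from piling up work is to reject a new request while a previous one is still running. The sketch below is a minimal, framework-independent illustration of that idea, assuming generation runs in threads inside a single Python process. The names `run_generate` and `_generate_lock` are hypothetical and not part of Streamlit or of the project discussed in this issue.

```python
import threading

# Hypothetical guard: allows at most one generation job at a time.
_generate_lock = threading.Lock()


def run_generate(generate_fn, *args, **kwargs):
    """Run generate_fn unless a previous call is still in progress.

    Returns (started, result); started is False when the call was skipped.
    """
    # Non-blocking acquire: a repeated click returns immediately
    # instead of queuing another job behind the running one.
    if not _generate_lock.acquire(blocking=False):
        return False, None
    try:
        return True, generate_fn(*args, **kwargs)
    finally:
        # Always release, even if generate_fn raises.
        _generate_lock.release()
```

In a Streamlit app the lock would need to live somewhere that survives script reruns (for example behind a cached resource); that wiring is not shown here, and this guard does not help if each click spawns a separate OS process.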

Error when training with local precomputed features

## Bug when training locally with LocalDataset Here is my config (with some personal paths removed), run with mosaicml's diffusion: ``` algorithms: low_precision_groupnorm: attribute: unet precision: amp_fp16 low_precision_layernorm: attribute: unet precision:...

Grad becomes NaN

When training on my local machine (3090 24GB) with batch size 12, the grad values become NaN after a few steps. But I don't see this when training on a Google Cloud A100...
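When chasing a problem like this, it can help to pin down the first step at which the gradients stop being finite. The sketch below is a minimal, framework-independent illustration, assuming the gradients of each step can be flattened into plain Python floats. The function names are hypothetical, and the check only locates the failure; it does not explain why one GPU produces NaN and another does not.

```python
import math


def grads_are_finite(grads):
    """Return True if every gradient value is finite (no NaN or inf)."""
    return all(math.isfinite(g) for g in grads)


def first_bad_step(grad_history):
    """Return the index of the first step with a NaN/inf gradient, else None.

    grad_history is an iterable of per-step gradient lists (assumed layout).
    """
    for step, grads in enumerate(grad_history):
        if not grads_are_finite(grads):
            return step
    return None
```

Logging the loss and the input batch at the returned step is usually the next thing to inspect.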