Alex
@rthadur @ahmedsabie A similar thing happened with the panoptic segmentation model conversion as well. Any idea where the SparseToDense operation could be coming from? There is no sparse tensor in the...
Fixed the ChannelSelector multi-GPU idling during the training stage:
https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/commit/b88a486a6ffe45a7e00af6d394e5ed9999e7688d
https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/commit/69d3f72832752c37f3600f4a1b9ee5d91ca4eb3f
Screenshot of the profiling visualization (note the two purple ones):
[image: profiling visualization]
Percentage sunburst graph:
[image: percentage sunburst graph]
Both visualizations were drawn using [the official TensorRT Engine Explorer](https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer).
@githubofhuo Could the discrepancy come from precision casting? fp16 AIT output is expected to differ slightly from the output of the PyTorch fp32 model.
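To illustrate the precision point, here is a minimal NumPy sketch (not AIT, and not a claim about AIT's kernels) that runs the same matrix product once in fp32 and once with fp16 inputs, then reports the largest absolute difference. The matrix size and seed are arbitrary assumptions.

```python
import numpy as np

def fp16_vs_fp32_max_abs_diff(n: int = 128, seed: int = 0) -> float:
    """Max absolute difference between an fp32 matmul and the same matmul with fp16 inputs."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)
    ref = a @ b  # fp32 reference result
    # Cast inputs to fp16; NumPy returns the product as fp16.
    # Cast back to fp32 only so the two results can be subtracted.
    half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)
    return float(np.max(np.abs(ref - half)))

if __name__ == "__main__":
    diff = fp16_vs_fp32_max_abs_diff()
    # Nonzero but small, so fp16 vs fp32 outputs should be compared with a tolerance.
    print(f"max |fp32 - fp16| = {diff:.4f}")
```

Because the gap is nonzero even for a single matmul, intermediate activations in a deep model can drift a bit while the final image still looks nearly identical.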
> thx, output image now is almost identical to diffusers result, although intermediate results have little discrepancy.

Hi, I ran into a similar issue and couldn't figure out why. Could you...
Hi, any update on this? Is there any chance Triton is causing the issue?
Hi, can we assume that AIT is compatible with the nightly build of CUTLASS? It seems that AIT points to a specific fork of CUTLASS 2.10 in its third-party dependencies...
Is there any way other than setting the environment variable? I know that would work, but we want to distribute the workload across multiple processes/threads, each with its own GPU...
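For the multi-process case, one pattern is to avoid setting the variable globally and instead give each worker process its own copy of the environment, with `CUDA_VISIBLE_DEVICES` pointing at a single GPU before that process starts. This is only a sketch. The worker command is a hypothetical stand-in that just echoes the variable. It does not solve the per-thread case, because threads share their process's environment and CUDA reads the variable when it initializes in that process.

```python
import os
import subprocess
import sys

def launch_workers(gpu_ids):
    """Start one child process per GPU id; each child sees only its assigned GPU."""
    procs = []
    for gpu in gpu_ids:
        env = os.environ.copy()
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # set before the child initializes CUDA
        # Hypothetical worker: replace with the real inference job.
        cmd = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
        procs.append(subprocess.Popen(cmd, env=env, stdout=subprocess.PIPE, text=True))
    # Wait for every worker and collect its output in launch order.
    return [p.communicate()[0].strip() for p in procs]

if __name__ == "__main__":
    print(launch_workers([0, 1, 2]))  # prints ['0', '1', '2']
```

Inside each child, the assigned GPU shows up as device 0, so the worker code itself does not need to know which physical GPU it got.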
Hi, is there any follow-up on this? Besides `CUDA_VISIBLE_DEVICES`, is there any other way to specify which GPU to use? Thanks!
Same question here. Thanks a lot.