Sanjib comments

Results: 10 comments of Sanjib

Interesting stat - Batch Processing Behavior

Export: `yolo export model=".\best.pt" format="torchscript" imgsz=1600 dynamic=False device=0 batch=12` I will test TensorRT, but why does it behave so negatively when batching? Forward pass: ``` std::vector inputs{ tensor_imgs }; if...

Interesting stat - Batch Processing Behavior

Tested with TensorRT and got the following stats. I'm really curious why batching makes performance worse. GPU metrics show ~80% (avg) utilization with ~78% (avg) SM occupancy. | Batch Size | Forward Pass (ms/image) |...

Interesting stat - Batch Processing Behavior

The numbers are averages. Dataset size: 248 images; 10 iterations after 5 dummy warm-up passes.
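The measurement procedure described in that comment (a few untimed warm-up passes, then several timed passes over the whole dataset, reported as per-image averages) could be sketched roughly as below. This is only a minimal sketch, not the code used in the thread: `BatchStats`, `summarize`, `benchmark`, and the `run_batch` callback are hypothetical names introduced for illustration. The defaults of 5 warm-up passes and 10 iterations mirror the numbers quoted above. The callback is assumed to run one forward pass on `n` images and to block until the GPU work has finished; otherwise the timings would only capture kernel launch overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical result type for this sketch (not from the original thread).
struct BatchStats {
    std::size_t images;     // images processed during the timed region
    double total_ms;        // wall-clock time of the timed region
    double ms_per_image;    // average forward time per image
    double images_per_sec;  // throughput
};

// Pure arithmetic: convert a total duration into per-image figures.
BatchStats summarize(double total_ms, std::size_t images) {
    assert(images > 0);
    BatchStats s{};
    s.images = images;
    s.total_ms = total_ms;
    s.ms_per_image = total_ms / static_cast<double>(images);
    s.images_per_sec = 1000.0 * static_cast<double>(images) / total_ms;
    return s;
}

// run_batch(n) is assumed to execute one forward pass on n images and to
// return only after the device has finished (i.e. it synchronizes itself).
BatchStats benchmark(const std::function<void(std::size_t)>& run_batch,
                     std::size_t dataset_size, std::size_t batch_size,
                     int warmup_passes = 5, int iterations = 10) {
    assert(dataset_size > 0 && batch_size > 0 && iterations > 0);

    // Untimed warm-up passes, so one-time initialization cost is excluded.
    for (int i = 0; i < warmup_passes; ++i) run_batch(batch_size);

    using clock = std::chrono::steady_clock;
    std::size_t images = 0;
    const auto start = clock::now();
    for (int it = 0; it < iterations; ++it) {
        // Walk the dataset in batches; the last batch may be smaller.
        for (std::size_t off = 0; off < dataset_size; off += batch_size) {
            const std::size_t n = std::min(batch_size, dataset_size - off);
            run_batch(n);
            images += n;
        }
    }
    const double total_ms =
        std::chrono::duration<double, std::milli>(clock::now() - start).count();
    return summarize(total_ms, images);
}
```

Running `benchmark` once per batch size with the same callback yields directly comparable ms/image and img/s figures. Note that when the dataset size is not a multiple of the batch size, the final partial batch is included in the average.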

Interesting stat - Batch Processing Behavior

Following is the result of running TensorRT in half precision.

| Batch Size (Images) | Forward Pass (ms/image) | Total Time (ms) | Throughput (img/s) |
|------------|----------------------------|------------------|----------------------|
| 1 |...

Interesting stat - Batch Processing Behavior

Thanks for your suggestions. I tested with `dynamic = false` as well and got the same performance. I will test the other suggestions too, but I need some time. I’m...

Interesting stat - Batch Processing Behavior

I used Nsight Compute. The following are the stats:

Batch-1:
- DRAM bytes: 16.71 MB
- DRAM read: 12.69 MB
- DRAM write: 4.02 MB
- GPU time: 0.162 ms
- Bandwidth used: 16.71 MB...

Support cuda 12.4 in conda

Windows, CUDA 12.4, C++

Support cuda 12.4 in conda

I also built it with CUDA 12.4 on Windows 10, but I needed to patch a couple of files. The Windows support needs some cleanup.

Support cuda 12.4 in conda

@mnorris11 Done!

Patched to build on cuda12.4 on windows system

> Hey @Sanjib-ac, can you take a look at this again? Maybe update it or close it if you're no longer working on it. I’m not working on it anymore...