CMS inference with GPU
- [x] get the basic model running on GPU with ONNXRuntime: https://github.com/jpata/cmssw/commit/36be715fa00457c310acae3c033f4788bd47a26b
- [ ] run the model efficiently, with async execution and batching, e.g. through SONIC (see the sketches after this list):
  - update this code to CMSSW_14 (https://github.com/jpata/cmssw/pull/65) and integrate it with https://github.com/jpata/cmssw/tree/pfanalysis_caloparticle_CMSSW_14_1_0_pre3
  - the latest SONIC info is here: https://indico.cern.ch/event/1412058/contributions/5935100/attachments/2864635/5013510/SONIC%20ML%20Production.pdf
- [ ] do an apples-to-apples throughput comparison of PF vs MLPF on a fully loaded machine (CPU+GPU).
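
As a starting point for the SONIC integration, the sketch below shows what an async, batched request to a Triton server (the backend SONIC talks to) could look like from Python. The server URL, model name, tensor name, and event dimensions are placeholders, not the actual deployment values.

```python
import threading
import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder values: the actual server address, model name, tensor name and
# feature dimension depend on how the MLPF model is deployed behind Triton/SONIC.
URL, MODEL, N_ELEM, N_FEAT = "localhost:8001", "mlpf", 5000, 55

client = grpcclient.InferenceServerClient(url=URL)
done = threading.Event()

def callback(result, error):
    # Called from a client thread when the inference response arrives.
    if error is not None:
        print("inference failed:", error)
    done.set()

# One batched request: several padded events stacked along the batch dimension.
batch = np.random.rand(4, N_ELEM, N_FEAT).astype(np.float32)
inp = grpcclient.InferInput("input", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)

# async_infer returns immediately; the CPU can keep preparing the next event
# while the GPU works, which is the point of the async SONIC/Triton approach.
client.async_infer(model_name=MODEL, inputs=[inp], callback=callback)
done.wait()
```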
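For the MLPF side of such a comparison, one option is a standalone harness that keeps several CPU threads feeding inference requests to the GPU and reports events per second; the PF baseline would have to come from the corresponding CMSSW timing. This is only a sketch under assumed input dimensions, tensor layout, and thread count.

```python
import time
import numpy as np
import onnxruntime as ort
from concurrent.futures import ThreadPoolExecutor

# Model file from the measurement below; input dimensions are placeholders.
MODEL = "mlpf_21M_attn2x6x512_bs40_relu_tt_qcd_zh400k_checkpoint25_1xa100_fp32_fused.onnx"
N_EVENTS, N_ELEM, N_FEAT, N_THREADS = 100, 5000, 55, 8

sess = ort.InferenceSession(MODEL, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
events = [np.random.rand(1, N_ELEM, N_FEAT).astype(np.float32) for _ in range(N_EVENTS)]

def infer(x):
    # InferenceSession.run can be called concurrently from multiple threads.
    return sess.run(None, {input_name: x})

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    list(pool.map(infer, events))
dt = time.perf_counter() - t0
print(f"{N_EVENTS / dt:.1f} events/s with {N_THREADS} threads")
```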
The current timing on a machine with 1xA100 is as follows:
This uses the model mlpf_21M_attn2x6x512_bs40_relu_tt_qcd_zh400k_checkpoint25_1xa100_fp32_fused.onnx and 10 events in a single thread. Note that this does not load the GPU very efficiently: batch size 1, no async calls.
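
For reference, a minimal sketch of the kind of batch-size-1, single-threaded measurement described above, using ONNXRuntime with the CUDA execution provider; the event shape is a placeholder and would need to match the exported model.

```python
import time
import numpy as np
import onnxruntime as ort

MODEL = "mlpf_21M_attn2x6x512_bs40_relu_tt_qcd_zh400k_checkpoint25_1xa100_fp32_fused.onnx"
N_ELEM, N_FEAT = 5000, 55  # placeholder event size

sess = ort.InferenceSession(MODEL, providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name

# 10 events, batch size 1, one at a time in a single thread,
# mirroring the setup of the timing quoted above.
for i in range(10):
    x = np.random.rand(1, N_ELEM, N_FEAT).astype(np.float32)
    t0 = time.perf_counter()
    sess.run(None, {input_name: x})
    print(f"event {i}: {(time.perf_counter() - t0) * 1e3:.1f} ms")
```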