Joosep Pata

115 comments by Joosep Pata

In 7a301da I got the MultiheadAttention op spliced into the graph using the onnxscript and torch.onnx.export (TorchScript) approach.

In e40f5c3 the ONNX export now works with dynamic shapes and f32/fp16, using com.microsoft.MultiheadAttention (which can use Flash Attention on GPU), and the PyTorch and ONNX versions return the same values....

Actually ONLY the MultiHeadAttention op needs to run in fp16:

```python
@onnxscript.script(custom_opset)
def SDPA(
    query: TFloat,
    key: TFloat,
    value: TFloat,
) -> TFloat:
    # Unlike pytorch scaled_dot_product_attention,
    # the input...
```
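For reference, the math that an SDPA op computes, softmax(QK^T / sqrt(d)) V, can be sketched in plain Python. This is only an illustration of the op's semantics (all helper names here are hypothetical), not the onnxscript implementation above:

```python
import math

def matmul(a, b):
    # naive matrix multiply, adequate for small illustrative inputs
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(query, key, value):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = len(query[0])
    key_t = [list(col) for col in zip(*key)]            # K^T
    scores = matmul(query, key_t)                        # Q K^T
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]           # attention weights
    return matmul(weights, value)
```

Each output row is a convex combination of the value rows, which is why running only this op in fp16 while keeping the rest of the graph in f32 is numerically reasonable.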

Importing the new model into CMSSW is still to do. I need #323 merged and some results from it first, and then I'll do the CMSSW updates on top of https://github.com/jpata/cmssw/releases/tag/pfanalysis_caloparticle_CMSSW_14_1_0_pre3_acat2022.

The required changes on the CMSSW side to import the new ONNX model are here: https://github.com/jpata/cmssw/commit/3d5455b8fa310af1cb7aa5ee8d0426f9b4353f84 It runs and produces nonzero/non-garbage outputs. Submitted jobs on CPU, will see validations soon....

Here I managed to make the CMSSW ONNX GPU inference work, I think: https://github.com/jpata/cmssw/commit/36be715fa00457c310acae3c033f4788bd47a26b

```
CPU PF:
log_cpu_pf.txt:TimeModule> 35002 1 particleFlowTmp PFProducer 0.00893436
log_cpu_pf.txt:TimeModule> 35005 1 particleFlowTmp PFProducer 0.00696006
log_cpu_pf.txt:TimeModule>...
```
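Per-event timings like the `TimeModule>` lines above can be aggregated with a small parser. This is a sketch assuming exactly the line format shown (grep-prefixed CMSSW TimeModule lines with the real time in seconds as the last field); the function name is hypothetical:

```python
def mean_module_time(log_lines, module="particleFlowTmp"):
    """Average the per-event time (last field, seconds) of TimeModule>
    lines that mention the given module label."""
    times = []
    for line in log_lines:
        if "TimeModule>" not in line:
            continue
        fields = line.split()
        if module in fields:
            try:
                times.append(float(fields[-1]))
            except ValueError:
                continue  # skip malformed or truncated lines
    return sum(times) / len(times) if times else None
```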

All done, moved to CMSSW_14 and updated the C++ inference code.

A simpler example is prepared here: https://github.com/Graph-COM/HEPT/tree/main/example

`torch_cluster` comes in via `torch_geometric`. We could get rid of the `torch_geometric` dependency by removing the GravNet code and always just doing 3D padding of the model inputs: https://github.com/jpata/particleflow/blob/main/mlpf/pyg/PFDataset.py#L131. Technically we...

FYI in this PR I removed the torch_geometric dependency: https://github.com/jpata/particleflow/pull/310