ovunctuzel-bc

Results: 5 comments by ovunctuzel-bc

I was able to resolve a similar issue by setting some layers in the attention block to FP32 precision. Might help with this case as well. I was able to...

A fairly standard pytorch training loop seems to work fine. The results are satisfactory but not quite at the level of the pretrained model.
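A "fairly standard" loop along these lines is what is meant — a minimal sketch on toy regression data, with a hypothetical model and optimizer choice (not the author's actual script):

```python
import torch
from torch import nn

# Toy data: linear target with a small MLP (hypothetical setup).
torch.manual_seed(0)
x = torch.randn(256, 8)
y = x @ torch.randn(8, 1)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(50):
    opt.zero_grad()          # clear accumulated gradients
    loss = loss_fn(model(x), y)
    loss.backward()          # backprop
    opt.step()               # parameter update
    losses.append(loss.item())
```

Fine-tuning from a pretrained checkpoint (rather than training from scratch) is typically how one closes the remaining gap to the pretrained model's quality.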

I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of matrix multiplications. The max values are around 2e5...
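Values of that magnitude explain the failure under FP16: half precision tops out around 65504, so activations near 2e5 overflow to infinity. A quick check with NumPy:

```python
import numpy as np

# FP16 (IEEE 754 binary16) has a max finite value of ~65504.
# Activations around 2e5 therefore overflow to inf when cast down.
print(np.finfo(np.float16).max)   # 65504.0
print(np.float16(6e4))            # still finite
print(np.float16(2e5))            # -> inf (overflow)
```

This is why pinning the offending MatMul/Add/Div layers to FP32 fixes the engine while the rest of the network can stay in FP16.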

I was able to resolve the problem by setting the following layer precisions to FP32 using the Python TensorRT API:

```
/backbone/stages.2/op_list.1/context_module/main/MatMul
/backbone/stages.2/op_list.1/context_module/main/MatMul_1
/backbone/stages.2/op_list.1/context_module/main/Slice_5
/backbone/stages.2/op_list.1/context_module/main/Slice_4
/backbone/stages.2/op_list.1/context_module/main/Add
/backbone/stages.2/op_list.1/context_module/main/Div
```

(Repeat for...
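A sketch of how such per-layer constraints can be applied with the TensorRT Python API (TensorRT 8.x; the name patterns and helper function are assumptions matching the layers listed above, not the author's exact script). The layer-matching logic is plain Python; the `pin_precisions` helper expects a parsed `INetworkDefinition` and the `tensorrt` module:

```python
# Layer-name substrings to force to FP32 (from the list above;
# the pattern choice is an assumption for illustration).
FP32_PATTERNS = (
    "context_module/main/MatMul",
    "context_module/main/Slice",
    "context_module/main/Add",
    "context_module/main/Div",
)

def needs_fp32(layer_name: str) -> bool:
    """True if this layer should be constrained to FP32."""
    return any(p in layer_name for p in FP32_PATTERNS)

def pin_precisions(network, trt):
    """Walk the network and constrain matching layers to FP32.

    `network` is a tensorrt.INetworkDefinition, `trt` the tensorrt module.
    """
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if needs_fp32(layer.name):
            layer.precision = trt.float32
            layer.set_output_type(0, trt.float32)
```

When building with FP16 enabled, the builder config also needs `config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)` (or `PREFER_PRECISION_CONSTRAINTS`), otherwise TensorRT is free to ignore the per-layer settings.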

I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs...