ovunctuzel-bc
I was able to resolve a similar issue by setting some layers in the attention block to FP32 precision. Might help with this case as well.
A fairly standard PyTorch training loop seems to work fine. The results are satisfactory, but not quite at the level of the pretrained model.
I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of matrix multiplications. The max values are around 2e5...
I was able to resolve the problem by setting the following layer precisions to FP32 using the Python TensorRT API:
```
/backbone/stages.2/op_list.1/context_module/main/MatMul
/backbone/stages.2/op_list.1/context_module/main/MatMul_1
/backbone/stages.2/op_list.1/context_module/main/Slice_5
/backbone/stages.2/op_list.1/context_module/main/Slice_4
/backbone/stages.2/op_list.1/context_module/main/Add
/backbone/stages.2/op_list.1/context_module/main/Div
```
(Repeat for...
I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs...