ai-hub-models

[BUG] 3x Conv-3 Layers runs out of memory on Hub

Open mestrona-3 opened this issue 11 months ago • 1 comments

This bug is being filed based on a discussion with Manuel Kolmet in the AI Hub Models Slack community: https://qualcomm-ai-hub.slack.com/archives/C06LT6T3REY/p1709827335261829

Bug report: A fairly simple model (3x Conv-3 layers) runs out of memory when converted through the model hub, but works fine when converted directly with qnn-pytorch-converter. The model was created as below; the TorchScript is attached.

    model = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 1, kernel_size=3, padding=1),
    )

When running the model with qnn-net-run I get:

    qnn-net-run pid:22832
    WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
    WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
    Graph Finalize failure

Our own runner tool, which prints all debug output, shows:

2022-06-12 21:52:57.097 - E/QNN-RUNNER-CALLBACK: fa_alloc.cc:3747:ERROR:graph requires estimated allocation of 4176043 KB, limit is 2097152 KB

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: graph_prepare.cc:638:ERROR:error during serialize: memory usage too large

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: graph_prepare.cc:5512:ERROR:Serialize error: memory usage too large

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: <E> Weight Offset (0) + Weight data (0) sizes != total pickle size (712704) !!

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: <E> Error getting size and offsets of weights

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: <E> Failed to initialize graph memory

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: <E> Failed to finalize graph input_model with err: 6020

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: <E> Failed to finalize graph (id: 1) with err 6020

2022-06-12 21:52:57.491 - E/QNN-RUNNER: Graph Finalize failure

The job I used to convert the model is here: https://app.aihub.qualcomm.com/jobs/jz5763ng3/
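For context on the numbers in the log, here is a rough sanity check in plain Python. It counts the parameters of the three Conv2d layers from the report and compares the allocation the log says was requested with the limit it reports. It assumes float32 weights and 1 KB = 1024 bytes, neither of which is stated in the report. The input shape is not given either, so activation memory is not estimated here.

```python
# Conv2d layers from the report: (in_channels, out_channels, kernel_size)
LAYERS = [(1, 64, 3), (64, 64, 3), (64, 1, 3)]


def conv2d_params(in_ch, out_ch, k):
    """Weights plus biases of one nn.Conv2d with a square k x k kernel."""
    return in_ch * out_ch * k * k + out_ch


total_params = sum(conv2d_params(*layer) for layer in LAYERS)

# Assumption: weights are stored as float32 (4 bytes each).
weights_kb = total_params * 4 / 1024

# Figures copied from the fa_alloc.cc error line in the log.
requested_kb = 4176043
limit_kb = 2097152

print(f"parameters: {total_params}")
print(f"float32 weights: {weights_kb:.1f} KB")
print(f"limit: {limit_kb / 1024 / 1024:.0f} GiB")
print(f"requested / limit: {requested_kb / limit_kb:.2f}x")
```

Under these assumptions the model has 38145 parameters, about 149 KB of weights, while the graph requests close to twice the 2 GiB limit. The weights alone are therefore far too small to account for the estimated allocation.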

mestrona-3, Mar 11 '24 22:03

Manuel also mentioned that he's using QNN 2.19.

mestrona-3, Mar 11 '24 22:03

We have filed an internal issue for the relevant QNN team to investigate this. Closing the issue, as we'll share an update on Slack once it's been resolved.

mestrona-3, Jun 17 '24 16:06