Valgrind fails on googlenet, inception, shufflenet, and squeezenet
Setup: C++ client, CPU model, Valgrind 3.21.0 (latest). This affects all versions of googlenet, inception, shufflenet, and squeezenet. Example from squeezenet1.0-12:
valgrind --leak-check=yes --leak-check=full --show-leak-kinds=all --track-origins=yes /code/client/bin/modelzoo --iterations 1 --validate --msg-level INFO --file /model/squeezenet1.0-12.tests --lib /model/squeezenet1.0-12.so --fc-parms 0.01,0.0,1,10 --data-set-indices 0
==90== Memcheck, a memory error detector
==90== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==90== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==90== Command: /code/client/bin/modelzoo --iterations 1 --validate --msg-level INFO --file /model/squeezenet1.0-12.tests --lib /model/squeezenet1.0-12.so --fc-parms 0.01,0.0,1,10 --data-set-indices 0
==90==
Iteration 0 dataset 0: Running
==90== Invalid read of size 16
==90== at 0x509596C: main_graph (in /model/squeezenet1.0-12.so)
==90== by 0x5095945: main_graph (in /model/squeezenet1.0-12.so)
==90== Address 0x5d52c40 is 0 bytes after a block of size 193,616 alloc'd
==90== at 0x48382F0: malloc (vg_replace_malloc.c:431)
==90== by 0x5093CEF: main_graph (in /model/squeezenet1.0-12.so)
==90==
==90== Invalid read of size 16
==90== at 0x5095978: main_graph (in /model/squeezenet1.0-12.so)
==90== by 0x5095945: main_graph (in /model/squeezenet1.0-12.so)
==90== Address 0x5d52c60 is 32 bytes before a block of size 32 in arena "client"
==90==
==90== Invalid read of size 16
==90== at 0x509597E: main_graph (in /model/squeezenet1.0-12.so)
==90== by 0x5095945: main_graph (in /model/squeezenet1.0-12.so)
==90== Address 0x5d52c50 is 16 bytes after a block of size 193,616 alloc'd
==90== at 0x48382F0: malloc (vg_replace_malloc.c:431)
==90== by 0x5093CEF: main_graph (in /model/squeezenet1.0-12.so)
==90==
==90== Invalid read of size 16
==90== at 0x50996A0: main_graph (in /model/squeezenet1.0-12.so)
==90== by 0x5099679: main_graph (in /model/squeezenet1.0-12.so)
==90== Address 0x64879d0 is 0 bytes after a block of size 193,616 alloc'd
==90== at 0x48382F0: malloc (vg_replace_malloc.c:431)
==90== by 0x5097A23: main_graph (in /model/squeezenet1.0-12.so)
==90==
==90== Invalid read of size 16
==90== at 0x50996AC: main_graph (in /model/squeezenet1.0-12.so)
==90== by 0x5099679: main_graph (in /model/squeezenet1.0-12.so)
==90== Address 0x64879f0 is 32 bytes before an unallocated block of size 902,608 in arena "client"
...
==90==
==90== HEAP SUMMARY:
==90== in use at exit: 0 bytes in 0 blocks
==90== total heap usage: 23,411 allocs, 23,411 frees, 41,541,222 bytes allocated
==90==
==90== All heap blocks were freed -- no leaks are possible
==90==
==90== For lists of detected and suppressed errors, rerun with: -s
==90== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)
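For readers of the log, a small sketch of the address arithmetic may help. It uses only the first three "Address ..." lines copied from the report above and assumes (this is an inference, not something Valgrind states) that all three reads belong to the same 193,616-byte block, since they share the caller 0x5095945 and sit 16 bytes apart. The helper names `block_end` and `overread_offsets` are made up for illustration.

```python
import re

# The first three "Address ..." lines, copied verbatim from the report above.
REPORT = """\
Address 0x5d52c40 is 0 bytes after a block of size 193,616 alloc'd
Address 0x5d52c60 is 32 bytes before a block of size 32 in arena "client"
Address 0x5d52c50 is 16 bytes after a block of size 193,616 alloc'd
"""

READ_SIZE = 16  # from "Invalid read of size 16"

AFTER = re.compile(r"Address (0x[0-9a-fA-F]+) is (\d+) bytes after a block of size")
ANY_ADDRESS = re.compile(r"Address (0x[0-9a-fA-F]+) is")


def block_end(report):
    """Return the first address past the block, from the first 'bytes after' line."""
    match = AFTER.search(report)
    if match is None:
        raise ValueError("no 'bytes after a block' line found")
    return int(match.group(1), 16) - int(match.group(2))


def overread_offsets(report):
    """Offset of every reported address relative to the end of the block."""
    end = block_end(report)
    return [int(m.group(1), 16) - end for m in ANY_ADDRESS.finditer(report)]


offsets = overread_offsets(REPORT)
print(offsets)                    # [0, 32, 16]
print(max(offsets) + READ_SIZE)   # 48
```

Under that assumption, the three 16-byte reads cover the 48 bytes immediately past the end of the allocation. The block size is itself a multiple of 16 (193,616 = 16 x 12,101), so the reads start exactly at the end of the buffer rather than straddling it.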
On which machine was this run, and with which options?
The models were compiled on s390x for CPU with options --EmitLib --O3 --onnx-op-stats=TXT --mtriple=s390x-ibm-loz --mcpu=z14
Did this start happening after a certain onnx-mlir commit?
AFAIK, this valgrind result was captured at onnx-mlir commit https://github.com/onnx/onnx-mlir/commit/e7dcf975f030183084a3771e6626ec19aaab7987
@gongsu832 We don't run the valgrind tests frequently because they take an exceptionally long time. The last time we ran them was probably around the 0.4.0 release, so we can't easily narrow the commit range beyond that.
Hi @gongsu832, just want to follow up. Have there been any updates on this?
No, I haven't had a chance. I will try to look at it in the next couple of days.
I narrowed the issue down to https://github.com/onnx/onnx-mlir/commit/8e20096e0ffdbeb44adf5b9ea61c7a34e1842eaa, which is the first commit that shows it. The commit right before it has no valgrind issues with squeezenet1.0-12.
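For anyone who needs to repeat this kind of narrowing, here is a minimal sketch of a check script that could be driven by `git bisect run`. It is not necessarily how the commit above was found. It assumes that onnx-mlir has already been rebuilt and the model recompiled at the commit under test (that step is not shown), and the exit code 99 and the function names are arbitrary choices for illustration. It relies on Valgrind's `--error-exitcode` option and on the `git bisect run` convention that 0 means good, 125 means skip, and other codes from 1 to 127 mean bad.

```python
import subprocess
import sys

VALGRIND_ERROR_CODE = 99      # arbitrary value passed to --error-exitcode
GOOD, BAD, SKIP = 0, 1, 125   # exit codes understood by `git bisect run`


def valgrind_command(program_args):
    """Wrap a program invocation so Memcheck errors yield a distinct exit code."""
    return ["valgrind", f"--error-exitcode={VALGRIND_ERROR_CODE}", *program_args]


def bisect_verdict(returncode):
    """Translate the valgrind exit status into a `git bisect run` verdict."""
    if returncode == 0:
        return GOOD
    if returncode == VALGRIND_ERROR_CODE:
        return BAD
    # Crash, failed run, or anything else: this commit cannot be judged.
    return SKIP


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the program to run under valgrind.
    try:
        result = subprocess.run(valgrind_command(sys.argv[1:]))
    except FileNotFoundError:
        sys.exit(SKIP)  # valgrind itself is not installed
    sys.exit(bisect_verdict(result.returncode))
```

A hypothetical invocation, with the script saved as `check.py`, would pass the modelzoo command from the report as arguments, e.g. `python3 check.py /code/client/bin/modelzoo --iterations 1 ...`. Using a dedicated error code keeps Memcheck findings distinguishable from ordinary failures of the client, which are reported as skip rather than bad.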