pti-gpu

oneprof -q fails with error "ZE_RESULT_SUCCESS' failed"

Open mumar-intel opened this issue 2 years ago • 2 comments

I am using oneprof on an HPC+AI application with a large number of kernels (~30). When I run:

oneprof -q -o test.txt $APP_EXE

it fails with the error:

oneprof/metric_query_collector.h:307: void MetricQueryCollector::ProcessQuery(const ZeQueryInfo&): Assertion `status == ZE_RESULT_SUCCESS' failed

It generates the output files (result.*, data.*, and test.txt), but test.txt contains only the application's total runtime and provides no information about the individual kernels.

I have tested it on one tile and on one GPU. The application does not use MPI; it is Python-based code.

mumar-intel avatar May 25 '23 21:05 mumar-intel

@mumar-intel Sorry for the delayed response. There have been several recent fixes in oneprof. Could you please try the collection with the latest oneprof and tell us whether the issue still reproduces? Thank you.

jfedorov avatar Dec 05 '23 09:12 jfedorov

Hi @jfedorov, I also run into this issue, even after updating to the latest commit (9ee0e46cafa145856eaeeefe5f26ec046462300f). The error output is below. Is this expected?

 pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: spr [Genuine Intel(R) CPU 0000%@]
Registry and code: 13 MB
Command: python test_linear.py
Uptime: 7.938176 s
Aborted (core dumped)

Wanzizhu avatar Jan 10 '24 14:01 Wanzizhu

@Wanzizhu Please use unitrace instead.

zma2 avatar Oct 30 '25 22:10 zma2
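
The two assertion failures reported in this thread come from different oneprof source locations (metric_query_collector.h:307 and metric_query_cache.h:69), so they may not share a cause. For anyone triaging similar reports, here is a minimal sketch of pulling the file, line, function, and failed expression out of such a message. It assumes the usual glibc assert() message layout and that the message is on a single, unwrapped line. The names AssertFailure and parse_assert_failure are hypothetical helpers and are not part of oneprof or pti-gpu.

```python
import re
from typing import NamedTuple, Optional


class AssertFailure(NamedTuple):
    """Fields of one assertion-failure message (hypothetical helper type)."""
    path: str
    line: int
    function: str
    expression: str


# Assumed layout (glibc assert):
#   <file>:<line>: <function signature>: Assertion `<expr>' failed.
_ASSERT_RE = re.compile(
    r"(?P<path>\S+):(?P<line>\d+): (?P<function>.+?): "
    r"Assertion `(?P<expr>.+?)' failed"
)


def parse_assert_failure(text: str) -> Optional[AssertFailure]:
    """Return the first assertion failure found in text, or None."""
    match = _ASSERT_RE.search(text)
    if match is None:
        return None
    return AssertFailure(
        path=match.group("path"),
        line=int(match.group("line")),
        function=match.group("function"),
        expression=match.group("expr"),
    )


if __name__ == "__main__":
    # The two messages quoted in this thread.
    reports = [
        "oneprof/metric_query_collector.h:307: void MetricQueryCollector::"
        "ProcessQuery(const ZeQueryInfo&): Assertion "
        "`status == ZE_RESULT_SUCCESS' failed",
        "pti-gpu/tools/oneprof/metric_query_cache.h:69: "
        "_zet_metric_query_handle_t* MetricQueryCache::GetQuery("
        "ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.",
    ]
    for report in reports:
        print(parse_assert_failure(report))
```

Comparing the parsed path and function fields makes it easy to see that the first report fails while processing a query result and the second fails while obtaining a query handle, which is worth stating explicitly when filing a follow-up.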