ROCT-Thunk-Interface

Impossible to use both ROCT and ROCR in single process

Open misos1 opened this issue 2 years ago • 5 comments

When something from ROCR is used first, hsaKmtAcquireSystemProperties returns an error:

	hsa_init();
	hsa_iterate_agents([](hsa_agent_t agent, void *data)
	{
		char name[64] = {};
		hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
		printf("%s\n", name);
		return HSA_STATUS_SUCCESS;
	}, NULL);

	HsaSystemProperties sp;
	printf("%i\n", hsaKmtOpenKFD());
	printf("%i\n", hsaKmtAcquireSystemProperties(&sp));

Output:

	AMD Ryzen Threadripper
	gfx900
	gfx900
	0
	1

When hsaKmtAcquireSystemProperties from ROCT is used first, it succeeds, but ROCR does not list any agents:

	HsaSystemProperties sp;
	printf("%i\n", hsaKmtOpenKFD());
	printf("%i\n", hsaKmtAcquireSystemProperties(&sp));

	hsa_init();
	hsa_iterate_agents([](hsa_agent_t agent, void *data)
	{
		char name[64] = {};
		hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
		printf("%s\n", name);
		return HSA_STATUS_SUCCESS;
	}, NULL);

Output:

	0
	0

It seems that libhsakmt is a static library and libhsa-runtime64 is a shared library that contains its own copy of libhsakmt, and these two copies somehow collide, probably somewhere around hsaKmtAcquireSystemProperties.

It also seems that without calling hsaKmtAcquireSystemProperties first, it is not possible to call any function that expects a NodeId, because these go through validate_nodeid, which needs g_system, and g_system is only initialised during hsaKmtAcquireSystemProperties.

Nov 18 '22 16:11 misos1

The Thunk API always had trouble with multiple clients in the same process. When we made a static library we just made a choice that it is not useful as a public API. Applications should be using ROCr APIs. ROCr and a small number of low level tests should be the only clients of libhsakmt, and they should not be mixed in the same process.

Nov 18 '22 17:11 fxkamd

There are things which are not accessible through ROCR.

Nov 18 '22 17:11 misos1

ROCr APIs tend to be a bit more abstracted. But all the functionality should be there.

Nov 18 '22 18:11 fxkamd

For example, there is no way to set QueuePercentage to a value other than 0 or 100 (if even that is possible with ROCR; I only searched for hsaKmtUpdateQueue and hsaKmtCreateQueue in the ROCR sources).

Nov 18 '22 19:11 misos1

KFD doesn't do anything with the percentage in practice, except that 0% means the queue is disabled and anything else means it is enabled.

When I say ROCr exposes all the functionality, I'm talking about useful functionality for applications. I'm not talking about exposing every low level detail of the KFD ioctl API or the sysfs topology, for example.

Nov 18 '22 19:11 fxkamd

@misos1 Is this ticket still relevant? If not, please close the ticket. Thanks!

Jul 29 '24 16:07 ppanchad-amd

Note that we'll be merging ROCT into ROCr in the near future, so this type of situation won't be possible at that time. Closing off.

Jul 29 '24 17:07 kentrussell

It seems this was already fixed: hsaKmtAcquireSystemProperties now returns 0 in the first case, and hsa_iterate_agents lists agents in the second case.

Jul 29 '24 17:07 misos1