cpuinfo icon indicating copy to clipboard operation
cpuinfo copied to clipboard

Parse cpu a76 core as a55 result in serious performance problem

Open xiaotongnii opened this issue 2 years ago • 5 comments

Platform:android_arm64-android13 SOC: cortex A 4a55 and 4a76 Cpuinfo:core_siblings_list:0-7 HW:design 4a55 and 4a76 as a cluster on SOC and configure dtsi cpu 8core as a cluster.

Problem:Use benchmark tools test CPU performance,find cpuinfo parser cpu a76 core as a55 result in serious performance problem on Arm CPU

The cpuinfo call the main function as the following: https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L34C37-L34C37

Use processors[0].package_leader_id update processors[sibling].package_leader_id(siblings value form 0 to 7). Result in processors[i].package_leader_id value is 0.(i form 0 to 7)

//init.c:34
static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	for(int i = 0; i < 8; i++)
	   {cpuinfo_log_debug("init.c.39.processors[i].package_leader_id:%d, ",processors[i].package_leader_id);}
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;
    cpuinfo_log_debug("File: %s, Function: %s, Line: %d, processor:%d, package_leader_id:%d,processors[processor].package_leader_id:%d,siblings_start:%d, siblings_end:%d\n", __FILE__, __func__, __LINE__,processor, package_leader_id,processors[processor].package_leader_id,siblings_start,siblings_end);

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			cpuinfo_log_info("invalid processor %"PRIu32" reported as a sibling for processor %"PRIu32,
				sibling, processor);
			continue;
		}

		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			package_leader_id = sibling_package_leader_id;
		}
		// use processors[0].package_leader_id update processors[sibling].package_leader_id.
               //  processors[sibling].package_leader_id value is 0.
		processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
		cpuinfo_log_debug("File: %s, Function: %s, Line: %d, sibling:%d, sibling_package_leader_id:%d, package_leader_id:%d\n", __FILE__, __func__, __LINE__,sibling,sibling_package_leader_id,package_leader_id);

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}

Due to processors[processor].package_leader_id (processor from 0 to 7) is 0, cluster_leader value is 0 all the time. https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L368

	/* Initialize core vendor, uarch, MIDR, and frequency for every logical processor */
	for (uint32_t i = 0; i < arm_linux_processors_count; i++) {
		if (bitmask_all(arm_linux_processors[i].flags, CPUINFO_LINUX_FLAG_VALID)) {
			const uint32_t cluster_leader = arm_linux_processors[i].package_leader_id;
			if (cluster_leader == i) {
				/* Cluster leader: decode core vendor and uarch */
				cpuinfo_arm_decode_vendor_uarch(
				arm_linux_processors[cluster_leader].midr,
#if CPUINFO_ARCH_ARM
				!!(arm_linux_processors[cluster_leader].features & CPUINFO_ARM_LINUX_FEATURE_VFPV4),
#endif
				&arm_linux_processors[cluster_leader].vendor,
				&arm_linux_processors[cluster_leader].uarch);
			} else {
				/* Cluster non-leader: copy vendor, uarch, MIDR, and frequency from cluster leader */
				arm_linux_processors[i].flags |= arm_linux_processors[cluster_leader].flags &
					(CPUINFO_ARM_LINUX_VALID_MIDR | CPUINFO_LINUX_FLAG_MAX_FREQUENCY);
				arm_linux_processors[i].midr = arm_linux_processors[cluster_leader].midr;
				arm_linux_processors[i].vendor = arm_linux_processors[cluster_leader].vendor;
				arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;
				arm_linux_processors[i].max_frequency = arm_linux_processors[cluster_leader].max_frequency;
			}
		}
	}

https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/init.c#L383 arm_linux_processors[i].uarch = arm_linux_processors[cluster_leader].uarch;

Use arm_linux_processors[0].uarch update arm_linux_processors[i].uarch(i from o0 to 7),Reulst in

arm_linux_processors[i].uarch is the same as update arm_linux_processors[0].uarch which is a55(0x00300355 in the cpuinfo.h)

https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/include/cpuinfo.h#L409

Please give me some advices. BR. Xiaotong

xiaotongnii avatar Oct 25 '23 02:10 xiaotongnii

Nobody reply?OK Now,Cpuinfo should use logical cpu cluster device node to parse cpu cluster rather than use physical cpu cluster!!! We nedd to add a new cpuinfo device node to parse logical cpu cluster,Such as package_leader_id and package_leader_id_list or logical_core_siblings_list 。 https://github.com/pytorch/cpuinfo/blob/76d5e8f5b563daa65340a60fce0e9aec73a936df/src/arm/linux/api.h#L189

We can not use a physical cpu cluster device node to parse directly ,because cpu arch is changing all the time. BR. Xiaotong

xiaotongnii avatar Nov 02 '23 02:11 xiaotongnii

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

malfet avatar Nov 14 '23 19:11 malfet

I think some of that is being added as part of RISC-V bringup, but I wonder if it could be landed separately...

Hi,malfet, what is means about " I wonder if it could be landed separately". For Arm CPUs,I am willing to slove the issue.And RISV-V CPUS also has a similar programe. How to make the issue become a PR?

xiaotongnii avatar Nov 23 '23 05:11 xiaotongnii

I suspect we're referring to PR #190 here, where we added the concept of core_leader_id and cluster_leader_id to the RISC-V structure definition.

In the RISC-V implementation, we tie the processor's uarch to the core_leader. This seems to be essentially the problem you're flagging on ARM - logical processors on the same physical core are guaranteed to have one uarch, but multiple physical cores tied to the same cluster don't necessarily have to be the same uarch.

I think what malfet@ meant by 'it could be landed separately' is whether #190 might somehow block you, because 2 weeks ago at the comment, it was not yet landed. However, these paths are completely independent, so you could put up a fix for this in ARM without worrying about RISC-V.

In regards to how to make this issue a PR, just create a PR separately and follow this guide to link your PR to this issue.

prashanthswami avatar Nov 27 '23 20:11 prashanthswami

Application: uint32_t uarch_index = cpuinfo_get_current_uarch_index();-> const struct cpuinfo_uarch_info* cpuinfo_uarch_info = cpuinfo_get_uarch(uarch_index); (cpuinfo_uarchs,init->fill uarchs in it)-> cpuinfo_uarch_info->uarch ;(Type of CPU microarchitecture )

Cpuinfo: sibling_package_leader_id ->package_leader_id -> processors[sibling].package_leader_id

cpuinfo_arm_linux_init->cpuinfo_linux_detect_core_siblings->cluster_siblings_parser

static bool cluster_siblings_parser(
	uint32_t processor, uint32_t siblings_start, uint32_t siblings_end,
	struct cpuinfo_arm_linux_processor* processors)
{
	processors[processor].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;
	uint32_t package_leader_id = processors[processor].package_leader_id;

	for (uint32_t sibling = siblings_start; sibling < siblings_end; sibling++) {
		if (!bitmask_all(processors[sibling].flags, CPUINFO_LINUX_FLAG_VALID)) {
			continue;
		}
		const uint32_t sibling_package_leader_id = processors[sibling].package_leader_id;
		if (sibling_package_leader_id < package_leader_id) {
			// package_leader_id = sibling_package_leader_id;
		}
		// processors[sibling].package_leader_id = package_leader_id;
		processors[sibling].flags |= CPUINFO_LINUX_FLAG_PACKAGE_CLUSTER;

	}

	processors[processor].package_leader_id = package_leader_id;

	return true;
}

xiaotongnii avatar Dec 23 '23 07:12 xiaotongnii