cpu_features icon indicating copy to clipboard operation
cpu_features copied to clipboard

ARM64 detection via asm

Open toor1245 opened this issue 3 years ago • 19 comments

Added an initial implementation, detecting ARM64 through assembly instructions, thereby getting rid of OS binding, which can be used, for example, to detect ARM on Windows.

What do you think?

toor1245 avatar Oct 24 '21 01:10 toor1245

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

gchatelet avatar Oct 25 '21 08:10 gchatelet

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

toor1245 avatar Oct 25 '21 08:10 toor1245

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

No only the code layout for now. It should be a non functional change.

gchatelet avatar Oct 26 '21 11:10 gchatelet

It's a nice addition, thx for the PR. That said, cpu_features codebase is starting to be a mess so I'd like to change the code layout. Can we keep it here for now and integrate once the layout is in a better shape?

will the testing architecture be changed? since there are a lot of tests for x86.

No only the code layout for now. It should be a non functional change.

I take that back, I might have to split the tests as well... I'll make sure it's the less disruptive possible.

gchatelet avatar Oct 26 '21 12:10 gchatelet

@gchatelet, @Mizux this pull request is ready for review.

Basically, I tested it on Linux with proc/cpuinfo and new implementation through inline asm and tested the behavior with full implementation when only inline asm:

Linux proc/cpuinfo + MIDR image

Linux DetectFeaturesBase + MIDR image

As you can see detection features the same, except evtstrm, since HWCAP_EVTSTRM is generic timer is configured to generate events at a frequency of approximately 100KHz, and we cannot get this info through cpuid like everyone else.

All other cpu features which we have in Aarch64Features struct was added. Doesnt work for Windows yet. Implementation wasn't added, since msvc compiler on Aarch64 __asm keyword is not supported, then I tried using armasm64 this is the same armasm but microsoft with some diference microsoft armasm directives in GNU assembler is used .inst directive, and for armasm DCI, but it doesn't work :( , it is an unknown directive.

Useful links for review: ARM64 ELF hwcaps Linux arm kernel patch Linux sysreg Linux cputype Linux cpu-feature-registers.txt Arm® Architecture Reference Manual Armv8, for A-profile architecture - D13.1 About the AArch64 System registers, pdf page 3040 Arm® Armv9-A Architecture Registers Arm A-Profile Architecture Specifications

Note: public API not changed

If you have the opportunity to check how it works on freebsd or macOs, I would be grateful

toor1245 avatar Nov 10 '21 00:11 toor1245

could you provide dev/aarch64 branch, where we can add CI/CD and if there are defects to fix them, improve tests and only then merge it into the master or is it unnecessary?

toor1245 avatar Nov 10 '21 09:11 toor1245

image checked via qemu and it works on freebsd

toor1245 avatar Nov 16 '21 01:11 toor1245

@toor1245 does it work on aarch64-darwin by any chance? I tried running it, it hits an "illegal instruction".

arkivm avatar Nov 17 '21 22:11 arkivm

@toor1245 does it work on aarch64-darwin by any chance? I tried running it, it hits an "illegal instruction".

Hi, thanks for the reply! I think no, for now. Right now I have no opportunity to test, and there is no CI to test yet. An issue has been created to add CI jobs for ios, https://github.com/google/cpu_features/issues/197, as a temporary solution can be done through sysctlbyname, but I would not like this since there will be a dependence on the OS, and lowers the possibility for further improving the information of a given architecture

toor1245 avatar Nov 18 '21 00:11 toor1245

Btw, if this gets merged, are we going to drop all the OS APIs (e.g., sysctl) for arm64 and switch to this version?

arkivm avatar Nov 18 '21 05:11 arkivm

Btw, if this gets merged, are we going to drop all the OS APIs (e.g., sysctl) for arm64 and switch to this version?

for Linux / Adnroid, something that has been working for a long time and does not break the public api, I think definitely not now, but on other OS this api may become truncated, what's not good

toor1245 avatar Nov 18 '21 11:11 toor1245

Since a new pull request was introduced, I converted this pull request to draft until it was merged https://github.com/google/cpu_features/pull/204 to the master branch

toor1245 avatar Nov 18 '21 12:11 toor1245

Thx for the work here. I'm keeping an eye on the PR (even though I haven't commented lately). Just to make sure we're on the same page, there should be only one header for aarch64 eventually. I assume you have the two headers in parallel so to check that the results are consistent?

gchatelet avatar Nov 30 '21 09:11 gchatelet

Thx for the work here. I'm keeping an eye on the PR (even though I haven't commented lately). Just to make sure we're on the same page, there should be only one header for aarch64 eventually. I assume you have the two headers in parallel so to check that the results are consistent?

@gchatelet at the moment, this PR has only one include/cpuinfo_aarch64.h header unchanged and added two headers to include/internal as include/internal/cpuid_x86.h, or do we need to move the implementation? Sorry, but I misunderstood the question of the consistency of the two headers.

I think we сan to create small pull-requests, since PR too big and hard to review, and leave this PR as a sample.

I see that we can create at least:

  1. Add new inl file stringize.inl, since often operantion and dependency of define_introspection.inl .
  2. Fix or add override function of ExctractBitRange uint64_t, as there is an overflow when extracting 64 bit info in aarch64.
  3. Add include/internal/cpuid_aarch64.h, src/define_cpuid_aarch64.inl and impl_aarch64__base_implementation.inl where only MIDR_EL1(architecture, revision, part num, variant) will be implemented.
  4. Provide new inline asm testing logic.
  5. Provide PR for each ID_* separately.

And as I mentioned above, it will be good if we create a dev branch for implementation.

What do you think?

toor1245 avatar Nov 30 '21 17:11 toor1245

Update

I have completed research on aarch64 support for Windows. Access to system registers is limited from EL0 and we can't get information from user mode, to solve this problem we have only one way, we have to write software driver in kernel mode, where we have more access, and make client communicate with driver.

here is the result of the test driver of load and unload Screenshot (8)

to read processor info we can use _ReadStatusReg, thus we don't have to think about armasm64 integration
https://reviews.llvm.org/D53115 https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

in lib we have to add something like this:

HANDLE hDevice = CreateFile(L"\\\\.\\CpuFeatures", GENERIC_READ,
          0, NULL, OPEN_EXISTING, 0, NULL);

 if (hDevice == INVALID_HANDLE_VALUE)
      // handle hDevice == INVALID_HANDLE_VALUE

 EL1Info info;
 DWORD bytes;
 BOOL ok = ReadFile(hDevice, &info, sizeof(EL1Info), &bytes, NULL);
 if (!ok)
     // handle !ok
 if (bytes != sizeof(EL1Info))
     // handle bytes != sizeof(EL1Info)
 CloseHandle(hDevice);

Linux allows reading information from EL1 for the following system registers:

  • MIDR_EL1
  • ID_AA64ISAR0_EL1
  • ID_AA64PFR0_EL1
  • ID_AA64PFR1_EL1
  • ID_AA64ISAR1_EL1
  • ID_AA64MMFR0_EL1
  • ID_AA64MMFR2_EL1
  • ID_AA64ZFR0_EL1
  • ID_AA64MMFR1_EL1
  • ID_AA64ISAR2_EL1 https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html

@gchatelet, @Mizux, what do you think?

toor1245 avatar Feb 13 '22 17:02 toor1245

here is a sketch of the code, but don't mind the quality and reliability of this code as i have invested and experimented

CpuInfoBase.h

#include <ntddk.h>

void GetCpuInfo(PVOID Buffer);

CpuInfoAarch64.c

#if defined(_M_ARM64)
#include "CpuInfoBase.h"
#include "CpuInfoAarch64.h"
#include "arm64_neon.h"

// https://reviews.llvm.org/D53115
#define SYS_REG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define SYS_MIDR_EL1 SYS_REG(3, 0, 0, 0, 0)

void GetCpuInfo(PVOID Buffer) {
	UINT64 midr = _ReadStatusReg(SYS_MIDR_EL1);
	memcpy((PUINT64)Buffer, &midr, sizeof(UINT64));
}

#endif // defined(_M_ARM64)

Driver.c

#include <ntddk.h>
#include "CpuInfoBase.h"

void CpuFeaturesUnload(PDRIVER_OBJECT DriverObject)
{
	KdPrint(("cpu_features driver unload\n"));
	
	UNICODE_STRING symbolicLinkName = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
	
	// Delete symbolic link.
	const NTSTATUS deleteSymbolicLinkStatus = IoDeleteSymbolicLink(&symbolicLinkName);
	if (!NT_SUCCESS(deleteSymbolicLinkStatus))
	{
		KdPrint(("Failed to delete symbolink link cpu_features_arm64. Status: (0x%08X)\n", deleteSymbolicLinkStatus));
	}
	IoDeleteDevice(DriverObject->DeviceObject);
}

NTSTATUS CpuFeaturesDispatchCreateClose(PDEVICE_OBJECT DriverObject, PIRP Irp)
{
	UNREFERENCED_PARAMETER(DriverObject);
	UNREFERENCED_PARAMETER(Irp);
	return STATUS_SUCCESS;
}

NTSTATUS CompleteIrp(PIRP Irp, NTSTATUS status, ULONG_PTR info )
{
	Irp->IoStatus.Status = status;
	Irp->IoStatus.Information = info;
	IoCompleteRequest(Irp, 0);
	return status;
}

NTSTATUS CpuFeaturesDispatchRead(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
	UNREFERENCED_PARAMETER(DeviceObject);
	PIO_STACK_LOCATION pIoStackLocation = IoGetCurrentIrpStackLocation(Irp);
	ULONG requestLength = 0;
	PVOID userBuffer = NULL;
	requestLength = pIoStackLocation->Parameters.Read.Length;
	if (requestLength == 0) 
	{
		return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
	}
		
	userBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);
	if (!userBuffer)
	{
		return CompleteIrp(Irp, STATUS_INVALID_BUFFER_SIZE, 0);
	}

	GetCpuInfo(userBuffer);
		
	return CompleteIrp(Irp, STATUS_SUCCESS, requestLength);
}

UINT64 ExtractBitRange(UINT64 Reg, UINT64 Msb,
	UINT64 lsb) {
	const UINT64 bits = Msb - lsb + 1ULL;
	const UINT64 mask = (1ULL << bits) - 1ULL;
	return Reg >> lsb & mask;
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
	UNREFERENCED_PARAMETER(RegistryPath);

	// Set an unload routine.
	DriverObject->DriverUnload = CpuFeaturesUnload;

	// Set dispatch routines the driver support.
	DriverObject->MajorFunction[IRP_MJ_CREATE] = CpuFeaturesDispatchCreateClose;
	DriverObject->MajorFunction[IRP_MJ_CLOSE] = CpuFeaturesDispatchCreateClose;
	DriverObject->MajorFunction[IRP_MJ_READ] = CpuFeaturesDispatchRead;
	
	UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\CpuFeatures");
	UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\CpuFeatures");
	PDEVICE_OBJECT DeviceObject = NULL;
	NTSTATUS status = STATUS_SUCCESS;
	BOOLEAN symLinkCreated = FALSE;

// just midr output 
#if defined(_M_ARM64)
	KdPrint(("cpu_features aarch64 working!!!\n"));
	UINT64 value;
	GetCpuInfo(&value);
	KdPrint(("%d\n", value));
	UINT64 impl = ExtractBitRange(value, 31, 24);
	UINT64 variant = ExtractBitRange(value, 23, 20);
	UINT64 part = ExtractBitRange(value, 15, 4);
	UINT64 revision = ExtractBitRange(value, 3, 0);
	KdPrint(("part:	   %d\n", part));
	KdPrint(("variant:     %d\n", variant));
	KdPrint(("revision:    %d\n", revision));
	KdPrint(("implementer: %d\n", impl));
#endif

	do {
		status = IoCreateDevice(DriverObject, 0, &devName, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
		if (!NT_SUCCESS(status)) {
			KdPrint(("failed to create device (0x%08X)\n", status));
			break;
		}
		DeviceObject->Flags |= DO_DIRECT_IO;

		status = IoCreateSymbolicLink(&symLink, &devName);
		if (!NT_SUCCESS(status)) {
			KdPrint(("failed to create symbolic link (0x%08X)\n", status));
			break;
		}
		symLinkCreated = TRUE;

	} while (FALSE);

	if (!NT_SUCCESS(status)) {
		if (symLinkCreated)
			IoDeleteSymbolicLink(&symLink);
		if (DeviceObject)
			IoDeleteDevice(DeviceObject);
	}

	return status;
}

The biggest disadvantage of this approach is if something is not processed, there will be a blue screen of death

toor1245 avatar Feb 13 '22 17:02 toor1245

hi @arkivm, could you please run this code on your M1 to make sure the behavior with EL on macOS is the same as it is on Windows

compiler clang. (if possible on macOS _ReadStatusReg) https://reviews.llvm.org/D53115

#include <stdint.h>
#include <intrin.h>

#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define ARM64_CNTVCT            ARM64_SYSREG(3, 3, 14, 0, 2)

int main() {
   uint64_t cntvct = _ReadStatusReg(ARM64_CNTVCT); // should work
   return 0;
}
#include <stdint.h>
#include <intrin.h>

#define ARM64_SYSREG(op0, op1, crn, crm, op2) \
        ( ((op0 & 1) << 14) | \
          ((op1 & 7) << 11) | \
          ((crn & 15) << 7) | \
          ((crm & 15) << 3) | \
          ((op2 & 7) << 0) )

#define ARM64_MIDR_EL1         ARM64_SYSREG(3, 0, 0, 0, 0)

int main() {
   uint64_t midr = _ReadStatusReg(ARM64_MIDR_EL1); // the probability that it will fail the way it is done in user mode
   return 0;
}

if _ReadStautsReg is not supported for macOS

void** tpidrro;
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidrro)); // should work
uint64_t midr;
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // the probability that it will fail the way it is done in user mode

toor1245 avatar Feb 14 '22 17:02 toor1245

@toor1245

__asm__ volatile ("mrs %0, cntvct_el0" : "=r" (cntvct1)); // works
__asm__ volatile ("mrs %0, tpidrro_el0" : "=r" (tpidr));  // works
__asm__ volatile ("mrs %0, midr_el1" : "=r" (midr)); // fails - illegal instruction

I couldn't get the_ReadStatusReg intrinsic to work. When I add intrin.h, it was complaining

include/intrin.h:12:15: fatal error: 'intrin.h' file not found
#include_next <intrin.h>
              ^~~~~~~~~~

Let me know if you want me to test something else.

arkivm avatar Feb 19 '22 20:02 arkivm

@arkivm, thanks for the help!

toor1245 avatar Feb 22 '22 21:02 toor1245