Handle more than 64 registers - Part 1
The general feedback for https://github.com/dotnet/runtime/pull/98258 was to come up with smaller PRs concentrated around LSRA. This is part 1 of that.
For Arm64, this PR changes `typedef unsigned __int64 regMaskTP` to:
typedef struct _regMaskTP
{
unsigned __int64 low;
} regMaskTP;
Versions of PopCount and the BitOperations helpers that take the new type have been added next to the regMaskTP struct definition.
Most of the method implementations are pulled from https://github.com/dotnet/runtime/pull/96196.
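For readers who have not opened the diff, here is a minimal sketch of what a struct-wrapped mask with its own bit helpers could look like. All names ending in `Sketch` are hypothetical and are not the identifiers used in the PR; `uint64_t` stands in for `unsigned __int64`, and the loops are portable stand-ins for whatever intrinsics the real helpers use.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Arm64 struct form of regMaskTP.
struct RegMaskSketch
{
    uint64_t low;
};

// Number of set bits, counted by repeatedly clearing the lowest set bit.
constexpr unsigned PopCountSketch(RegMaskSketch mask)
{
    unsigned count = 0;
    for (uint64_t bits = mask.low; bits != 0; bits &= bits - 1)
    {
        count++;
    }
    return count;
}

// Index of the lowest set bit. The caller must pass a non-empty mask.
constexpr unsigned BitScanForwardSketch(RegMaskSketch mask)
{
    unsigned index = 0;
    for (uint64_t bits = mask.low; (bits & 1) == 0; bits >>= 1)
    {
        index++;
    }
    return index;
}

// Operators so that call sites can keep writing `a | b` and `a & b`.
constexpr RegMaskSketch operator|(RegMaskSketch a, RegMaskSketch b)
{
    return RegMaskSketch{a.low | b.low};
}

constexpr RegMaskSketch operator&(RegMaskSketch a, RegMaskSketch b)
{
    return RegMaskSketch{a.low & b.low};
}

static_assert(PopCountSketch(RegMaskSketch{0xB}) == 3, "0xB has three bits set");
static_assert(BitScanForwardSketch(RegMaskSketch{0x8}) == 3, "lowest set bit of 0x8 is bit 3");
```

Keeping the helpers as free functions next to the struct means a second field for registers beyond 64 could later be added without touching call sites, which is presumably the point of the wrapper.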
@dotnet/jit-contrib
This is expected to be a zero TP diff, so I will investigate why it is causing a 2% regression. Possibly I missed updating something.
For the windows/arm64 crossgen2 collection, here is the distribution. I will take a look.
??$select@$0A@@RegisterSelection@LinearScan@@QEAA?AU_regMaskAll@@PEAVInterval@@PEAVRefPosition@@@Z : 7371109602 : NA : 24.12% : +4.3652%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@U_regMaskAll@@I@Z : 2298414560 : NA : 7.52% : +1.3611%
?processKills@LinearScan@@AEAAXPEAVRefPosition@@@Z : 1228183017 : NA : 4.02% : +0.7273%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@U_regMaskAll@@H@Z : 976319605 : NA : 3.19% : +0.5782%
?mergeRegisterPreferences@Interval@@QEAAXU_regMaskAll@@@Z : 732180225 : NA : 2.40% : +0.4336%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@U_regMaskAll@@H@Z : 712437498 : NA : 2.33% : +0.4219%
?freeRegisters@LinearScan@@AEAAXU_regMaskAll@@@Z : 598599174 : NA : 1.96% : +0.3545%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@IU_regMaskAll@@@Z : 419647064 : NA : 1.37% : +0.2485%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@W4_regNumber_enum@@IW4RefType@@PEAUGenTree@@U_regMaskAll@@@Z : 388348628 : NA : 1.27% : +0.2300%
?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z : 256572045 : +23.46% : 0.84% : +0.1519%
?gtGetRegMask@GenTree@@QEBA?AU_regMaskAll@@XZ : 247446771 : NA : 0.81% : +0.1465%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@4AEBQEA_KU_regMaskAll@@6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N@Z : 229060387 : NA : 0.75% : +0.1357%
?addKillForRegs@LinearScan@@AEAAXU_regMaskAll@@I@Z : 103418880 : NA : 0.34% : +0.0612%
?resolveEdge@LinearScan@@QEAAXPEAUBasicBlock@@0W4ResolveType@1@AEBQEA_KU_regMaskAll@@@Z : 92569359 : NA : 0.30% : +0.0548%
?BuildDefs@LinearScan@@AEAAXPEAUGenTree@@HU_regMaskAll@@@Z : 85318812 : NA : 0.28% : +0.0505%
?genBuildRegPairsStack@CodeGen@@KAXU_regMaskAll@@PEAV?$ArrayStack@URegPair@CodeGen@@@@@Z : 75277373 : NA : 0.25% : +0.0446%
?emitEncodeCallGCregs@emitter@@CAXU_regMaskAll@@PEAUinstrDesc@1@@Z : 73781054 : NA : 0.24% : +0.0437%
?BuildOperandUses@LinearScan@@AEAAHPEAUGenTree@@U_regMaskAll@@@Z : 68804619 : NA : 0.23% : +0.0407%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z : 54306503 : +4.65% : 0.18% : +0.0322%
?emitAddLabel@emitter@@AEAAPEAXAEBQEA_KU_regMaskAll@@1@Z : 51155804 : NA : 0.17% : +0.0303%
?assignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z : 50356789 : +10.96% : 0.16% : +0.0298%
?emitCreatePlaceholderIG@emitter@@QEAAXW4insGroupPlaceholderType@@PEAUBasicBlock@@AEBQEA_KU_regMaskAll@@3_N@Z : 46678609 : NA : 0.15% : +0.0276%
?BuildAddrUses@LinearScan@@AEAAHPEAUGenTree@@U_regMaskAll@@@Z : 46376391 : NA : 0.15% : +0.0275%
?emitGCregDeadSet@emitter@@QEAAXW4GCtype@@U_regMaskAll@@PEAE@Z : 37957388 : NA : 0.12% : +0.0225%
?getMatchingConstants@LinearScan@@AEAA?AU_regMaskAll@@U2@PEAVInterval@@PEAVRefPosition@@@Z : 36052458 : NA : 0.12% : +0.0214%
??$processBlockEndAllocation@$00@LinearScan@@AEAAXPEAUBasicBlock@@@Z : -31931508 : -99.99% : 0.10% : -0.0189%
?emitGCregDeadSet@emitter@@QEAAXW4GCtype@@_KPEAE@Z : -37957388 : -100.00% : 0.12% : -0.0225%
?getMatchingConstants@LinearScan@@AEAA_K_KPEAVInterval@@PEAVRefPosition@@@Z : -40340140 : -100.00% : 0.13% : -0.0239%
?emitCreatePlaceholderIG@emitter@@QEAAXW4insGroupPlaceholderType@@PEAUBasicBlock@@AEBQEA_K_K3_N@Z : -45166929 : -100.00% : 0.15% : -0.0267%
?BuildAddrUses@LinearScan@@AEAAHPEAUGenTree@@_K@Z : -46376391 : -100.00% : 0.15% : -0.0275%
?emitAddLabel@emitter@@AEAAPEAXAEBQEA_K_K1@Z : -51155804 : -100.00% : 0.17% : -0.0303%
?updateAssignedInterval@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z : -51531940 : -7.61% : 0.17% : -0.0305%
?buildUpperVectorRestoreRefPosition@LinearScan@@AEAAXPEAVInterval@@IPEAUGenTree@@_NI@Z : -62330666 : -100.00% : 0.20% : -0.0369%
?BuildOperandUses@LinearScan@@AEAAHPEAUGenTree@@_K@Z : -68804619 : -100.00% : 0.23% : -0.0407%
?addKillForRegs@LinearScan@@AEAAX_KI@Z : -70669568 : -100.00% : 0.23% : -0.0419%
?emitEncodeCallGCregs@emitter@@CAX_KPEAUinstrDesc@1@@Z : -74971071 : -100.00% : 0.25% : -0.0444%
?genBuildRegPairsStack@CodeGen@@KAX_KPEAV?$ArrayStack@URegPair@CodeGen@@@@@Z : -75003566 : -100.00% : 0.25% : -0.0444%
?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z : -80628252 : -5.11% : 0.26% : -0.0477%
?BuildDefs@LinearScan@@AEAAXPEAUGenTree@@H_K@Z : -85318812 : -100.00% : 0.28% : -0.0505%
?resolveEdge@LinearScan@@QEAAXPEAUBasicBlock@@0W4ResolveType@1@AEBQEA_K_K@Z : -91554003 : -100.00% : 0.30% : -0.0542%
?gtGetRegMask@GenTree@@QEBA_KXZ : -181640807 : -100.00% : 0.59% : -0.1076%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@4AEBQEA_K_K6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N@Z : -229060387 : -100.00% : 0.75% : -0.1357%
?updateRegisterPreferences@Interval@@QEAAX_K@Z : -240144624 : -100.00% : 0.79% : -0.1422%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@W4_regNumber_enum@@IW4RefType@@PEAUGenTree@@_K@Z : -374957296 : -100.00% : 1.23% : -0.2221%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@I_K@Z : -390519208 : -100.00% : 1.28% : -0.2313%
?freeRegisters@LinearScan@@AEAAX_K@Z : -598599174 : -100.00% : 1.96% : -0.3545%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@_KH@Z : -712437498 : -100.00% : 2.33% : -0.4219%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@_KH@Z : -927590191 : -100.00% : 3.04% : -0.5493%
??$allocateRegisters@$0A@@LinearScan@@QEAAXXZ : -1375837725 : -19.43% : 4.50% : -0.8148%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z : -1693247150 : -100.00% : 5.54% : -1.0028%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z : -6078036133 : -100.00% : 19.89% : -3.5995%
The culprit was the use of PopCount() in genMaxOneBit() and genExactlyOneBit(). After fixing that, the regression drops to 0.5%. The remaining regression is scattered across many methods and does not follow any specific pattern.
Base: 168859757414, Diff: 169865447753, +0.5956%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA?AUregMaskTP@@PEAVInterval@@PEAVRefPosition@@@Z : 6583187619 : NA : 22.90% : +3.8986%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@UregMaskTP@@I@Z : 1760063728 : NA : 6.12% : +1.0423%
?processKills@LinearScan@@AEAAXPEAVRefPosition@@@Z : 1228183017 : NA : 4.27% : +0.7273%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@UregMaskTP@@H@Z : 930438698 : NA : 3.24% : +0.5510%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@UregMaskTP@@H@Z : 712437498 : NA : 2.48% : +0.4219%
?mergeRegisterPreferences@Interval@@QEAAXUregMaskTP@@@Z : 711153378 : NA : 2.47% : +0.4212%
?freeRegisters@LinearScan@@AEAAXUregMaskTP@@@Z : 598599174 : NA : 2.08% : +0.3545%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@IUregMaskTP@@@Z : 419548284 : NA : 1.46% : +0.2485%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@W4_regNumber_enum@@IW4RefType@@PEAUGenTree@@UregMaskTP@@@Z : 388348628 : NA : 1.35% : +0.2300%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@4AEBQEA_KUregMaskTP@@6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N@Z : 229060387 : NA : 0.80% : +0.1357%
?gtGetRegMask@GenTree@@QEBA?AUregMaskTP@@XZ : 191014321 : NA : 0.66% : +0.1131%
?resolveEdge@LinearScan@@QEAAXPEAUBasicBlock@@0W4ResolveType@1@AEBQEA_KUregMaskTP@@@Z : 92569359 : NA : 0.32% : +0.0548%
?BuildDefs@LinearScan@@AEAAXPEAUGenTree@@HUregMaskTP@@@Z : 85318812 : NA : 0.30% : +0.0505%
?genBuildRegPairsStack@CodeGen@@KAXUregMaskTP@@PEAV?$ArrayStack@URegPair@CodeGen@@@@@Z : 75277373 : NA : 0.26% : +0.0446%
?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z : 74404320 : +6.80% : 0.26% : +0.0441%
?emitEncodeCallGCregs@emitter@@CAXUregMaskTP@@PEAUinstrDesc@1@@Z : 73781054 : NA : 0.26% : +0.0437%
?addKillForRegs@LinearScan@@AEAAXUregMaskTP@@I@Z : 70669568 : NA : 0.25% : +0.0419%
?BuildOperandUses@LinearScan@@AEAAHPEAUGenTree@@UregMaskTP@@@Z : 68804619 : NA : 0.24% : +0.0407%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z : 54306503 : +4.65% : 0.19% : +0.0322%
?emitAddLabel@emitter@@AEAAPEAXAEBQEA_KUregMaskTP@@1@Z : 51155804 : NA : 0.18% : +0.0303%
?assignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z : 50356789 : +10.96% : 0.18% : +0.0298%
?BuildAddrUses@LinearScan@@AEAAHPEAUGenTree@@UregMaskTP@@@Z : 46376391 : NA : 0.16% : +0.0275%
?emitCreatePlaceholderIG@emitter@@QEAAXW4insGroupPlaceholderType@@PEAUBasicBlock@@AEBQEA_KUregMaskTP@@3_N@Z : 45166929 : NA : 0.16% : +0.0267%
?emitGCregDeadSet@emitter@@QEAAXW4GCtype@@UregMaskTP@@PEAE@Z : 37957388 : NA : 0.13% : +0.0225%
?getMatchingConstants@LinearScan@@AEAA?AUregMaskTP@@U2@PEAVInterval@@PEAVRefPosition@@@Z : 36052458 : NA : 0.13% : +0.0214%
??$processBlockEndAllocation@$00@LinearScan@@AEAAXPEAUBasicBlock@@@Z : -31931508 : -99.99% : 0.11% : -0.0189%
?emitGCregDeadSet@emitter@@QEAAXW4GCtype@@_KPEAE@Z : -37957388 : -100.00% : 0.13% : -0.0225%
?getMatchingConstants@LinearScan@@AEAA_K_KPEAVInterval@@PEAVRefPosition@@@Z : -40340140 : -100.00% : 0.14% : -0.0239%
?emitCreatePlaceholderIG@emitter@@QEAAXW4insGroupPlaceholderType@@PEAUBasicBlock@@AEBQEA_K_K3_N@Z : -45166929 : -100.00% : 0.16% : -0.0267%
?BuildAddrUses@LinearScan@@AEAAHPEAUGenTree@@_K@Z : -46376391 : -100.00% : 0.16% : -0.0275%
?emitAddLabel@emitter@@AEAAPEAXAEBQEA_K_K1@Z : -51155804 : -100.00% : 0.18% : -0.0303%
?updateAssignedInterval@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z : -51531940 : -7.61% : 0.18% : -0.0305%
?BuildOperandUses@LinearScan@@AEAAHPEAUGenTree@@_K@Z : -68804619 : -100.00% : 0.24% : -0.0407%
?addKillForRegs@LinearScan@@AEAAX_KI@Z : -70669568 : -100.00% : 0.25% : -0.0419%
?emitEncodeCallGCregs@emitter@@CAX_KPEAUinstrDesc@1@@Z : -74971071 : -100.00% : 0.26% : -0.0444%
?genBuildRegPairsStack@CodeGen@@KAX_KPEAV?$ArrayStack@URegPair@CodeGen@@@@@Z : -75003566 : -100.00% : 0.26% : -0.0444%
?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z : -80628252 : -5.11% : 0.28% : -0.0477%
?BuildDefs@LinearScan@@AEAAXPEAUGenTree@@H_K@Z : -85318812 : -100.00% : 0.30% : -0.0505%
?resolveEdge@LinearScan@@QEAAXPEAUBasicBlock@@0W4ResolveType@1@AEBQEA_K_K@Z : -91554003 : -100.00% : 0.32% : -0.0542%
?gtGetRegMask@GenTree@@QEBA_KXZ : -181640807 : -100.00% : 0.63% : -0.1076%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@4AEBQEA_K_K6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N@Z : -229060387 : -100.00% : 0.80% : -0.1357%
?updateRegisterPreferences@Interval@@QEAAX_K@Z : -240144624 : -100.00% : 0.84% : -0.1422%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@W4_regNumber_enum@@IW4RefType@@PEAUGenTree@@_K@Z : -374957296 : -100.00% : 1.30% : -0.2221%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@I_K@Z : -390519208 : -100.00% : 1.36% : -0.2313%
?freeRegisters@LinearScan@@AEAAX_K@Z : -598599174 : -100.00% : 2.08% : -0.3545%
?BuildDef@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@_KH@Z : -712437498 : -100.00% : 2.48% : -0.4219%
?BuildUse@LinearScan@@AEAAPEAVRefPosition@@PEAUGenTree@@_KH@Z : -927590191 : -100.00% : 3.23% : -0.5493%
??$allocateRegisters@$0A@@LinearScan@@QEAAXXZ : -1375837725 : -19.43% : 4.79% : -0.8148%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z : -1693247150 : -100.00% : 5.89% : -1.0028%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z : -6078036133 : -100.00% : 21.15% : -3.5995%
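To illustrate the PopCount() fix described before these numbers: a "zero or one bit set" check does not need a full population count, because clearing the lowest set bit and testing for zero is enough. The sketch below assumes a single 64-bit field named `low`, as in the PR description. The type and function names are hypothetical, and the actual fix in the PR may be written differently.

```cpp
#include <cstdint>

// Hypothetical stand-in for the struct form of regMaskTP.
struct RegMaskLow
{
    uint64_t low;
};

// True when zero or one bit is set: `x & (x - 1)` clears the lowest set bit,
// so the result is zero only if there was at most one bit to begin with.
constexpr bool MaxOneBitSketch(RegMaskLow mask)
{
    return (mask.low & (mask.low - 1)) == 0;
}

// True when exactly one bit is set.
constexpr bool ExactlyOneBitSketch(RegMaskLow mask)
{
    return (mask.low != 0) && MaxOneBitSketch(mask);
}

static_assert(MaxOneBitSketch(RegMaskLow{0}), "an empty mask has at most one bit");
static_assert(MaxOneBitSketch(RegMaskLow{0x40}), "a single bit passes");
static_assert(!MaxOneBitSketch(RegMaskLow{0x41}), "two bits fail");
static_assert(!ExactlyOneBitSketch(RegMaskLow{0}), "an empty mask is not exactly one bit");
static_assert(ExactlyOneBitSketch(RegMaskLow{0x40}), "a single bit is exactly one bit");
```

Both checks compile to a handful of ALU instructions, whereas a population count followed by a compare does strictly more work on hot paths such as register selection.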
> After fixing it, the regression drops to 0.5%. The remaining regression is just scattered around because of various factors and is not related to any specific pattern.
Yeah, it seems to just be various MSVC regressions due to now having a struct instead of a primitive type. Clang seems to do a little better. There is not much we can do about that, I think.
We might consider switching regMaskTP to a struct everywhere instead of having it as a primitive, to have it unified everywhere. I would personally prefer it, even if it causes MSVC to pessimize codegen slightly. Thoughts @dotnet/jit-contrib?
Alternatively, we could move the extra operations onto some RegMaskOps type that operates on regMaskTP. That would probably allow all the client code to stay unified as well, even with regMaskTP typedef'd to a primitive.
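A rough sketch of that alternative, under the assumption that the mask stays a plain 64-bit integer. Only the name RegMaskOps comes from the comment above. The member functions and the `RegMaskPrimitive` alias are invented for illustration and do not exist in the JIT.

```cpp
#include <cstdint>

// Assumption: the mask remains a primitive type on this target.
typedef uint64_t RegMaskPrimitive;

// Hypothetical helper type: client code calls these functions instead of
// applying operators to the mask directly, so it does not care whether the
// mask is a primitive or a struct.
struct RegMaskOps
{
    static constexpr bool IsEmpty(RegMaskPrimitive mask)
    {
        return mask == 0;
    }

    static constexpr RegMaskPrimitive Union(RegMaskPrimitive a, RegMaskPrimitive b)
    {
        return a | b;
    }

    static constexpr RegMaskPrimitive Intersect(RegMaskPrimitive a, RegMaskPrimitive b)
    {
        return a & b;
    }

    // Returns `mask` with every bit of `toRemove` cleared.
    static constexpr RegMaskPrimitive Remove(RegMaskPrimitive mask, RegMaskPrimitive toRemove)
    {
        return mask & ~toRemove;
    }

    static constexpr bool HasExactlyOneBit(RegMaskPrimitive mask)
    {
        return (mask != 0) && ((mask & (mask - 1)) == 0);
    }
};

static_assert(RegMaskOps::Union(0x1, 0x4) == 0x5, "union sets both bits");
static_assert(RegMaskOps::Remove(0x7, 0x2) == 0x5, "remove clears bit 1");
static_assert(RegMaskOps::HasExactlyOneBit(0x10), "single bit");
```

With this shape, a target that needs more than 64 registers would swap the alias for a struct and provide overloads taking it, while call sites written against RegMaskOps stay the same.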
> We might consider switching regMaskTP to a struct everywhere instead of having it as a primitive, to have it unified everywhere. I would personally prefer it, even if it causes MSVC to pessimize codegen slightly. Thoughts @dotnet/jit-contrib?
I don't mind doing it, and I advocated a similar idea back in https://github.com/dotnet/runtime/pull/98258, because with that in place, enabling the "handling of 64 registers" becomes easier when we add APX support for Intel in the future.
Edit: With that said, given that https://github.com/dotnet/runtime/pull/98258 has already been out for a couple of months and is critical work needed to unblock other progress on SVE, I would like to concentrate on enabling it with as little work as possible (just for Arm64) and enable it for other platforms in follow-up PRs once we complete the "predicate register" work.
/azp run runtime-coreclr superpmi-diffs
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr superpmi-replay
Azure Pipelines successfully started running 1 pipeline(s).