wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

[meta] Migrate instruction selection to ISLE

Open fitzgen opened this issue 3 years ago • 29 comments

This is a meta issue to track the migration from hand-written instruction selection and lowering over to using the ISLE DSL.

As you port lowering for a clif opcode over to ISLE, please check the associated box (or leave a comment, if you don't have edit permissions and I or someone else can check the box for you). Hopefully this will help us focus our porting efforts and finish the migration in a timely manner, as well as avoid stepping on each others toes by having two people accidentally port the same opcode lowerings.

cc @alexcrichton @cfallin @abrown @jlb6740 @uweigand @sparker-arm @akirilov-arm

x86_64 -- DONE!

  • [x] Opcode::Clz
  • [x] Opcode::Ctz
  • [x] Opcode::Popcnt
  • [x] Opcode::Bitrev
  • [x] Opcode::IsNull
  • [x] Opcode::IsInvalid
  • [x] Opcode::Uextend
  • [x] Opcode::Sextend
  • [x] Opcode::Breduce
  • [x] Opcode::Bextend
  • [x] Opcode::Ireduce
  • [x] Opcode::Bint
  • [x] Opcode::Icmp
  • [x] Opcode::Fcmp
  • [x] Opcode::FallthroughReturn
  • [x] Opcode::Return
  • [x] Opcode::Call
  • [x] Opcode::CallIndirect
  • [x] Opcode::Debugtrap
  • [x] Opcode::Trapif
  • [x] Opcode::Trapff
  • [x] Opcode::WideningPairwiseDotProductS
  • [x] Opcode::Fadd
  • [x] Opcode::Fsub
  • [x] Opcode::Fmul
  • [x] Opcode::Fdiv
  • [x] Opcode::Fmin
  • [x] Opcode::Fmax
  • [x] Opcode::FminPseudo
  • [x] Opcode::FmaxPseudo
  • [x] Opcode::Sqrt
  • [x] Opcode::Fpromote
  • [x] Opcode::FvpromoteLow
  • [x] Opcode::Fdemote
  • [x] Opcode::Fvdemote
  • [x] Opcode::FcvtFromSint
  • [x] Opcode::FcvtLowFromSint
  • [x] Opcode::FcvtFromUint
  • [x] Opcode::FcvtToUint
  • [x] Opcode::FcvtToUintSat
  • [x] Opcode::FcvtToSint
  • [x] Opcode::FcvtToSintSat
  • [x] Opcode::IaddPairwise
  • [x] Opcode::UwidenHigh
  • [x] Opcode::UwidenLow
  • [x] Opcode::SwidenHigh
  • [x] Opcode::SwidenLow
  • [x] Opcode::Snarrow
  • [x] Opcode::Unarrow
  • [x] Opcode::Bitcast
  • [x] Opcode::Fabs
  • [x] Opcode::Fneg
  • [x] Opcode::Fcopysign
  • [x] Opcode::Ceil
  • [x] Opcode::Floor
  • [x] Opcode::Nearest
  • [x] Opcode::Trunc
  • [x] Opcode::Load
  • [x] Opcode::Uload8
  • [x] Opcode::Sload8
  • [x] Opcode::Uload16
  • [x] Opcode::Sload16
  • [x] Opcode::Uload32
  • [x] Opcode::Sload32
  • [x] Opcode::Sload8x8
  • [x] Opcode::Uload8x8
  • [x] Opcode::Sload16x4
  • [x] Opcode::Uload16x4
  • [x] Opcode::Sload32x2
  • [x] Opcode::Uload32x2
  • [x] Opcode::Store
  • [x] Opcode::Istore8
  • [x] Opcode::Istore16
  • [x] Opcode::Istore32
  • [x] Opcode::AtomicRmw
  • [x] Opcode::AtomicCas
  • [x] Opcode::AtomicLoad
  • [x] Opcode::AtomicStore
  • [x] Opcode::Fence
  • [x] Opcode::FuncAddr
  • [x] Opcode::SymbolValue
  • [x] Opcode::StackAddr
  • [x] Opcode::Select
  • [x] Opcode::Selectif
  • [x] Opcode::SelectifSpectreGuard
  • [x] Opcode::Udiv
  • [x] Opcode::Urem
  • [x] Opcode::Sdiv
  • [x] Opcode::Srem
  • [x] Opcode::Umulhi
  • [x] Opcode::Smulhi
  • [x] Opcode::GetPinnedReg
  • [x] Opcode::SetPinnedReg
  • [x] Opcode::Vconst
  • [x] Opcode::RawBitcast
  • [x] Opcode::Shuffle
  • [x] Opcode::Swizzle
  • [x] Opcode::Insertlane
  • [x] Opcode::Extractlane
  • [x] Opcode::ScalarToVector
  • [x] Opcode::Splat
  • [x] Opcode::VanyTrue
  • [x] Opcode::VallTrue
  • [x] Opcode::VhighBits
  • [x] Opcode::Iconcat
  • [x] Opcode::Isplit
  • [x] Opcode::TlsValue
  • [x] Opcode::SqmulRoundSat
  • [x] Opcode::Uunarrow

aarch64

  • [x] Opcode::Load
  • [x] Opcode::Uload8
  • [x] Opcode::Sload8
  • [x] Opcode::Uload16
  • [x] Opcode::Sload16
  • [x] Opcode::Uload32
  • [x] Opcode::Sload32
  • [x] Opcode::Sload8x8
  • [x] Opcode::Uload8x8
  • [x] Opcode::Sload16x4
  • [x] Opcode::Uload16x4
  • [x] Opcode::Sload32x2
  • [x] Opcode::Uload32x2
  • [x] Opcode::Store
  • [x] Opcode::Istore8
  • [x] Opcode::Istore16
  • [x] Opcode::Istore32
  • [x] Opcode::StackAddr
  • [x] Opcode::AtomicRmw
  • [x] Opcode::AtomicCas
  • [x] Opcode::AtomicLoad
  • [x] Opcode::AtomicStore
  • [x] Opcode::Fence
  • [ ] Opcode::Select
  • [ ] Opcode::Selectif
  • [ ] Opcode::SelectifSpectreGuard
  • [x] Opcode::Bitselect
  • [x] Opcode::Vselect
  • [ ] Opcode::Trueif
  • [ ] Opcode::Trueff
  • [x] Opcode::IsNull
  • [x] Opcode::IsInvalid
  • [x] Opcode::Copy
  • [x] Opcode::Breduce
  • [x] Opcode::Ireduce
  • [x] Opcode::Bextend
  • [x] Opcode::Bmask
  • [x] Opcode::Bint
  • [x] Opcode::Bitcast
  • [x] Opcode::FallthroughReturn
  • [x] Opcode::Return
  • [x] Opcode::Icmp
  • [x] Opcode::Fcmp
  • [x] Opcode::Debugtrap
  • [x] Opcode::Trap
  • [x] Opcode::ResumableTrap
  • [ ] Opcode::Trapif
  • [ ] Opcode::Trapff
  • [x] Opcode::FuncAddr
  • [x] Opcode::SymbolValue
  • [x] Opcode::Call
  • [x] Opcode::CallIndirect
  • [x] Opcode::GetPinnedReg
  • [x] Opcode::SetPinnedReg
  • [x] Opcode::Vconst
  • [x] Opcode::RawBitcast
  • [x] Opcode::Extractlane
  • [x] Opcode::Insertlane
  • [x] Opcode::Splat
  • [x] Opcode::ScalarToVector
  • [x] Opcode::VallTrue
  • [x] Opcode::VanyTrue
  • [x] Opcode::VhighBits
  • [x] Opcode::Shuffle
  • [x] Opcode::Swizzle
  • [x] Opcode::Isplit
  • [x] Opcode::Iconcat
  • [x] Opcode::Imax
  • [x] Opcode::Umax
  • [x] Opcode::Umin
  • [x] Opcode::Imin
  • [x] Opcode::IaddPairwise
  • [x] Opcode::WideningPairwiseDotProductS
  • [x] Opcode::Fadd
  • [x] Opcode::Fsub
  • [x] Opcode::Fmul
  • [x] Opcode::Fdiv
  • [x] Opcode::Fmin
  • [x] Opcode::Fmax
  • [x] Opcode::FminPseudo
  • [x] Opcode::FmaxPseudo
  • [x] Opcode::Sqrt
  • [x] Opcode::Fneg
  • [x] Opcode::Fabs
  • [x] Opcode::Fpromote
  • [x] Opcode::Fdemote
  • [x] Opcode::Ceil
  • [x] Opcode::Floor
  • [x] Opcode::Trunc
  • [x] Opcode::Nearest
  • [x] Opcode::Fma
  • [x] Opcode::Fcopysign
  • [x] Opcode::FcvtToUint
  • [x] Opcode::FcvtToSint
  • [x] Opcode::FcvtFromUint
  • [x] Opcode::FcvtFromSint
  • [x] Opcode::FcvtToUintSat
  • [x] Opcode::FcvtToSintSat
  • [x] Opcode::IaddIfcout
  • [x] Opcode::Iabs
  • [x] Opcode::AvgRound
  • [x] Opcode::Snarrow
  • [x] Opcode::Unarrow
  • [x] Opcode::Uunarrow
  • [x] Opcode::SwidenLow
  • [x] Opcode::SwidenHigh
  • [x] Opcode::UwidenLow
  • [x] Opcode::UwidenHigh
  • [x] Opcode::TlsValue
  • [x] Opcode::SqmulRoundSato
  • [x] Opcode::FcvtLowFromSint
  • [x] Opcode::FvpromoteLow
  • [x] Opcode::Fvdemote
  • [ ] Branches

s390x -- DONE!

  • [x] Calls
  • [x] Returns
  • [x] Traps
  • [x] Branches

fitzgen avatar Jan 21 '22 22:01 fitzgen

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

github-actions[bot] avatar Jan 21 '22 22:01 github-actions[bot]

I'm looking into the remaining issues on s390x. Specifically, I already have patches for branches and traps, which I'll submit shortly. For calls and returns, the issues really are cross-platform - I suspect we'll have to move those to ISLE for all targets at the same time ...

For reference, here's the list of issues I've noticed:

  • For traps, the only problem is that (some) traps need to be emitted as safepoints, and there is currently no common-code mechanism to emit safepoints from ISLE. (I notice that a few traps were moved to ISLE on x64 - but they're simply no longer safepoints now, which looks like a bug to me ...) This is straightforward to fix by having the ISLE code hold on to tuples of (instruction, is_safepoint) instead of just instructions, similar to what is already done elsewhere. I have a patch for that.
  • For branches, there are two problems. The first is a straightforward bug in clif.isle fixed here: https://github.com/bytecodealliance/wasmtime/pull/3718 . The second is more interesting. Currently, branches are emitted via special logic in lower_branch_group, which gets two list of branches and branch targets as input. The list of branches is not a problem; we only ever emit the first branch of that group, and that can be passed to ISLE just like any other instruction. However, when emitting the branch as machine instruction, we need to use the proper branch targets. There is currently no way to access this in ISLE, and I see no obvious way to pass this list into the ISLE machinery either. I have a patch to make this work by having the branch target list instead be generated when needed via an ISLE constructor calling a new context callback current_block_branch_targets. This works for me, but has the drawback that the context now needs to keep new global state identifying the current block being emitted. (But maybe that's not really a problem, given that there is already global state identifying the current instruction anyway?)
  • For both calls and returns, the first issue is that these fundamentally receive variable-argument lists as inputs, and need some way to iterate over these lists in ISLE. These lists are currently represented via a ValueSlice data type, but it turns out this doesn't work at all: any use of a ValueSlice argument in ISLE code will cause the generated Rust code to fail to compile with borrow-checker errors. This is because every single generated constructor and extractor takes a mutable borrow on the context via a &mut self argument - and the ValueSlice type refers to context memory and therefore extends the lifetime of that mutable borrow of any constructor or extractor that returns a variable of that type, which means that while a ValueSlice variable is live, no other constructor or extractor can ever be called, making it quite useless. (This would also be a problem when handling other variable-argument opcodes like load_complex.) I have an experimental patch that uses a pair of a ValueList identifier and an integer offset into the list instead of the ValueSlice, which seems to fix this - not sure if this is the preferred solution. With that patch, I can successfully handle returns.
  • For calls, there are still more fundamental problems. The current code delegates most of the actual instruction emission for calls to the ABICaller trait and associated implementation (which calls back into target code to get ABI details). This code just uses ctx.emit all over the place. Therefore, it is not really usable as part of CLIF processing e.g. via constructors, because CLIF separately buffers instructions, and mixing CLIF emit_insn and context ctx.emit scrambles the sequence (and also confuses the CLIF register mapper because it no longer sees everything). I think the straightforward fix would be to change the ABICaller implementation to emit CLIF instructions instead - but that likely means that all targets have to move calls to ISLE at the same time. (I do not have a patch for this.)

uweigand avatar Jan 24 '22 12:01 uweigand

I'm picking up AtomicRMW for AArch64.

sparker-arm avatar Feb 04 '22 13:02 sparker-arm

I'm going to start working down the x86_64 opcode list from the top -- clz/btz/popcnt/bitrev to start, I think. @abrown let me know what your next plans are and we can make sure not to duplicate work!

cfallin avatar Feb 24 '22 00:02 cfallin

Just for the record, I am working on x64's icmp, fcmp and select instructions.

abrown avatar Mar 10 '22 19:03 abrown

Thanks @abrown. @abrown @cfallin for the record, I'd like to get started with something simple. I'd like to claim fabs and fneg.

jlb6740 avatar Mar 10 '22 19:03 jlb6740

I'm looking at loads and stores next, FYI.

abrown avatar Mar 30 '22 22:03 abrown

I'm picking up AtomicCas and IaddPairwise for aarch64.

sparker-arm avatar Apr 21 '22 14:04 sparker-arm

@abrown are you still planning to take a look at stores on x64? I'm happy to take those first thing Mon if you haven't started yet, as I've got some isel improvements I want to do that involve them (load-op-store patterns). If you're close then no worries though!

cfallin avatar Apr 23 '22 00:04 cfallin

Well, I haven't started stores yet... so take them if you want to!

abrown avatar Apr 23 '22 01:04 abrown

I've also picked up snarrow, unarrow, uunarrow and fvdemote.

sparker-arm avatar May 10 '22 14:05 sparker-arm

I've started on icmp, which doesn't look like it's going to be fun!

sparker-arm avatar Jun 09 '22 10:06 sparker-arm

I've finished up translating icmp for x64

elliottt avatar Jun 13 '22 23:06 elliottt

Transition to ISLE is now complete for s390x. All opcodes are now lowered via ISLE.

uweigand avatar Jun 30 '22 21:06 uweigand

I've made a PR for the aarch64 min/max instructions

sparker-arm avatar Jul 04 '22 10:07 sparker-arm

Last week we also merged bmask/bextend/ireduce/breduce for aarch64

afonso360 avatar Jul 04 '22 11:07 afonso360

I've picked up iabs for aarch64.

dheaton-arm avatar Jul 05 '22 09:07 dheaton-arm

And also picked up swizzle and scalartovector.

Edit: also planning to do fadd down to fma.

dheaton-arm avatar Jul 05 '22 15:07 dheaton-arm

I'm working on finishing the x64 migration to ISLE. Feel free to grab instructions if you'd like to work on them, otherwise I'll continue working down the list.

elliottt avatar Jul 28 '22 20:07 elliottt

@dheaton-arm I'm planning to help push the aarch64 work to completion; I'm hoping to tackle some of the trickier remaining ones (loads and stores, with amode lowering; calls; branches; icmp/fcmp and flags users). Is that OK or have you already started on some of these?

cfallin avatar Aug 25 '22 04:08 cfallin

I've already started on icmp and fcmp, but everything you've mentioned before those should be fine.

dheaton-arm avatar Aug 25 '22 08:08 dheaton-arm

@cfallin I have an upcoming patch that moves the AMode enum definition to ISLE, but I don't plan to touch the actual lowering rules for loads and stores. We already have the amode helper in ISLE, so the latter bit should be doable without waiting for my patch to land (of course, AMode lowering itself would depend on it).

My patch changes quite a lot of code because, as far as I can tell, enum definitions in ISLE always result in enums with named fields, while the existing AMode enum has unnamed fields.

akirilov-arm avatar Aug 25 '22 10:08 akirilov-arm

OK great, I'll take loads/stores themselves, calls, and branches. Lowering of flags-using instructions will intersect with how we do icmp/fcmp so I'll hold off on those. Thanks!

My patch changes quite a lot of code because, as far as I can tell, enum definitions in ISLE always result in enums with named fields, while the existing AMode enum has unnamed fields.

Yep, but there's no fundamental reason for that, it was just a "build it as we need it" sort of thing. We could look at supporting unnamed-field enum variants if it's simpler; it would probably be a 1-2 day refactor.

cfallin avatar Aug 25 '22 17:08 cfallin

Put up PRs for call/ret and loads/stores today; the remaining instructions on aarch64 that don't depend on flags or pattern-matching with icmp/fcmp somehow are:

  • GetPinnedReg/SetPinnedReg
  • ExtractLane/InsertLane
  • StackAddr
  • IaddIfcout
  • TlsValue

I'm happy to do all of these tomorrow, unless someone else objects or has partial work here, then I can do the branch and flags-related ones (trueif/trueff, selectif/selectff/select, brif/brff/brz/brnz) once @dheaton-arm 's icmp/fcmp work is done.

cfallin avatar Aug 26 '22 00:08 cfallin

The x86_64 backend has been migrated to ISLE :tada:

elliottt avatar Aug 26 '22 23:08 elliottt

I'm happy to do all of these tomorrow, unless someone else objects or has partial work here, then I can do the branch and flags-related ones (trueif/trueff, selectif/selectff/select, brif/brff/brz/brnz) once @dheaton-arm 's icmp/fcmp work is done.

@cfallin Just as a heads up, after my icmp patch I've been working on true{if,ff} and select{_,if,if_spectreguard}. I haven't started on the br* ops though. :)

dheaton-arm avatar Aug 31 '22 09:08 dheaton-arm

@dheaton-arm great! I shifted over to some regalloc semantics cleanup work but I can come back and do br* ops next, likely Thu or Fri, unless you want to claim then before then!

cfallin avatar Aug 31 '22 17:08 cfallin

@dheaton-arm great! I shifted over to some regalloc semantics cleanup work but I can come back and do br* ops next, likely Thu or Fri, unless you want to claim then before then!

I'm starting on the branch ops now, assuming you haven't yet. (Starting with jump and br_icmp..brff.)

dheaton-arm avatar Sep 07 '22 12:09 dheaton-arm

@dheaton-arm that sounds great; I didn't get to it last week as the regalloc stuff took longer than expected. Please feel free to grab all the branch ops and let me know if you need any help with any of the infra for that!

cfallin avatar Sep 07 '22 15:09 cfallin

The AArch64 backend has now been fully ported to ISLE.

dheaton-arm avatar Sep 26 '22 09:09 dheaton-arm