sway Refactor IR memory management and pointers.

As of #2363 we have explicit reference semantics for mutable args including copy-type args.

Problem

This is a bit of a stop-gap measure though as memory management in the IR is inconsistent and sometimes ambiguous.

In particular:

There are no explicit memory manipulation instructions like mem_cpy or mem_set.
Pointers are only for locals and now some args.
Mutability is neither explicit nor enforced.
There's a lack of distinction between (Sway's) copy types and reference types.
Arrays and structs are both aggregates but are actually handled orthogonally.

There's probably other things I've forgotten right now.

Proposal

I think to address some of these issues we should change the following:

Use store and load only for integer 64-bit types and with explicit pointer src/dst.
Introduce memory instructions for copying larger values.
No such thing as Sway's implicit copy or ref types, instead pointers must be explicitly used when necessary.
All pointers are mutable. It's not up to the IR to worry about mutability, the front-end compiler can enforce those restrictions. No mut keyword needed.
Split arrays and structs. Abandon 'aggregates' as they're implemented now.
Use get_ptr as LLVM's GEP, update where needed in the land of explicit pointers. Use int_to_ptr and ptr_to_int instead of casting in get_ptr.
Introduce (or at least consider) 'opaque' pointers which are untyped. This is the approach LLVM has switched to.

I guess not all of these should be done in a single PR, but much of it will need to be done together to remain workable without intermediate hacks. This is another of those not-urgent-but-should-be-done-sooner-rather-than-later-to-avoid-pain.

Aug 03 '22 07:08 otrho

Introduce (or at least consider) 'opaque' pointers which are untyped. This is the approach LLVM has switched to.

When I saw the keyword ptr in IR when I first started contributing in January, I thought you designed pointers to be opaque just like LLVM's future direction. The challenge with opaque pointers is losing some information related to size of the type so that it's harder to compute offsets (e.g. opaque pointer arguments etc.). LLVM was trying to fix that with some metadata and some other ways of carrying type information IIRC, but I haven't been keeping track of its recent progress. On the flip side, analyzing IR with opaque pointers is much easier and enables more optimization opportunities. I think we should seriously consider this option and it's clearly the better option long term (WWLLVMD?)

I also agree with the rest of the proposal. Thanks!

Aug 04 '22 00:08 mohammadfawaz

I haven't dug super deep into what LLVM has needed with opaque pointers, but I was (am?) making the assumption that it resembles their attitude with signedness in arithmetic. I.e., the value itself isn't signed but rather the operation. Same with pointers -- they just represent an address, but what you do with it is up to the operator. Hopefully in a statically typed language we wouldn't come across any ambiguities but there are always edge cases...

Aug 04 '22 01:08 otrho

Same with pointers -- they just represent an address

Yeah exactly. LLVM analysis passes also tend to be easier to write without having to worry about needless pointer casts which would potentially improve the quality of the existing passes and enable more optimization opportunities.

Also looks like support for them is complete in both Clang and LLVM! So if we have any questions, we can always consult LLVM.

Aug 04 '22 13:08 mohammadfawaz

It looks to me like most of the issues listed here were solved when #2819 was fixed with #4336.

Introduce (or at least consider) 'opaque' pointers which are untyped. This is the approach LLVM has switched to.

Except this. I don't see that as a priority at the moment ad we should close this issue. @IGI-111 .

May 31 '23 09:05 vaivaswatha