abi-aa

Use “thread” rather than “process”

Open rsandifo-arm opened this issue 2 years ago • 8 comments

The AAPCS64 uses “process” as a cover-all term for “thread or process”:

The AAPCS64 applies to a single thread of execution or process (hereafter referred to as a process).

However, SME adds a register called TPIDR2_EL0 that forms an important part of the SME PCS. The “T” in TPIDR2_EL0 stands for “thread”, so it seemed incongruous to use “process” when describing its role in the PCS.

This preliminary patch therefore tries to avoid treating “thread” and “process” as synonymous. Since the exact nature of threads and processes depends on the platform, the patch tries to give a flexible definition.

Two later parts of the PCS refer specifically to processes:

  • “Universal stack constraints”, when describing the parts of stack memory that can be safely accessed. These restrictions apply to the program generally (with respect to a given thread's stack and SP), so the patch uses “conforming program” instead of “thread”. This is also the term used in:

    A conforming program must only execute instructions that are in areas of memory designated to contain code.

  • The section on interworking. These restrictions specifically apply at the process level rather than just the thread level, so the patch leaves the wording unchanged.

rsandifo-arm, Dec 16 '21 16:12

Whilst there is room for improvement in the wording here, I don't entirely agree with this change. Conceptually, at least (it may be different in environments where memory is flat-mapped with a single page mapping shared by all processes), a process consists of one or more threads of execution. Threads may either execute concurrently (on multiple compute elements in the system) or serially (being switched under the control of an operating system).

With the exception of the stack and thread-local storage, the memory is managed and shared across all threads of the process. I don't think it makes sense to say that the heap is managed by a thread - it's managed across the entire process.

Stack and thread-local data are /normally/ private to the thread, but a thread may export the addresses of certain objects to other threads, provided that it can guarantee that the contents of those objects exist and are stable at the times that other threads might access them.

rearnsha, Dec 20 '21 17:12
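
As a rough illustration of the sharing model described in the comment above, here is a minimal C sketch. It assumes a POSIX platform with pthreads and a C11 compiler; the names `worker`, `run_demo`, `tls_counter`, and `shared_total` are hypothetical and do not come from the AAPCS64. The creating thread exports the address of one of its stack objects and, by joining the worker, keeps that object alive for as long as the other thread might access it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread-local storage: every thread gets its own zero-initialised copy. */
static _Thread_local int tls_counter = 0;

/* Static storage: a single copy shared by all threads of the process. */
static int shared_total = 0;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical worker: receives the address of an object that lives on
 * the creating thread's stack. */
static void *worker(void *arg)
{
    int *exported = arg;

    tls_counter += 1;          /* this thread's private copy: 0 -> 1 */
    *exported += tls_counter;  /* write through the exported address */

    pthread_mutex_lock(&shared_lock);
    shared_total += 1;         /* process-wide data needs synchronisation */
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

/* Returns the value of the exported stack object after the worker has run. */
int run_demo(void)
{
    int on_my_stack = 41;      /* automatic storage on the caller's stack */
    pthread_t thread;
    int rc;

    tls_counter = 100;         /* the caller's own thread-local copy */

    rc = pthread_create(&thread, NULL, worker, &on_my_stack);
    assert(rc == 0);

    /* Joining guarantees that on_my_stack outlives every access made by
     * the worker, which is the lifetime condition described above. */
    rc = pthread_join(thread, NULL);
    assert(rc == 0);

    assert(tls_counter == 100); /* the worker never saw this copy */
    return on_my_stack;
}
```

Calling `run_demo()` from `main` (built with `-pthread` on typical toolchains) should return 42: the worker starts with its own `tls_counter` of 0, while the creator's copy stays at 100.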

Of course, there is also 'shared memory': memory that is shared by two or more processes. How such sharing is done is, I think, beyond the scope of the AAPCS and into the realms of platform-specific behaviour.

rearnsha, Dec 20 '21 17:12

Whilst there is room for improvement in the wording here, I don't entirely agree with this change. Conceptually, at least (it may be different in environments where memory is flat-mapped with a single page mapping shared by all processes), a process consists of one or more threads of execution. Threads may either execute concurrently (on multiple compute elements in the system) or serially (being switched under the control of an operating system).

With the exception of the stack and thread-local storage, the memory is managed and shared across all threads of the process. I don't think it makes sense to say that the heap is managed by a thread - it's managed across the entire process.

Stack and thread-local data are /normally/ private to the thread, but a thread may export the addresses of certain objects to other threads, provided that it can guarantee that the contents of those objects exist and are stable at the times that other threads might access them.

Yeah, the new wording was trying to capture that (but clearly failed :-)). The claim wasn't that the thread “owns” these areas of memory in any sense, or that they're private to the thread. It was just that a thread (potentially) has access to these areas of memory. That's why it says:

Each thread must have its own stack, but it can share other categories of memory with other threads in a process. (A platform might allow a thread to access the stacks of other threads, perhaps by treating them as part of the heap.)

The point with the second sentence is that, if thread T1 shares a stack object S with thread T2, S is not part of stack memory from T2's point of view. The distinction is important because of the later requirements about which stack addresses a thread/process is allowed to access. If we said that S was in “stack memory” even from T2's point of view, an access to S would be out of bounds (and so invalid) for T2.

So it feels like we might be saying the same thing, but in different ways.

Like you say, there are a few possible variations, and I don't think it's really up to the PCS to define exactly which memory is shared between threads in a process or between processes in a running system, beyond the requirement that the area of memory designated to be stack memory is specific to each thread. E.g. there's no reason in principle to exclude FDPIC-like models, where code is shared between processes.

rsandifo-arm, Dec 20 '21 19:12
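
The point about stack object S and threads T1 and T2 can be sketched as a small model in C. Everything here is an assumption made for illustration: the `thread_stack` structure, the function names, and the addresses are hypothetical, and the access rule is simplified to "a thread may access its own stack only at or above its current SP". The sketch only tries to show why S has to be classified relative to the accessing thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one thread's stack (full-descending: it grows
 * towards lower addresses). None of these names come from the AAPCS64. */
struct thread_stack {
    uintptr_t limit; /* lowest address of the stack region */
    uintptr_t sp;    /* current stack pointer */
    uintptr_t base;  /* one past the highest address of the region */
};

/* Is addr inside the region designated as the viewer's own stack? */
bool is_stack_memory(const struct thread_stack *viewer, uintptr_t addr)
{
    return addr >= viewer->limit && addr < viewer->base;
}

/* Simplified stack rule: only [sp, base - 1] may be accessed. */
bool stack_rule_permits(const struct thread_stack *viewer, uintptr_t addr)
{
    return addr >= viewer->sp && addr < viewer->base;
}

/* The stack rule is applied only to the viewer's own stack memory.
 * Anything else counts as ordinary memory and is not restricted further
 * in this sketch. */
bool access_allowed(const struct thread_stack *viewer, uintptr_t addr)
{
    if (is_stack_memory(viewer, addr))
        return stack_rule_permits(viewer, addr);
    return true;
}

/* Made-up addresses: two threads and a stack object S owned by T1. */
const struct thread_stack T1 = { 0x7000, 0x7f00, 0x8000 };
const struct thread_stack T2 = { 0x9000, 0x9f80, 0xa000 };
const uintptr_t S_ADDR = 0x7f10;
```

With these values, `access_allowed(&T1, S_ADDR)` and `access_allowed(&T2, S_ADDR)` are both true, because S is not stack memory from T2's point of view. If S were instead judged against T2's stack bounds, `stack_rule_permits(&T2, S_ADDR)` would be false, which is the out-of-bounds outcome the wording tries to avoid.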

Hi @rearnsha. I've (very belatedly) tried to address your concerns in the updated version of the patch. How does this version look?

rsandifo-arm, May 13 '22 12:05

I've rebased onto https://github.com/ARM-software/abi-aa/pull/79 and tweaked the wording around the stack a bit more.

rsandifo-arm, Jul 15 '22 12:07

@rearnsha, are you now happy with this pull request, after @rsandifo-arm's latest changes?

stuij, Aug 11 '22 13:08

A few minor changes. Always use “might” instead of “may”. It's more easily understood by non-native speakers.

Thanks for the reviews!

I was originally trying to keep the existing wording as much as possible, because I was worried about over-editorialising. But a lot of the suggestions seem uncontroversial, so I've pushed an update with them included.

The “may” vs. “might” thing is trickier though. I think we need to take an action to go through the doc and make a coordinated attempt to remove uses of “may”. It isn't always obvious what the right replacement would be. I get the impression that the doc is being deliberately vague about whether it's describing a state of affairs that holds naturally/inherently, or whether it's prescribing what platforms can and can't do.

For example, I think this:

The address space may consist of one or more disjoint regions

was probably intended to be a concession: platforms are allowed to create disjoint memory regions. Similarly, I think:

No region may span address zero

is supposed to prohibit any attempt to create regions that span address zero. I'm not sure “might” would be a suitable replacement.

rsandifo-arm, Sep 30 '22 12:09

"May" is the correct term here. In standards documents like these it always has the specific meaning of "permitted but not required". Using another word or term would just be distracting.

rearnsha, Oct 06 '22 13:10