Enhanced Orthogonal Persistence (64-Bit without Graph Copy)
Note: This requires adjustments of the IC runtime and execution layers - the PR is thus not yet ready to be merged.
PR Stack
Enhanced orthogonal persistence support is structured in four PRs to ease review:
- https://github.com/dfinity/motoko/pull/4488
- https://github.com/dfinity/motoko/pull/4475
- https://github.com/dfinity/motoko/pull/4225 <-- this PR
- https://github.com/dfinity/motoko/pull/4193
This implements the vision of enhanced orthogonal persistence in Motoko that combines:
- Stable heap: Persisting the program main memory across canister upgrades.
- 64-bit heap: Extending the main memory to 64-bit for large-scale persistence.
As a result, the use of secondary storage (explicit stable memory, dedicated stable data structures, DB-like storage abstractions) will no longer be necessary: Motoko developers can directly work on their normal object-oriented program structures that are automatically persisted and retained across program version changes.
Advantages
Compared to the existing orthogonal persistence in Motoko, this design offers:
- Performance: New program versions directly resume from the existing main memory and have access to the memory-compatible data.
- Scalability: The upgrade mechanism scales to larger heaps and, in contrast to serialization, does not hit IC instruction limits.
Compared to the explicit use of stable memory, this design improves:
- Simplicity: Developers do not need to deal with explicit stable memory.
- Performance: No copying to and from the separate stable memory is necessary.
Design
The enhanced orthogonal persistence is based on the following main properties:
- Extension of the IC to retain main memory on upgrades.
- Supporting 64-bit main memory on the IC.
- A long-term memory layout that is invariant to new compiled program versions.
- A fast memory compatibility check performed on each canister upgrade.
- Incremental garbage collection using a partitioned heap.
IC Extension
The necessary IC extensions are implemented in a separate PR: https://github.com/dfinity/ic/pull/139. This PR is based on these extensions.
Memory Layout
In a co-design between the compiler and the runtime system, the main memory is arranged in the following structure, invariant of the compiled program version:
- Lower 4MB: Rust call stack.
- Space between 4MB and 4.5MB: Limited reserved space for Wasm data segments, only used by the Motoko runtime system.
- Between 4.5MB and 5MB: Persistent metadata.
- Thereafter: Dynamic heap space, with a fixed start address at 5MB.
Persistent Metadata
The persistent metadata describes all anchor information for the program to resume after an upgrade.
More specifically, it comprises:
- A stable heap version that allows evolving the persistent memory layout in the future.
- The stable subset of the main actor, containing all stable variables declared in the main actor.
- A descriptor of the stable static types to check memory compatibility on upgrades.
- The runtime state of the garbage collector, including the dynamic heap metadata and memory statistics.
- A reserve for future metadata extensions.
Compatibility Check
Upgrades are only permitted if the new program version is compatible with the old version, such that the runtime system guarantees a compatible memory structure.
Compatible changes for immutable types are largely analogous to the allowed Motoko subtype relation, e.g.
- Adding or removing actor fields.
- Removing object fields.
- Adding variant fields.
- `Nat` to `Int`.
- Shared function parameter contravariance and return type covariance.
The existing IDL-subtype functionality is reused with some adjustments to check memory compatibility: The compiler generates the type descriptor, a type table, that is recorded in the persistent metadata. Upon an upgrade, the new type descriptor is compared against the existing type descriptor, and the upgrade only succeeds for compatible changes.
This compatibility check serves as an additional safety measure on top of the DFX Candid subtype check, which can be bypassed by users (by ignoring a warning). Moreover, in some aspects, the memory compatibility rules differ from the Candid subtype check:
- Top-level actor fields (`stable` fields) can change mutability (`let` to `var` and vice versa).
- Support of variables (`MutBox`) with type invariance.
- Types cannot be made optional (no insertion of Option).
- Same arity for function parameters and function return types (no removed optional parameters, no additional optional results).
- Records cannot introduce additional optional fields.
- Same arity for tuple types (no insertion of optional items).
- Records and tuples are distinct.
Garbage Collection
The implementation focuses on the incremental GC and abandons the other GCs because the GCs use different memory layouts. For example, the incremental GC uses a partitioned heap with objects carrying a forwarding pointer.
The incremental GC is chosen because it is designed to scale on large heaps and the stable heap design also aims to increase scalability.
The garbage collection state needs to be persisted and retained across upgrades. This is because the GC may not yet be completed at the time of an upgrade, such that object forwarding is still in use. The heap partition structure is described by a linked list of partition tables that is reachable from the GC state.
The garbage collector uses two kinds of roots:
- Persistent roots: These refer to root objects that need to survive canister upgrades.
- Transient roots: These cover additional roots that are only valid in a specific version of a program and are discarded on an upgrade.
The persistent roots are registered in the persistent metadata and comprise:
- All stable variables of the main actor, only stored during an upgrade.
- The stable type table.
The transient roots are referenced by the Wasm data segments and comprise:
- All canister variables of the current version, including flexible variables.
Main Actor
On an upgrade, the main actor is recreated and existing stable variables are recovered from the persistent root. The remaining actor variables, the flexible fields as well as new stable variables, are (re)initialized.
As a result, the GC can collect unreachable flexible objects of previous canister versions. Unused stable variables of former versions can also be reclaimed by the GC.
No Static Heap
The static heap is abandoned and former static objects need to be allocated in the dynamic heap. This is because these objects may also need to survive upgrades and the persistent main memory cannot accommodate a growing static heap of a new program version in front of the existing dynamic heap. The incremental GC also operates on these objects, meaning that forwarding pointer resolution is also necessary for these objects.
For memory and runtime efficiency, object pooling is implemented for compile-time-known constant objects (with side-effect-free initialization), i.e. those objects are already created on program initialization/upgrade in the dynamic heap and thereafter the reference to the corresponding prefabricated object is looked up whenever the constant value is needed at runtime.
The runtime system avoids any global Wasm variables for state that needs to be preserved on upgrades. Instead, such global runtime state is stored in the persistent metadata.
Wasm Data Segments
Only passive Wasm data segments are used by the Motoko compiler and runtime system. In contrast to ordinary active data segments, passive segments can be explicitly loaded to a dynamic address.
This simplifies two aspects:
- The generated Motoko code can contain arbitrarily large data segments (to the maximum that is supported by the IC). The segments can be loaded to the dynamic heap when needed.
- The IC can simply retain the main memory on an upgrade without needing to patch any active data segments of the new program version to the persistent main memory.
However, more specific handling is required for the Rust-implemented runtime system (RTS): The Rust-generated active data segment of the runtime system is changed to passive mode and loaded to the expected static address on program start (canister initialization and upgrade). The location and size of the RTS data segments are therefore limited to a defined reserve of 512 KB, see above. This is acceptable because the RTS only requires a controlled, small amount of memory for its data segments, independent of the compiled Motoko program.
Null Sentinel
As an optimization, the top-level `null` pointer is represented as a constant sentinel value pointing to the last unallocated Wasm page. This allows fast null tests without involving forwarding pointer resolution of potential non-null comparand pointers.
Memory Capacity
The canister has no upfront knowledge of the maximum allocatable Wasm main memory in 64-bit address space, as there is no IC API call to query the main memory limit. This limit may also be increased in future IC releases.
Therefore, a mechanism is implemented to deal with an unknown and dynamically increasable main memory capacity offered by the IC. This is needed in two cases:
- GC reserve (strict): The runtime system ensures sufficient free space to allow garbage collection at all times, even if the heap is full. For this purpose, the runtime system already pre-allocates the reserve, to be sure that the reserve is available despite the unknown capacity. As an optimization, this pre-allocation is skipped when the memory demand including the GC reserve is below a guaranteed minimum Wasm memory limit of the IC, e.g. 4GB or 6GB.
- GC scheduling (heuristic): The GC schedules at high frequency when memory is becoming scarce. For this purpose, the GC maintains an assumption of the minimum memory limit and probes the supposed limit when the heap size approaches this limit. If the allocation succeeds, the assumed limit is increased. Otherwise, the critical high-frequency GC scheduling rate is activated.
In both cases, the runtime system tries to reduce Wasm memory allocations as much as possible, i.e. not pre-allocating memory for small heap sizes, and not probing an allocation in certain memory ranges by assuming that the IC only offers main memory of a certain granularity, e.g. multiples of 2GB. To save instructions, the critical GC scheduling is only activated when reaching the actual memory limit. Moreover, the mechanism can handle an increased memory capacity at runtime, e.g. when the IC is upgraded to a new release with a higher memory limit.
Migration Path
When migrating from the old serialization-based stabilization to the new persistent heap, the old data is deserialized one last time from stable memory and then placed in the new persistent heap layout. Once operating on the persistent heap, the system should prevent downgrade attempts to the old serialization-based persistence.
Assuming that the persistent memory layout needs to be changed in the future, the runtime system supports serialization and deserialization to and from stable memory in a defined data format. This migration path is similar to the current upgrade mechanism, but with a much more scalable design: a graph copy algorithm serializes/deserializes the heap structure and, unlike Candid, avoids object duplication. Arbitrarily large data can be serialized and deserialized beyond the instruction and working set limit of upgrades: large data serialization and deserialization is split into multiple messages, running before and/or after the IC upgrade to migrate large heaps. Of course, other messages are blocked during this process, and only the canister owner or the canister controllers are permitted to initiate it. This migration would only occur in rare cases.

Using a defined long-term format allows flexible migration compatibility across all compiler versions: each Motoko compiler version only needs to implement the serialization/deserialization for its specific persistent heap layout. This migration path gives us the option to make radical changes in the future, e.g. to introduce a new GC or rearrange the persistent metadata. Precise tagging thereby enriches the runtime metadata of values for potential advanced migrations, such as specializing arrays for small scalar types or changing the value representation.
Old Stable Memory
The old stable memory remains equally accessible as secondary (legacy) memory with the new support.
Current Limitations
- Freeing old object fields: While new program versions can drop object fields, the runtime system should also delete the redundant fields of persistent objects of previous program versions. This could be realized during garbage collection when objects are copied. For this purpose, the runtime system may maintain a set of field hashes in use and consult this table during garbage collection. Another, probably too restrictive solution could be to disallow field removal (subtyping) on object upgrades during the memory compatibility check.
- The floating point display format differs in Wasm64 for special values, e.g. `nan` becomes `NaN`. There is currently no support for hexadecimal floating point text formatting.
- Workaround for Rust needed to build PIC (position-independent code) libraries: explicit use of `emscripten` via LLVM IR.
- `ic-wasm` would need to be extended to Wasm64. The Wasm optimizations in `test/bench` are thus currently deactivated.
- The Wasm profiler (only used for the flamegraphs) is no longer applicable because the underlying `parity-wasm` crate was deprecated before Wasm Memory64 support. It also lacks full support of passive data segments. A re-implementation of the profiler would be needed.
Related PRs
- IC support for enhanced orthogonal persistence: https://github.com/dfinity/ic/pull/143
- IC deterministic working set limit for 64-bit main memory: https://github.com/luc-blaeser/ic/pull/1
- Motoko base library 64-bit support: https://github.com/dfinity/motoko-base/pull/589
Underlying partial implementations:
- 32-bit enhanced orthogonal persistence: https://github.com/dfinity/motoko/pull/4193
- 64-bit support for Motoko (transient heap): https://github.com/dfinity/motoko/pull/4136
Ambitious plans, I like them!
I miss a discussion of callbacks to outstanding calls. Will they simply be dropped? It seems “stable closures” are still out of reach? Or is the assumption that you still stop the canister before upgrading (so not instantaneous upgrades in the sense of https://www.joachim-breitner.de/blog/789-Zero-downtime_upgrades_of_Internet_Computer_canisters)?
Thank you very much for your feedback. This is actually still stopping the canister, not upgrading between outstanding callbacks. "Instantaneous" is probably not the right word; it is rather a "constant-time" upgrade w.r.t. memory, without serialization to stable memory. However, I am also in favor of introducing stable closures and stable local functions under certain requirements, e.g. that the function is explicitly named in a stable type. This would not only be nice for non-stop upgrades, but would also allow more stable types. Ideally, we could make "stable" the default, while "flexible" is only an explicit exemption. With this, the design would get closer to previous orthogonally persistent programming languages (e.g. Persistent Oberon), where references are always persistent by default and can be weakened only if needed.
Comparing from 60296040a429cfa927de0fcf7e706c8b3ec67ad6 to a9184eb37d8f028dc3f137028d7bbf2bb3e43a4c: In terms of gas, 5 tests improved and the mean change is -29.8%. In terms of size, 5 tests improved and the mean change is -10.9%.
The issue isn't just "outstanding callbacks". It's closures (and objects) in general, isn't it? What happens to a closure in the heap when the respective function disappears with a code upgrade? And even if it remains, how do you know the relation between old and new function index?
Thank you very much for pointing this out. It is probably not solvable in all cases, but maybe in some common cases of explicitly named local functions, object/class functions, etc. One approach could be to keep a mapping between the fully qualified function identifier (including scopes, such as classes) and the function index. On an upgrade, the function of a stable closure would need to be rebound to the new function index of the identically named function in the new program version. The memory compatibility check of the persistence mechanism would then additionally need to verify that a function of the same name still exists in the new program version and that its type matches (otherwise disallowing the upgrade). Apart from the function index, the corresponding object state of the closure could be persisted like other objects. To be honest, I haven't yet thought through all cases. Maybe I am missing some important aspect? Of course, the outlined approach would rely on the programming convention that functions with identical identifiers correspond semantically across different program versions.
Well, we did think through some of this a few years back (there might still be traces of discussions on the issue tracker). Unfortunately, it is much more complicated, because you also need to make sure that the closure environment stays the same. That is, you'll need to record what local variables are captured by every function and with what types, and they all must remain the same. But that is extremely constraining, it would disallow almost any interesting changes to code around local functions and lock users into more or less the initial version of their source code.
At the time we more or less concluded that this problem isn't solvable in a practical fashion. At the very least, it would require an explicit opt-in from the user, to decide which functions and local variables they are willing to never touch again, as the price for stability. But if the long-term plan is to make all memory persistent by default, then they wouldn't have much of a choice.
It also becomes a problem if somebody uses a library, and that library is modified. They'd be stuck forever with the old version. The overarching problem of truly persistent memory is that it creates lock-in on all levels.
Thank you very much for the reply and also for raising this valid concern. I remember having seen the related discussion https://github.com/dfinity/motoko/issues/707.
Without good data evolution support, the persistent types (including the potentially new stable closures) can only be changed in a very limited way. Data can still be accessed by copying it over to a new type, which is certainly very inconvenient (but at least the data is not lost with a lock-in). Other orthogonally persistent programming languages provide a data evolution tool similar to DBMS systems (e.g. Persistent Oberon, Persistent Java/PJama etc.). Maybe we need something similar?
I believe stable closures are probably only useful for and should be limited to rather "static" functions, such as module functions, actor (local) functions, or in a class that is not extended. This could probably permit the use of common data structures in persistence, such as with an object-oriented design, or data structures that need some equality/hashing/comparison functions as arguments. Of course, this implies that such class-based data structures are designed long-term and cannot be extended without providing versioning or some custom upgrade/copy-over logic.
But maybe I am too idealistic or optimistic.
Since the present PR is not about upgrading a non-stopped canister, let's maybe not further detail it here. I'm worried we won't find useful insight here later if we need them :-)
This PR is orthogonal to the question of whether we allow stable functions, right? We are still requiring data stored in stable variables to be stable. Before and after upgrade, that stable data may dynamically still contain closures (due to subtyping hiding them), but they should never be reachable, or callable, from user code. Unless, heaven forbid, we ever add down casts.
For stable functions, I think it would be enough to introduce a new type of named, closed function (i.e. function pointer) and only consider those stable - much as we allow function references to be passed in Candid. They aren't actually closures, but abstract named entry points only, sans environments. Upgrade would really be a uniform change of function body only. It's mildly annoying, but I think class based languages like C# and Java lend themselves much better to this sort of upgrade, because the classes are named and all methods are closed, taking the object instance as an explicit argument. But maybe I'm missing something there too.
We still need to ditch candidish as the stable format for heap representation changes, but that's a separate issue.
Looks good - identified some minor issues. 2 more PRs to go!
Thank you really very much for your enormous review effort!
Additional commits look ok to me. Nice peephole optimizations (let's hope they are correct...)
Incredible effort @luc-blaeser and @crusso! Very, very exciting 🙂
Thank you very much, Byron!