Motoko Data Inspection

Open luc-blaeser opened this issue 5 months ago • 3 comments

Note: This is still a prototype, not yet ready for merging.

Generic data inspection of Motoko canisters by authorized users.

Your Motoko Program is a Database!

This is only a small first step towards the bigger vision of providing data tooling to Motoko, similar to a database management system. This would support data inspection, data queries, maybe even data modification, data backup/restore, complex data migration, and/or administrative functionality.

Frontend Prototype

A simple frontend is available to test the data inspection; see https://github.com/luc-blaeser/data-inspector (limited access).

Backend Design

The Motoko runtime system is extended to stream the heap state to the frontend canister for displaying the data to authorized users.

Currently, the following design aspects apply to the data inspection in the Motoko runtime system:

Only a controller can inspect the heap state over this functionality.
The inspection is based on enhanced orthogonal persistence to profit from the precise object tagging.
The data inspection returns the set of live objects (stable and flexible) reachable from the main actor.
The format is mostly a one-to-one binary map of the heap object payload to minimize processing in the backend.
For simplicity, currently, only full-heap inspection is supported. Larger scalability can be implemented later, see the thoughts in "Future: Incremental Inspection" below.
Currently, the field names are not yet shown. This can be supported later.
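
As a rough illustration of the live set mentioned above, the following sketch computes the objects reachable from a root using an explicit mark stack and a mark set. The toy heap model (a dict from address to a list of field values) and the names `is_pointer` and `reachable_objects` are assumptions made for this example; they are not the RTS data structures.

```python
def is_pointer(value):
    # Assumption for this toy model: pointers are tagged tuples ("ptr", address);
    # every other value is treated as a scalar.
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ptr"


def reachable_objects(heap, root):
    """Return the addresses of all objects reachable from `root`, in visit order."""
    marked = set()        # plays the role of the mark bitmap
    mark_stack = [root]   # explicit stack instead of recursion
    order = []
    while mark_stack:
        address = mark_stack.pop()
        if address in marked:
            continue      # already visited, e.g. via a cycle
        marked.add(address)
        order.append(address)
        for field in heap[address]:
            if is_pointer(field):
                mark_stack.append(field[1])
    return order


# Hypothetical heap with a cycle and one unreachable object.
heap = {
    100: [("ptr", 200), 7],
    200: [("ptr", 100), ("ptr", 300)],  # points back to 100
    300: [42],
    400: [("ptr", 300)],                # not reachable from 100
}
live = reachable_objects(heap, 100)
```

In this model, object 400 is never visited because nothing reachable from the root points to it.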

Binary format (EBNF)

Format = Version Root Heap.
Version = `1: usize`.
Root = `object_id: usize`.
Heap = `object_id: usize` `object_tag: usize` `object_payload:

The object payload is organized as follows:

The regular RTS object payload, with pointers replaced by object ids.
The payload is always a multiple of the word size.
For Object, the object size is prepended because the hash blob cannot directly be located in the stream.

An object_id is a potentially synthetic identifier of an object. The ids are skewed to distinguish them from scalars. Currently, the object ids are heap pointers, but this would change with incremental inspection.

usize is 64-bit little endian.
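
A client could start decoding the stream roughly as follows. This is a minimal sketch that only reads the Version and Root words; the function names are made up for this example. It also assumes the usual Motoko RTS skewing convention (a skewed value is the address minus one, so its least significant bit is set), which the description above does not spell out.

```python
import struct

WORD_SIZE = 8  # usize is 64-bit little endian
WORD_MASK = 0xFFFF_FFFF_FFFF_FFFF


def read_usize(data, offset):
    """Read one little-endian 64-bit word and return (value, next_offset)."""
    (value,) = struct.unpack_from("<Q", data, offset)
    return value, offset + WORD_SIZE


def is_object_id(value):
    # Assumption: skewed ids have the least significant bit set, scalars do not.
    return value & 1 == 1


def unskew(value):
    # Assumption: skewing subtracts 1, so unskewing adds 1 (modulo 2**64).
    return (value + 1) & WORD_MASK


def read_header(data):
    """Parse `Version Root` and return (version, root_id, offset_of_heap)."""
    version, offset = read_usize(data, 0)
    if version != 1:
        raise ValueError(f"unsupported format version: {version}")
    root, offset = read_usize(data, offset)
    return version, root, offset


# Hypothetical stream prefix: version 1, root at address 0x1000 (skewed).
sample = struct.pack("<QQ", 1, 0x1000 - 1)
version, root, heap_offset = read_header(sample)
```

Decoding the Heap part additionally requires knowledge of the per-tag payload layout, which is not covered here.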

Implementation

Currently, a separate mark bitmap is used for heap traversal during inspection. This bitmap is independent of any other bitmaps that may be in use during incremental GC.
For arrays, the tag cannot be copied one-to-one from the heap object, as it may temporarily hold slicing information stored by the incremental GC.
As usual, forwarding pointers of the incremental GC need to be resolved during heap inspection.
A separate mark stack is needed during heap inspection. This stack additionally stores the array slicing information of the heap inspection, independent of the incremental GC.
A simple stream buffer is used to serialize the binary result of the heap inspection. The buffer is represented as a linked list of blobs that is finally copied to a combined single blob. This is because the size of the live set is not known in advance.
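
The stream buffer idea can be sketched as follows. A Python list of fixed-size chunks stands in for the linked list of blobs, and the class name, method names, and chunk size are assumptions for this example rather than the RTS implementation.

```python
import struct


class StreamBuffer:
    """Append-only buffer that grows chunk by chunk and is combined at the end."""

    def __init__(self, chunk_size=16):
        self.chunk_size = chunk_size
        self.chunks = [bytearray()]  # stands in for the linked list of blobs
        self.total = 0               # number of bytes written so far

    def write(self, data):
        data = bytes(data)
        while data:
            current = self.chunks[-1]
            free = self.chunk_size - len(current)
            if free == 0:
                # Current chunk is full: append a new one instead of resizing.
                current = bytearray()
                self.chunks.append(current)
                free = self.chunk_size
            current.extend(data[:free])
            self.total += min(free, len(data))
            data = data[free:]

    def write_usize(self, value):
        self.write(struct.pack("<Q", value))  # 64-bit little endian

    def finish(self):
        # Copy all chunks into one combined blob once the total size is known.
        return b"".join(bytes(chunk) for chunk in self.chunks)


buffer = StreamBuffer(chunk_size=16)
for word in (1, 2, 3):
    buffer.write_usize(word)
blob = buffer.finish()
```

Growing by chunks avoids repeatedly reallocating and copying one large blob while the size of the live set is still unknown; the single copy happens only in `finish`.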

Future: Incremental Inspection

Incremental inspection can be realized in the future for scalability to larger heaps:

It enables chunked data downloads in multiple messages without blocking other user messages. This is particularly important because the message response size is limited.
It establishes a logical session where the client receives incremental heap changes without needing to refetch the full heap.

Possible implementation:

Synthetic object ids need to be used that are independent of the address of the object. This is because objects are moved by the GC.
A hash map can be used to map heap pointers to synthetic object ids. This map also serves for marking during heap traversal, such that a mark bitmap would no longer be needed. The pointers in the map are weak pointers that are updated by the GCs but will be removed from the map if the object is collected.
On chunked data downloads, the state of an object can only be sent once all of its contained pointers have been traversed. Otherwise, its state needs to be transmitted in a subsequent download message.
A pending list records the objects whose state is ready to be sent in the next download. The pointers in the pending list are weak.
Write barriers need to be extended to catch all mutator writes to pointers and scalars during a heap inspection session. The pointers of modified objects are recorded in a hash set, similar to the remembered set of the generational GCs. Again, the pointers are treated as weak pointers.
On incremental inspection, the runtime system resends the state of the modified objects in the hash set, in addition to the next heap chunk from the pending list, if there is one. The hash set is finally cleared and the sent objects are removed from the pending list.
On the client side, the object graph is updated for each resent object, while new objects are added.
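
The interplay of synthetic ids, pending list, and modified set could look roughly like this. It is a simplified sketch under several assumptions: the heap is a toy dict from address to field list, pointers are `("ptr", address)` tuples, weak pointers and objects moved by the GC are not modeled, and `InspectionSession`, `record_write`, and `next_chunk` are hypothetical names.

```python
from collections import deque


def is_pointer(value):
    # Toy model: pointers are ("ptr", address) tuples, anything else is a scalar.
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ptr"


class InspectionSession:
    def __init__(self, heap, root):
        self.heap = heap
        self.ids = {}           # address -> synthetic id; doubles as mark set
        self.pending = deque()  # addresses discovered but not yet sent
        self.modified = set()   # addresses written by the mutator
        self.sent = set()       # addresses the client already knows
        self.root_id = self.id_for(root)

    def id_for(self, address):
        if address not in self.ids:
            self.ids[address] = len(self.ids) + 1
            self.pending.append(address)
        return self.ids[address]

    def encode(self, address):
        # Replace pointers by synthetic ids; unseen targets get an id here.
        return [("id", self.id_for(f[1])) if is_pointer(f) else f
                for f in self.heap[address]]

    def record_write(self, address):
        # Would be called from the extended write barrier.
        self.modified.add(address)

    def next_chunk(self, limit):
        update = {}
        for address in sorted(self.modified):
            if address in self.sent:  # resend only what the client has seen
                update[self.ids[address]] = self.encode(address)
        self.modified.clear()
        count = 0
        while self.pending and count < limit:
            address = self.pending.popleft()
            update[self.ids[address]] = self.encode(address)
            self.sent.add(address)
            count += 1
        return update


heap = {100: [("ptr", 200), 1], 200: [("ptr", 300)], 300: [5]}
session = InspectionSession(heap, 100)
client_graph = {}
client_graph.update(session.next_chunk(limit=2))  # first download
first_view = dict(client_graph)
heap[100][1] = 99                                 # mutator write
session.record_write(100)
client_graph.update(session.next_chunk(limit=2))  # resend plus next chunk
```

On the client, a plain dictionary update is enough in this model: resent objects overwrite their previous state and new ids are added.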

Sep 20 '24 12:09 luc-blaeser
