enso
enso copied to clipboard
Remove IdMap from source file.
The id-map grows with the code rapidly, and even reducing its size won't help much; still, the id-map tends to take 100x more space than the code it describes. But perhaps we don't need to store them in file at all.
Main idea
Instead of storing id-map in source file, we could exchange it using Language Server API.
The id-map purpose is to make an identification of particular AST nodes, what in turn is needed to:
- Assign node metadata (like position, color, etc.) and - in the future - widget metadata (like which widget was directly picked by the user).
- #10182 identify expressions in messages between GUI and Engine (where visualization is attached, what is the type of expression, etc.) Because of execution contexts and visualizations, these identifications should be persistent.
The second point do not require AST IDs persistence, while the first point may be handled by storing direct spans instead of IDs in the source file.
Further ideas and optimizations
- To avoid initial exchange and reduce id-map size, we could assume some deterministic way of assigning IDs to nodes without id-map entry, for example: for every node with assigned ID from map, we assign increments of this ID to its children in DFS order (of course, if we meet another node with already assigned ID, we skip the entire subtree).
- or, we could resign from exchanging IDs, and just add list of span pairs (A, B) meaning, that old code's span A's identity should be passed to new code's span B. Both parties then should manage updating their information about visualizations, execution stacks etc.
### Gui Tasks
- [ ] Add a draft title or issue reference here
### Engine Tasks
- [ ] https://github.com/enso-org/enso/issues/10182
The Truffle instrumentation is built around SourceSectionFilter, not UUIDs. The filter uses ranges (offset, line, column) and tags. It would be great if we adhered more closely to the capabilities offered by Truffle when eliminating the IdMap usage.
- Let's assume the following is implemented: #7954
Then the AST is hosted in the same process as the engine and its instrumentation. Assuming y.js is efficient in synchronizing AST modifications, we don't need UUIDs to keep element identities. The AST elements themselves represent identity. Visualizations shall become part of the AST - attaching/removing them modifies some attribute of clearly identified AST element. When there is an AST change that modifies the source code we need to traverse the AST and recompute ranges (offset, line, column) of elements that require instrumentation. A change in the source code always comes with new ranges. All of that is done inside of a single process without any need for network communication via a special protocol.
Opportunities with visualizations: Y.js system offers concept of subdocuments - lazily loaded entities used for on demand synchronization. We could use them for distributing visualizations across the peers. IDEs would annotate an AST element as eligible for obtaining visualization. Engine would compute the visualization and attach it as a subdocument. IDEs interested in that visualization would load it and observe its changes.
Longer term future vision: with the AST being in the same process as Truffle we should make a connection between the AST elements and Truffle nodes and avoid reparsing the whole .enso
file when only a subtree is changed.
I have rephrased the strategic vision in a separate document. An essential part of it still remains the sharing of ASTs among peers via y.js server.
#10182 will add an optional IdMap parameter in the text/applyEdit
request.
I propose to simplify the serialized IdMap elements format from the current
[[{"index":{"value":0},"size":{"value":4}},"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
to an array of 3 elements: index, size and uuid
[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
#10182 will add an optional IdMap parameter in the
text/applyEdit
request.I propose to simplify the serialized IdMap elements format from the current
[[{"index":{"value":0},"size":{"value":4}},"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
to an array of 3 elements: index, size and uuid
[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
I'm ok, but perhaps we should support reading both formats for some time.
I propose to simplify the serialized IdMap elements format from the current to an array of 3 elements: index, size and uuid
[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
- the format change has been tracked as #7989
I'm ok, but perhaps we should support reading both formats for some time.
Yes, being able to read previous node positions in new IDE is essential.
I'm not going to change the parser. I was talking about the JSON-RPC text/applyEdit
parameter
#10283 adds an optional idMap
parameter to text/applyEdits
that overrides/complements the IdMap in the file
interface TextApplyEditParameters {
/** The file edit. */
edit: FileEdit;
/**
* A flag indicating whether we should re-execute the program after applying
* the edit. Default value is `true`, indicating the program should be
* re-executed.
*/
execute?: boolean;
/**
* An identifiers map associated with this file as an array of
* index, length, uuid triples. The old id map format that was used in the
* source file is also supported.
*/
idMap?: [number, number, UUID][];
}
@farmaazon the next step could be to keep in the file only the IDs that are used in the second metadata line (with node positions, etc.) and send the rest of the IDs with the text/applyEdit
request. This should reduce the size of the IdMap section in the file metadata. If you're ok with the idea, I'll implement it as a followup PR.
@farmaazon the next step could be to keep in the file only the IDs that are used in the second metadata line (with node positions, etc.) and send the rest of the IDs with the
text/applyEdit
request. This should reduce the size of the IdMap section in the file metadata. If you're ok with the idea, I'll implement it as a followup PR.
Looks good, I'm ready to help with ydoc-server code if needed.
Dmitry Bushev reports a new STANDUP for today (2024-06-24):
Progress: Started working on storing the reduced IdMap in the file. Implemented the IdMap diff. Implemented sending IdMap in the applyEdit request. It should be finished by 2024-06-29.
Next Day: Next day I will be working on the #9257 task. Continue working on the task
Dmitry Bushev reports a new STANDUP for today (2024-06-25):
Progress: Playing with different usage cases. Debugged and fixed the issue with the metadata recovery. Fixed the issue with the broken IdMap between the project restarts. Undrafted the PR It should be finished by 2024-06-29.
Next Day: Next day I will be working on the #9257 task. Continue working on the task