enso icon indicating copy to clipboard operation
enso copied to clipboard

Remove IdMap from source file.

Open farmaazon opened this issue 11 months ago • 10 comments

The id-map grows with the code rapidly, and even reducing its size won't help much; still, the id-map tends to take 100x more space than the code it describes. But perhaps we don't need to store them in file at all.

Main idea

Instead of storing id-map in source file, we could exchange it using Language Server API.

The id-map purpose is to make an identification of particular AST nodes, what in turn is needed to:

  1. Assign node metadata (like position, color, etc.) and - in the future - widget metadata (like which widget was directly picked by the user).
  2. #10182 identify expressions in messages between GUI and Engine (where visualization is attached, what is the type of expression, etc.) Because of execution contexts and visualizations, these identifications should be persistent.

The second point do not require AST IDs persistence, while the first point may be handled by storing direct spans instead of IDs in the source file.

Further ideas and optimizations

  • To avoid initial exchange and reduce id-map size, we could assume some deterministic way of assigning IDs to nodes without id-map entry, for example: for every node with assigned ID from map, we assign increments of this ID to its children in DFS order (of course, if we meet another node with already assigned ID, we skip the entire subtree).
  • or, we could resign from exchanging IDs, and just add list of span pairs (A, B) meaning, that old code's span A's identity should be passed to new code's span B. Both parties then should manage updating their information about visualizations, execution stacks etc.
### Gui Tasks
- [ ] Add a draft title or issue reference here
### Engine Tasks
- [ ] https://github.com/enso-org/enso/issues/10182

farmaazon avatar Mar 04 '24 10:03 farmaazon

The Truffle instrumentation is built around SourceSectionFilter, not UUIDs. The filter uses ranges (offset, line, column) and tags. It would be great if we adhered more closely to the capabilities offered by Truffle when eliminating the IdMap usage.

  • Let's assume the following is implemented: #7954

Then the AST is hosted in the same process as the engine and its instrumentation. Assuming y.js is efficient in synchronizing AST modifications, we don't need UUIDs to keep element identities. The AST elements themselves represent identity. Visualizations shall become part of the AST - attaching/removing them modifies some attribute of clearly identified AST element. When there is an AST change that modifies the source code we need to traverse the AST and recompute ranges (offset, line, column) of elements that require instrumentation. A change in the source code always comes with new ranges. All of that is done inside of a single process without any need for network communication via a special protocol.

Opportunities with visualizations: Y.js system offers concept of subdocuments - lazily loaded entities used for on demand synchronization. We could use them for distributing visualizations across the peers. IDEs would annotate an AST element as eligible for obtaining visualization. Engine would compute the visualization and attach it as a subdocument. IDEs interested in that visualization would load it and observe its changes.

Longer term future vision: with the AST being in the same process as Truffle we should make a connection between the AST elements and Truffle nodes and avoid reparsing the whole .enso file when only a subtree is changed.

JaroslavTulach avatar Mar 05 '24 10:03 JaroslavTulach

I have rephrased the strategic vision in a separate document. An essential part of it still remains the sharing of ASTs among peers via y.js server.

JaroslavTulach avatar Mar 28 '24 14:03 JaroslavTulach

#10182 will add an optional IdMap parameter in the text/applyEdit request.

I propose to simplify the serialized IdMap elements format from the current

[[{"index":{"value":0},"size":{"value":4}},"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]

to an array of 3 elements: index, size and uuid

[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]

4e6 avatar Jun 11 '24 11:06 4e6

#10182 will add an optional IdMap parameter in the text/applyEdit request.

I propose to simplify the serialized IdMap elements format from the current

[[{"index":{"value":0},"size":{"value":4}},"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]

to an array of 3 elements: index, size and uuid

[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]

I'm ok, but perhaps we should support reading both formats for some time.

farmaazon avatar Jun 11 '24 11:06 farmaazon

I propose to simplify the serialized IdMap elements format from the current to an array of 3 elements: index, size and uuid

[[0,4,"bfae92f7-07d0-4886-b37b-a26e17ee943f"],...]
  • the format change has been tracked as #7989

I'm ok, but perhaps we should support reading both formats for some time.

Yes, being able to read previous node positions in new IDE is essential.

JaroslavTulach avatar Jun 11 '24 13:06 JaroslavTulach

I'm not going to change the parser. I was talking about the JSON-RPC text/applyEdit parameter

4e6 avatar Jun 11 '24 15:06 4e6

#10283 adds an optional idMap parameter to text/applyEdits that overrides/complements the IdMap in the file

interface TextApplyEditParameters {
  /** The file edit. */
  edit: FileEdit;

  /**
   * A flag indicating whether we should re-execute the program after applying
   * the edit. Default value is `true`, indicating the program should be
   * re-executed.
   */
  execute?: boolean;

  /**
   * An identifiers map associated with this file as an array of
   * index, length, uuid triples. The old id map format that was used in the
   * source file is also supported.
   */
  idMap?: [number, number, UUID][];
}

@farmaazon the next step could be to keep in the file only the IDs that are used in the second metadata line (with node positions, etc.) and send the rest of the IDs with the text/applyEdit request. This should reduce the size of the IdMap section in the file metadata. If you're ok with the idea, I'll implement it as a followup PR.

4e6 avatar Jun 17 '24 09:06 4e6

@farmaazon the next step could be to keep in the file only the IDs that are used in the second metadata line (with node positions, etc.) and send the rest of the IDs with the text/applyEdit request. This should reduce the size of the IdMap section in the file metadata. If you're ok with the idea, I'll implement it as a followup PR.

Looks good, I'm ready to help with ydoc-server code if needed.

farmaazon avatar Jun 17 '24 09:06 farmaazon

Dmitry Bushev reports a new STANDUP for today (2024-06-24):

Progress: Started working on storing the reduced IdMap in the file. Implemented the IdMap diff. Implemented sending IdMap in the applyEdit request. It should be finished by 2024-06-29.

Next Day: Next day I will be working on the #9257 task. Continue working on the task

enso-bot[bot] avatar Jun 24 '24 17:06 enso-bot[bot]

Dmitry Bushev reports a new STANDUP for today (2024-06-25):

Progress: Playing with different usage cases. Debugged and fixed the issue with the metadata recovery. Fixed the issue with the broken IdMap between the project restarts. Undrafted the PR It should be finished by 2024-06-29.

Next Day: Next day I will be working on the #9257 task. Continue working on the task

enso-bot[bot] avatar Jun 25 '24 16:06 enso-bot[bot]