gemini-cli feat: terse transformations of image paths in text buffer

TLDR

This PR addresses terse representations of image paths by introducing a transformed representation to sit between the existing logical and visual representation. This allows for terse representations, or transformations, to collapse and expand depending on cursor position.

It could be that this cursor dependent expansion of terse representations may not be the best approach for this feature's UX. This PR needs feedback in that regard.

Examples

MacOS Temporary Directory:

General UX

(note: [Image fake.png] is not a true transformation or a file path -- it is just regular user input directly typing that, which is why it is not detected by the logic)

https://github.com/user-attachments/assets/8ad4162d-1d6e-4b5d-97ab-bd3f53700a7b

Dive Deeper

The core of the PR is:

logicalLines are transformed to transformedLines according to the defined transformation conditions (image paths)
transformedLines are used to create visualLines
calculateLayout is called when the cursor moves from and between transformations -- this retains the ethos of not re calculating layout unless needed while explicitly stating that we need to re calculate b/c the state of transformations changes as the mouse exits and enters them.

File by File

text-buffer.ts

Transformation Interface
Defines a Transformation object to represent sections of logicalLines that should render a terse image label in the CLI.
calculateTransformationsForLine / calculateTransformations
Use imagePathRegex to detect image paths and build a cached Transformation[][] (transformationsByLine) over all lines.
The regex:
- Anchors on @…\.(png|jpg|jpeg|gif|webp|svg|bmp)
- Stops at whitespace so it doesn’t swallow @file1.txt @image2.jpg
- Still allows @ inside filenames (e.g. @2x.png)
- Resets imagePathRegex.lastIndex per call
- Merges adjacent image paths without whitespace (@[email protected]), but treats spaced ones (@a.png @b.png) as separate
calculateTransformedLine
Builds a transformed line plus a map where each index of the transformedLine maps back to an index in the logicalLine.
When a Transformation is active or collapsed, the mapping is distributed monotonically:
```
Math.floor((i * logicalLength) / transformedLen)
```
calculateLayout
Creates visualLines based on the current Transformations and cursor state, so logicalCursor is passed to calculateLayout because the transformed representation (collapsed vs expanded) depends on cursor position.
calculateVisualCursorFromLayout
Calculates the visual column from the transformed/wrapped coordinate system.
textBufferReducerLogic
Still owns core editing semantics (insert, delete, word motions, vim ops, etc.) and uses TextBufferOptions (inputFilter, singleLine). Logical cursor math (offsetToLogicalPos, etc.) is unchanged; transformation logic is layered outside this function.
textBufferReducer
Wraps textBufferReducerLogic to:
- Keep transformationsByLine up to date via calculateTransformations when lines change
- Use getTransformUnderCursor on old/new state to detect:
  - entering a transformation
  - exiting a transformation
  - moving between different transformations
- Call calculateLayout in one additional case: previously only when width or lines changed, now also when the cursor crosses transformation boundaries. In that case it may reset preferredCol and returns updated visualLayout + transformationsByLine.
useTextBuffer
Initializes lines, cursorRow, cursorCol, transformationsByLine = calculateTransformations(initialLines), and visualLayout = calculateLayout(initialLines, viewport.width, [cursorRow, cursorCol]), then uses useReducer((s, a) => textBufferReducer(s, a, { inputFilter, singleLine }), initialState).

highlight.ts

parseInputForHighlighting
Extends previous logic to tokenize Transformation spans, accepting an optional logicalCursor so it is only supplied when cursor-aware highlighting is needed.
buildSegmentsForVisualSlice → parseSegmentsFromTokens
Renamed for clarity and convention; same behavior, but now explicitly parses segments from a richer token stream (including transformation tokens).

InputPrompt.tsx

Calls the tokenizer on transformation-aware tokens.
Computes visualStart and visualEnd for segmentation based on the transformed representation using visualToLogicalMap, transformedToLogicalMaps, and visualToTransformedMap.
Uses display instead of seg.text for clarity, since let display = seg.text

Reviewer Test Plan

Create a test directory
Create a long path (transform.length < raw.length) toward an image like "exceptionally/long/path/and/more/image.png"
Create a short path (transform.length > raw.length) toward an image like "a/pic.png"
Create an equal path (transform.length === raw.length) toward an image like "folder/pic.png"
Test the line wrapping and cursor navigation

Reference these paths in any order and combination in the CLI input area. Move your cursor left and right through the transformations, edit them, put text between the transformations, put text inside the transformations, resize the terminal.

Note unexpected/undesirable behavior

Thank you!

Testing Matrix

	🍏	🪟	🐧
npm run	❓	❓	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Linked issues / bugs

Resolves #4669

Jul 26 '25 09:07 psinha40898

Thanks for this pr! If you can resolve the branch conflicts we'd be happy to take another look and land this change.

Sep 08 '25 17:09 jacob314

This pull request feature is implemented with all existing tests passing.

New tests should be added to the feature, but first I would like to confirm the approach.

This PR expands terse representations to their original logical form when the cursor enters the regular expression. It achieves that by monotonically mapping the transformed state to the logical state. I would love to know if this is the approach to pursue, or if there is another way to handle the UX for these terse representations that I should implement.

Sep 29 '25 20:09 psinha40898

/gemini review

Nov 20 '25 19:11 psinha40898

Review

This is a sophisticated addition that significantly improves the UX for handling long image paths. The coordinate mapping logic in text-buffer.ts is complex but seems to be implemented with care, and the unit tests for the buffer logic are thorough.

I have a few suggestions to improve robustness and test coverage:

1. Regex Limitations

The imagePathRegex in packages/cli/src/ui/components/shared/text-buffer.ts uses [^\s[\]\r\n] which explicitly excludes brackets, and the escape group (?:...|\ ) only handles escaped spaces. This means paths like @image[1].png or @image(1).png (if escaped as @image\(1\).png) might not be detected or could be truncated prematurely. Consider relaxing this to allow more characters or a broader escape sequence (e.g. \\.) if unescapePath supports them.

2. Merging Behavior

In calculateTransformationsForLine, adjacent transformations are merged:

    if (last && last.logEnd === logStart) {
      // ...
      // collapsed label = right-most image path
      last.collaspedText = getTransformedImagePath(logicalText);
    }

If I have @[email protected], it seems this will result in a single collapsed token labeled [Image b.png]. This hides the presence of the first image. Is this intentional? It might be clearer to keep them separate (i.e. [Image a.png][Image b.png]) or have a combined label.

3. InputPrompt Test Coverage

While text-buffer.test.ts covers the logic well, packages/cli/src/ui/components/InputPrompt.test.tsx mocks the buffer. Currently, the mocks (e.g., in beforeEach) initialize transformationsByLine to []. I recommend adding a test case in InputPrompt.test.tsx that:

Sets up a mock buffer with a populated transformationsByLine entry (simulating an image path).
Verifies that the rendered output (via stdout.lastFrame()) actually contains the collapsed text [Image ...] when the cursor is not on it.
Verifies it renders the full path when the cursor is on it.

This ensures the React component correctly respects the transformation state passed from the buffer.

Nit

Path Handling: getTransformedImagePath uses path.basename. Since this runs in the CLI, verify that this behaves as expected if the user is on Windows but pasting paths with forward slashes (or vice-versa), although typically the CLI normalizes paths.

Great work on a tricky feature!

Nov 21 '25 02:11 jacob314

Review

This is a sophisticated addition that significantly improves the UX for handling long image paths. The coordinate mapping logic in text-buffer.ts is complex but seems to be implemented with care, and the unit tests for the buffer logic are thorough.

I have a few suggestions to improve robustness and test coverage:

1. Regex Limitations

The imagePathRegex in packages/cli/src/ui/components/shared/text-buffer.ts uses [^\s[\]\r\n] which explicitly excludes brackets, and the escape group (?:...|\ ) only handles escaped spaces. This means paths like @image[1].png or @image(1).png (if escaped as @image\(1\).png) might not be detected or could be truncated prematurely. Consider relaxing this to allow more characters or a broader escape sequence (e.g. \\.) if unescapePath supports them.

2. Merging Behavior

In calculateTransformationsForLine, adjacent transformations are merged:
    if (last && last.logEnd === logStart) {
      // ...
      // collapsed label = right-most image path
      last.collaspedText = getTransformedImagePath(logicalText);
    }
If I have @[email protected], it seems this will result in a single collapsed token labeled [Image b.png]. This hides the presence of the first image. Is this intentional? It might be clearer to keep them separate (i.e. [Image a.png][Image b.png]) or have a combined label.

3. InputPrompt Test Coverage

While text-buffer.test.ts covers the logic well, packages/cli/src/ui/components/InputPrompt.test.tsx mocks the buffer. Currently, the mocks (e.g., in beforeEach) initialize transformationsByLine to []. I recommend adding a test case in InputPrompt.test.tsx that:

Sets up a mock buffer with a populated transformationsByLine entry (simulating an image path).

Verifies that the rendered output (via stdout.lastFrame()) actually contains the collapsed text [Image ...] when the cursor is not on it.

Verifies it renders the full path when the cursor is on it.

This ensures the React component correctly respects the transformation state passed from the buffer.

Nit

Path Handling: getTransformedImagePath uses path.basename. Since this runs in the CLI, verify that this behaves as expected if the user is on Windows but pasting paths with forward slashes (or vice-versa), although typically the CLI normalizes paths.

Great work on a tricky feature!

Regex relaxed and extended for brackets in paths
This was incorrect. Now @[email protected] will result in [Image a.png][Image b.png]. On submit the LLM usually will figure it out and request a tool call for each file
Test coverage added
We now build the transformation from the last segment of the path which removes inconsistencies when using posix style paths on windows. Tests added.

Thank you!

Nov 21 '25 15:11 psinha40898