Use zero-width delimiters for role tracking in gptel-mode
Title: Use zero-width delimiters for role tracking with overlay-based highlighting in gptel-mode buffers
This is a proposal for a new approach to tracking and visually distinguishing assistant/user roles in gptel buffers that addresses several long-standing issues (#321, #343) while maintaining compatibility with the existing system.
The Problem
Currently gptel uses text properties to track which sections of text are assistant responses. This approach has proven problematic because:
- Text properties don't interact naturally with standard Emacs editing operations
- Property stickiness creates ambiguous cases during editing
- Yanked text carries properties that can cause confusion
- Visual feedback about roles is difficult to implement reliably
Proposed Solution
Use zero-width Unicode characters as role delimiters with overlay-based highlighting, but only when gptel-mode is active:
- U+200B (zero-width space) marks response start
- U+200C (zero-width non-joiner) marks response end
- Overlays provide visual distinction for responses
Key aspects:
- Delimiters are invisible and don't affect buffer display
- Standard editing operations work naturally
- Cut/paste preserves role boundaries correctly
- Overlays provide clean visual feedback
- Works with all major modes
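To make the delimiter idea concrete, here is a minimal sketch of how response regions might be recovered from the buffer by scanning for the two characters. The names `my/gptel-zw-start`, `my/gptel-zw-end` and `my/gptel-response-regions` are hypothetical and do not exist in gptel.

```elisp
;; Hypothetical names throughout; none of these are part of gptel.
(defconst my/gptel-zw-start #x200B
  "Zero-width space, the proposed marker for the start of a response.")
(defconst my/gptel-zw-end #x200C
  "Zero-width non-joiner, the proposed marker for the end of a response.")

(defun my/gptel-response-regions ()
  "Return a list of (BEG . END) pairs, one per delimited response.
BEG and END bound the text strictly between a start and an end delimiter."
  (save-excursion
    (goto-char (point-min))
    (let ((start (string my/gptel-zw-start))
          (end (string my/gptel-zw-end))
          regions)
      (while (search-forward start nil t)
        (let ((beg (point)))            ; just after the start delimiter
          (when (search-forward end nil t)
            ;; Point is just after the end delimiter; exclude it.
            (push (cons beg (1- (point))) regions))))
      (nreverse regions))))
```

Everything outside the returned regions would be treated as prompt text. An unmatched start delimiter is simply ignored in this sketch.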
Implementation
The solution uses two phases:
- When gptel-mode is enabled:
  - Convert existing gptel text properties to delimiter pairs
  - Remove gptel properties
  - Enable delimiter-based role tracking
  - Create overlays for responses
- When gptel-mode is disabled:
  - Convert delimiters back to gptel properties
  - Remove delimiters
  - Remove overlays
  - Restore property-based tracking
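The two conversion phases could look roughly like the sketch below. It assumes responses are marked with the text property `gptel` set to `response`; the function names are hypothetical, and overlay handling is left out.

```elisp
;; Hypothetical sketch; assumes responses carry the text property
;; `gptel' with the value `response'.
(defun my/gptel-properties-to-delimiters ()
  "Wrap each run of response-propertied text in zero-width delimiters.
The `gptel' property is removed from each run afterwards."
  (save-excursion
    (let ((pos (point-min)) runs)
      ;; Collect runs first; RUNS ends up ordered last-to-first.
      (while (< pos (point-max))
        (let ((next (next-single-property-change pos 'gptel nil (point-max))))
          (when (eq (get-text-property pos 'gptel) 'response)
            (push (cons pos next) runs))
          (setq pos next)))
      ;; Work backwards so insertions do not shift earlier runs.
      (dolist (run runs)
        (remove-text-properties (car run) (cdr run) '(gptel nil))
        (goto-char (cdr run))
        (insert (string #x200C))
        (goto-char (car run))
        (insert (string #x200B))))))

(defun my/gptel-delimiters-to-properties ()
  "Replace each delimiter pair with a `gptel' text property."
  (save-excursion
    (goto-char (point-min))
    (while (search-forward (string #x200B) nil t)
      (let ((beg (1- (point))))
        (delete-region beg (point))     ; drop the start delimiter
        (when (search-forward (string #x200C) nil t)
          (delete-region (1- (point)) (point)) ; drop the end delimiter
          (put-text-property beg (point) 'gptel 'response))))))
```

Enabling the mode would call the first function and disabling it would call the second, so a round trip should leave the text and its properties as they were.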
Response Highlighting
Use overlays for visual distinction:
- Clean visual distinction
- No interference with text properties
- Preserves other modes' fontification
- Easy to customize appearance
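A sketch of the overlay side, under the assumption that response regions are already available as a list of (BEG . END) pairs. The face and function names are made up for illustration.

```elisp
;; Hypothetical face and function names, for illustration only.
(defface my/gptel-response-face
  '((t :inherit highlight))
  "Face used to highlight response regions.")

(defun my/gptel-highlight-responses (regions)
  "Place a highlighting overlay over each (BEG . END) pair in REGIONS.
Overlays left by a previous call are removed first."
  (remove-overlays (point-min) (point-max) 'my/gptel-response t)
  (dolist (region regions)
    (let ((ov (make-overlay (car region) (cdr region))))
      (overlay-put ov 'my/gptel-response t)  ; tag so we can find it later
      (overlay-put ov 'face 'my/gptel-response-face)
      (overlay-put ov 'evaporate t))))       ; vanish if the text is deleted
```

Because the highlight lives in an overlay rather than a `face` text property, the major mode's fontification underneath is untouched, and users can restyle responses by customizing a single face.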
Benefits
- Reliable editing operations:
  - Cut/paste works naturally
  - Undo/redo maintains role boundaries
  - No property stickiness issues
- Better user experience:
  - Clear visual distinction of responses
  - Predictable editing behaviour
  - Compatible with standard Emacs commands
  - Non-intrusive highlighting
- Technical improvements:
  - Simple to parse conversation history
  - Clean visual feedback via overlays
  - Works with all major modes
  - Separation of tracking and display
Testing
To test this change:
- Enable gptel-mode in a buffer with existing responses
- Verify properties convert to delimiters correctly
- Test editing operations (especially cut/paste)
- Verify overlay highlighting
- Disable mode and verify cleanup
- Check property restoration
Notes
- Only affects buffers with gptel-mode active
- Zero-width characters don't affect buffer display or export
- Maintains compatibility with existing gptel features
- Solves long-standing editing issues
- Provides clean visual distinction via overlays
Caveat
There is an obvious caveat here: enabling the mode mutates the buffer. The characters I have chosen are highly unlikely to appear in regular text, but a collision is still possible. One solution might be to make the two characters configurable via buffer-local variables or customisation instead of predefining them, or to select them automatically from a set of candidate characters that do not appear in the buffer when it is scanned on entering gptel-mode.
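The automatic selection could be as simple as the following sketch. The candidate list and both names are assumptions made for illustration, not part of gptel.

```elisp
;; Hypothetical candidate set: a few zero-width or invisible code points.
(defvar my/gptel-delimiter-candidates
  '(#x200B #x200C #x200D #x2060 #xFEFF)
  "Characters considered for use as response delimiters.")

(defun my/gptel-pick-delimiters ()
  "Return (START . END), the first two candidates absent from the buffer.
Return nil if fewer than two candidates are unused."
  (let (free)
    (dolist (ch my/gptel-delimiter-candidates)
      (unless (save-excursion
                (goto-char (point-min))
                (search-forward (string ch) nil t))
        (push ch free)))
    (setq free (nreverse free))         ; restore candidate order
    (when (cdr free)
      (cons (car free) (cadr free)))))
```

The chosen pair would then be stored in buffer-local variables when gptel-mode is enabled, so that later parsing uses the same characters.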
Related issues: #321, #343
I like this as a solution that would also make it very simple to edit responses, which is quite a powerful method for guiding output.
I can foresee situations where an odd number of separators exists in the buffer, which would cause gptel-send to fail. In that case a function gptel-mark-response could simply wrap a selected region with the separators, deleting any that are inside the active region. gptel-show-separators/gptel-hide-separators could also replace the separators with something visible for inspection. The latter would most usefully replace the separators with some indicator of message count, probably XML-like (<message_1> </message_1>).
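A minimal sketch of what such a gptel-mark-response might do, assuming the U+200B/U+200C pair from the proposal. The `my/` name signals that this is illustrative, not an existing gptel command.

```elisp
;; Hypothetical command; assumes U+200B/U+200C as the delimiter pair.
(defun my/gptel-mark-response (beg end)
  "Mark the text between BEG and END as a response.
Delete any delimiters already inside the region, then wrap the region
in a fresh start/end pair."
  (interactive "r")
  (save-excursion
    (let ((end-marker (copy-marker end)))
      (goto-char beg)
      ;; Strip stray delimiters; END-MARKER tracks the shrinking region.
      (while (re-search-forward "[\u200B\u200C]" end-marker t)
        (replace-match "" t t))
      (goto-char end-marker)
      (insert (string #x200C))
      (goto-char beg)
      (insert (string #x200B))
      (set-marker end-marker nil))))
```

This only cleans up delimiters inside the region. If the region overlaps an existing response, the counterpart delimiter outside the region is left dangling, so a separate validation pass would still be useful.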
I also feel this violates the central ethos of gptel that prevented karthik from using response indicators in the first place. You would end up with documents containing invisible characters if you copy and paste responses. But I do think it addresses the main issues in #546 without introducing more unacceptable problems. Backwards compatibility could be maintained by automatically dropping the separators in buffers where the previous method was used.
I think the zero-width delimiter approach effectively addresses these concerns while maintaining gptel's simplicity:
- Invisible but robust role tracking:
  - Zero-width delimiters mark response boundaries (carefully chosen to avoid text conflicts)
  - Overlays provide clear visual feedback of boundaries
  - Delimiters aren't saved to disk, preserving clean file format
  - Existing GPTEL_BOUNDS continue working normally
- Optional safe editing operations in gptel-mode buffers:
  - Add advice to Emacs editing primitives to handle delimiters:
      (advice-add 'insert-before-markers :around #'gptel--clean-insertion-advice)
      (advice-add 'delete-region :around #'gptel--preserve-delimiters-advice)
  - Strip delimiters from inserted text
  - Preserve necessary delimiters at region boundaries during deletion
  - External editors can modify files without corruption
  - Copy/paste operations work cleanly
- Recovery tools:
  - gptel-mark-response to mark region as response (or with prefix to mark as prompt)
  - gptel-validate-buffer to check and repair delimiter integrity
  - gptel-show-separators/gptel-hide-separators for visual inspection
  - Overlay system shows current prompt/response status clearly
- Backwards compatibility:
  - No migration needed for existing chat logs
  - Delimiters recreated from bounds when loading buffer
  - Maintains the "everything up to cursor" interaction model
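The checking half of a gptel-validate-buffer could be sketched as below: it only reports whether start and end delimiters alternate correctly and attempts no repair. The function name is hypothetical and the U+200B/U+200C pair is assumed.

```elisp
;; Hypothetical check; assumes U+200B opens and U+200C closes a response.
(defun my/gptel-delimiters-balanced-p ()
  "Return non-nil if every start delimiter is closed before the next opens.
Nested, unmatched, or out-of-order delimiters make this return nil."
  (save-excursion
    (goto-char (point-min))
    (let ((open nil) (ok t))
      (while (and ok (re-search-forward "[\u200B\u200C]" nil t))
        (if (eq (char-before) #x200B)
            ;; A start delimiter is only valid outside a response.
            (if open (setq ok nil) (setq open t))
          ;; An end delimiter is only valid inside a response.
          (if open (setq open nil) (setq ok nil))))
      (and ok (not open)))))
```

A send command could call this first and refuse to proceed, or offer to show the separators, when it returns nil.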
A simpler alternative would be to:
- Skip the safe editing operations entirely
- Rely on clear overlay feedback to show prompt/response regions
- Trust users to maintain/repair their chat buffers as needed
- Provide the same robust recovery tools above
This simpler approach might be preferable - users get immediate visual feedback about response regions and can easily fix any corruption using gptel-mark-response. The editing safeguards may be unnecessary complexity given good overlay feedback and repair tools.
All that said, and backtracking a bit, @daedsidog suggested in #343 that simply making regions explicitly visible and allowing them to be fixed up with gptel-mark-response (or gptel-toggle-response-role per his suggestion) might be easiest because:
- It maintains the existing text property mechanism but adds explicit user control
- It avoids introducing new delimiter-related complexity and edge cases
- The visual feedback is also through overlays and makes it clear what's prompt vs response
- Manual region marking with gptel-mark-response gives users direct control of prompt vs response
The zero-width delimiter approach I've suggested, while elegant in some ways, introduces:
- New edge cases around delimiter handling
- Possibly complex advice on editing primitives if you take it that far
- Potential for delimiter corruption requiring repair tools
- Additional complexity in buffer management
On balance perhaps the simplest solution would be:
- Keep existing text property mechanism
- Add clear overlay-based visual feedback
- Provide gptel-mark-response command for manual region control
- Trust users to maintain their chat buffers with these tools
This would maintain gptel's existing M.O. while giving users the tools they need to manage/edit prompt/response regions effectively. The visual feedback through overlays addresses the "what is marked as what" problem, while manual region control handles edge cases without introducing new complexity.
The benefit of #565, the zero-width delimiter solution, is that with careful editing (avoiding region boundaries) you won't break the prompt/response sequence within a buffer. With the existing text property mechanism, by contrast, you will always break the sequence because of the way text properties are handled in Emacs. That is, more often than not, prompt/response regions will need to be fixed up after editing the buffer, notwithstanding the sticky patch https://github.com/karthink/gptel/commit/25efd55002c591b3721cdd2c96dac93d70dce814 that @karthink recently introduced to mitigate this (I've found myriad ways to break this with yank and other editing commands).
[2025-01-17 Fri 11:32]
A potential way to introduce this without changing the way gptel fundamentally works would be to add two customizable variables, such as gptel-response-start/gptel-response-end, which, when both are non-nil, break up the buffer as described above. Surfacing this in the transient menu would allow users to opt in to this behavior in some buffers and not others.
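As a rough sketch of that opt-in, the two options might be declared as below. The `my/`-prefixed names stand in for the suggested gptel-response-start/gptel-response-end and are not existing gptel variables. The transient menu integration is not shown.

```elisp
;; Hypothetical options standing in for gptel-response-start/-end.
(defcustom my/gptel-response-start-delimiter nil
  "String marking the start of a response, or nil to disable."
  :type '(choice (const :tag "Disabled" nil) string)
  :group 'convenience)

(defcustom my/gptel-response-end-delimiter nil
  "String marking the end of a response, or nil to disable."
  :type '(choice (const :tag "Disabled" nil) string)
  :group 'convenience)

(defun my/gptel-delimiters-enabled-p ()
  "Return t when both delimiter options are set, nil otherwise."
  (and my/gptel-response-start-delimiter
       my/gptel-response-end-delimiter
       t))
```

A buffer could then opt in with `setq-local` on both variables, leaving the property-based behavior as the default everywhere else.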