
More responsive terminal image buffer eviction

Open itsjunetime opened this issue 5 months ago • 1 comments

Is your feature request related to a problem? Please describe.

In my terminal pdf viewer (tdf), moving through large pdfs causes old pages to be evicted from the image buffer in kitty (see https://github.com/itsjunetime/tdf/issues/61). This behavior makes sense, and tdf is now adapting to it in https://github.com/itsjunetime/tdf/pull/74 by reading the responses from the terminal and re-rendering pages when the terminal tells us that it couldn't display the image with the given id (as the given id was evicted from the buffer due to a previous image being sent).

This is fine, but not a very good experience for the users as we don't know what images are available to us and which are not. There is always a delay for the user if an image was previously evicted, as we have to (while they are waiting on the image to display) re-render it and send it back to the terminal.

Describe the solution you'd like

It would be nice if the terminal sent information about evicted ImageIds in its response. For example, if a new image with i=2 is sent to the terminal and that causes the image with id i=1 to be evicted, the response could be \e_Gi=2,e=1;OK\e\\ (using the e key to tell the application that the image with ImageId 1 was evicted).

This response could be parsed and used to proactively reload images that were evicted if needed.
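A minimal sketch of how an application might parse such a response. The `e` key is only the proposal made in this issue, not part of the kitty graphics protocol, and the names `parse_response` and `evicted_ids` are hypothetical.

```python
import re

# One APC graphics response: ESC _ G <key=value,...> ; <message> ESC \
RESPONSE_RE = re.compile(r"\x1b_G([^;\x1b]*);([^\x1b]*)\x1b\\")


def parse_response(raw):
    """Return (keys, message) for a single graphics response, or None."""
    match = RESPONSE_RE.fullmatch(raw)
    if match is None:
        return None
    keys = {}
    for pair in match.group(1).split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            keys[key] = value
    return keys, match.group(2)


def evicted_ids(raw):
    """Image ids reported through the proposed (hypothetical) `e` key."""
    parsed = parse_response(raw)
    if parsed is None:
        return []
    keys, _message = parsed
    return [int(keys["e"])] if "e" in keys else []
```

With the example above, `evicted_ids("\x1b_Gi=2,e=1;OK\x1b\\")` yields `[1]`, which the application could hand to a background thread for re-rendering.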

Describe alternatives you've considered

Kitty could provide an option to never evict images, but based on the graphics documentation, providing that option is not preferable.

itsjunetime, Jun 18 '25 18:06

Not sure I follow your use case. Even suppose I added eviction notices to the protocol, what would you do if you got an eviction notice? You cannot resend the image immediately anyway since it would evict another image from the cache. So you would need to wait till the user scrolled to the evicted page or near it before resending the image in any case.

And note that you can query if an image is still cached by creating a placement for it (you can use a dummy 1 or maybe even 0 (don't remember) pixel placement that won't be visible to the user).
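A rough sketch of such a probe, under some assumptions: `a=p` places an already transmitted image, `w=1,h=1` crops the source to a single pixel, `C=1` leaves the cursor where it is, and a missing image is reported with a message starting with `ENOENT`. Check these against the graphics protocol documentation before relying on them. The function names are hypothetical.

```python
def probe_placement_cmd(image_id, placement_id=1):
    """Build a 1x1 pixel placement request for an already transmitted image."""
    return f"\x1b_Ga=p,i={image_id},p={placement_id},w=1,h=1,C=1\x1b\\"


def classify_probe_response(response):
    """Map a graphics response to 'cached', 'missing' or 'error'."""
    body = response.removeprefix("\x1b_G").removesuffix("\x1b\\")
    _keys, _sep, message = body.partition(";")
    if message == "OK":
        return "cached"
    if message.startswith("ENOENT"):
        # Assumed error code for a placement that refers to an unknown image.
        return "missing"
    return "error"
```

The probe placement would still need to be deleted afterwards so that it does not accumulate on screen.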

kovidgoyal, Jun 19 '25 02:06

In my specific use-case, rendering images is expensive and I don't keep their data around once I've sent them to the terminal because that would just double the memory usage of my program. So if I received an eviction notice, I would be able to do the expensive work of re-rendering an image in a background thread, then when a user wants to see an image, I would just be able to send it back over to the terminal. This way, the user can see images promptly (without having to wait for the app to re-render them once we notice the terminal doesn't have it anymore) and we don't double-up on our memory usage by keeping each image in memory even after we've sent it to the terminal.

Sending images isn't very expensive in my use case because I store them in Shared Memory Objects, so sending them to the terminal is a very simple hand-off (as opposed to printing out the entire contents of the image).

I am aware that I can query each image to see if it is present, but then I have to interrupt the stream of stdin to wait for the response, and I've already noticed issues with user input interleaving kitty response input (specifically when users are trying to hold down on e.g. the 'j' key to move down many pages). This makes the terminal input basically unparseable (as we can't tell what characters came from the terminal and what came from the user) so I would prefer not to have to actually interface with the terminal to do this (so as to not degrade the experience of the user).

itsjunetime, Jun 20 '25 18:06

On Fri, Jun 20, 2025 at 11:08:09AM -0700, June wrote:

itsjunetime left a comment (kovidgoyal/kitty#8737)

> In my specific use-case, rendering images is expensive and I don't keep their data around once I've sent them to the terminal because that would just double the memory usage of my program. So if I received an eviction notice, I would be able to do the expensive work of re-rendering an image in a background thread, then when a user wants to see an image, I would just be able to send it back over to the terminal. This way, the user can see images promptly (without having to wait for the app to re-render them once we notice the terminal doesn't have it anymore) and we don't double-up on our memory usage by keeping each image in memory even after we've sent it to the terminal.

I suggest you keep the images on disk as temp files and, rather than using shared memory, use filepaths to transmit the images. This will keep memory usage under control and spare you from dealing with eviction at all. Its performance will be the same as for shared memory as far as human perception goes.

> I am aware that I can query each image to see if it is present, but then I have to interrupt the stream of stdin to wait for the response, and I've already noticed issues with user input interleaving kitty response input (specifically when users are trying to hold down on e.g. the 'j' key to move down many pages). This makes the terminal input basically unparseable (as we can't tell what characters came from the terminal and what came from the user) so I would prefer not to have to actually interface with the terminal to do this (so as to not degrade the experience of the user).

You most definitely can parse all data received from the terminal and distinguish keypresses from escape codes. Terminal programs have been doing that for decades. To do it with maximum robustness you should use the kitty keyboard protocol, but even without it you can parse out all escape codes from keypresses and pasted text. There are many libraries that do it for you.
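As an illustration only, here is a small sketch that pulls APC sequences (ESC `_` ... ESC `\`, the framing used by graphics responses) out of a chunk of input and leaves everything else for the key handler. The function name `split_input` is hypothetical, and a real parser would also handle CSI and other escape codes as well as an ESC byte that arrives alone at the end of a read.

```python
APC_START = b"\x1b_"  # ESC _
ST = b"\x1b\\"        # ESC backslash (string terminator)


def split_input(data):
    """Split raw input into (apc_bodies, other_bytes, pending).

    `pending` holds an unterminated APC sequence that the caller should
    prepend to the next chunk it reads.
    """
    responses = []
    other = bytearray()
    pos = 0
    while True:
        start = data.find(APC_START, pos)
        if start == -1:
            other += data[pos:]
            return responses, bytes(other), b""
        other += data[pos:start]
        end = data.find(ST, start + len(APC_START))
        if end == -1:
            # Sequence not finished yet: keep it for the next read.
            return responses, bytes(other), data[start:]
        responses.append(data[start + len(APC_START):end])
        pos = end + len(ST)
```

For input such as `b"jj\x1b_Gi=1;OK\x1b\\j"`, the three `j` keypresses end up in `other_bytes` and the body `b"Gi=1;OK"` is returned separately.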

kovidgoyal, Jun 21 '25 02:06

Closing as I don't really see the point of this feature, but feel free to discuss further.

kovidgoyal, Jun 21 '25 02:06

I really would still prefer my suggestion, if you're willing to reconsider.

> I suggest you keep the images on disk as temp files and rather than using shared memory use filepaths to transmit the images

I understand that this will decrease the memory usage, but that's a tradeoff I would like to be able to make. I can't guarantee that the directory I'll be writing to is actually a tmp directory, so if the program fails to cleanup correctly for whatever reason (whether that be a logic bug or someone kill -9'ing it), the temporary files that it creates could just sit around forever taking up space. That's just not something I want to have to deal with.

Even outside of the space-on-disk concerns, your suggestion doesn't provide a very nice experience for the user. Every time I would try to display an image, I'd have to do one of the following:

  1. Always send the image to kitty via its filepath, even if we've sent it before (since we don't know if it's in memory or not) - this is obviously not preferable as it would take more time for files to be read every time, fill up the image storage buffer with multiple copies of the same image, and just unnecessarily slow things down
  2. Try to send an ImageId + placement alone, then if that fails, send the filepath. This would work fine if the image happens to be in memory, but if it isn't, it'll cause an annoying flash for the users as the previous images are cleared (as I must send that over before sending the ImageId). This also seems annoying to have to deal with for the users
  3. Query if an image exists in memory, and then send the filepath or ImageId depending on if it exists or not. This is the best of these three options, but just requires us to do unnecessary work when we could be getting this information for free in the responses.

> Its performance will be the same as for shared memory as far as human perception goes.

I would prefer to not have to bet on that. Unnecessarily sacrificing performance is just not something I want to do.

> You most definitely can parse all data received from the terminal and distinguish keypresses from escape codes. Terminal programs have been doing that for decades, to do it with maximum robustness you should use the kitty keyboard protocol, but even without it you can parse out all escape codes from keypresses and pasted text. There are many libraries that do it for you.

I'm using one such library - crossterm (which does support the kitty keyboard protocol by default, if I'm not mistaken) - and if a user holds down on J to move through images, I'll get responses from the terminal that look like \e_JGi=1;OK\e\\. Would you suggest this is a bug in the library that I should look into?

itsjunetime, Jun 27 '25 01:06

On Thu, Jun 26, 2025 at 06:58:19PM -0700, June wrote:


> I really would still prefer my suggestion, if you're willing to reconsider.

> > I suggest you keep the images on disk as temp files and rather than using shared memory use filepaths to transmit the images

> I understand that this will decrease the memory usage, but that's a tradeoff I would like to be able to make. I can't guarantee that the directory I'll be writing to is actually a tmp directory, so if the program fails to cleanup correctly for whatever reason (whether that be a logic bug or someone kill -9'ing it), the temporary files that it creates could just sit around forever taking up space. That's just not something I want to have to deal with.

Cleaning up temporary files robustly is a long-solved problem. You simply fork a child process that listens to a pipe from the parent via an inherited fd. Pass all resources you want cleaned up on parent exit to the child via the pipe. In the child, when the pipe closes, clean up the resources. On parent exit, no matter if it is SIGKILLed or anything else, the kernel closes the pipe.
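The pattern described above can be sketched as follows, assuming a POSIX system. The names `spawn_cleanup_child` and `register` are hypothetical, and the newline-delimited path format is a simplification that breaks for paths containing newlines.

```python
import os


def spawn_cleanup_child():
    """Fork a helper that deletes registered paths once the parent is gone.

    Returns (write_fd, child_pid). The child sees EOF on the pipe when every
    copy of the write end is closed, which the kernel does on parent exit,
    including after SIGKILL.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: drop its copy of the write end, otherwise EOF never arrives.
        os.close(write_fd)
        # Detach from the parent's session so Ctrl-C or a closing terminal
        # does not kill the helper before it has cleaned up.
        os.setsid()
        paths = []
        with os.fdopen(read_fd, "rb") as pipe:
            for line in pipe:  # blocks until data or EOF
                paths.append(line.rstrip(b"\n"))
        for path in paths:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
        os._exit(0)
    os.close(read_fd)
    return write_fd, pid


def register(write_fd, path):
    """Ask the helper to delete `path` when the parent exits."""
    os.write(write_fd, os.fsencode(path) + b"\n")
```

The parent calls `register` for each temp file it creates and simply never closes `write_fd` during normal operation.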

> Even outside of the space-on-disk concerns, your suggestion doesn't provide a very nice experience for the user. Every time I would try to display an image, I'd have to do one of the following:

> 1. Always send the image to kitty via its filepath, even if we've sent it before (since we don't know if it's in memory or not) - this is obviously not preferable as it would take more time for files to be read every time, fill up the image storage buffer with multiple copies of the same image, and just unnecessarily slow things down
> 2. Try to send an ImageId + placement alone, then if that fails, send the filepath. This would work fine if it happens to be in memory, but if it doesn't, it'll cause an annoying flash for the users as the previous images are cleared (as I must send that over before sending the ImageId). This also seems annoying to have to deal with for the users

I don't follow. You assign a unique image id to each page. Send the placement command using that id, with a z-index of -1. If it succeeds, hide the previous page image. If it fails, resend the image and then hide the previous image. No annoying flash, no significant extra work for you.
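A sketch of that page flip, with several assumptions: pages are stored as PNG files, `send` is a hypothetical callback that writes a command to the terminal and returns the message part of the response, and the key meanings (`a=p` place, `a=T` transmit and display, `t=f` file path, `f=100` PNG, `d=i` delete placements by id while keeping the image data) are taken from the graphics protocol documentation as best recalled and should be verified there.

```python
import base64


def place_cmd(image_id, z=-1):
    """Place an already transmitted image below the text layer."""
    return f"\x1b_Ga=p,i={image_id},z={z}\x1b\\"


def transmit_and_place_cmd(image_id, path, z=-1):
    """Transmit a PNG by file path and display it in one command."""
    payload = base64.standard_b64encode(path.encode()).decode()
    return f"\x1b_Ga=T,t=f,f=100,i={image_id},z={z};{payload}\x1b\\"


def hide_cmd(image_id):
    """Remove the placements of an image; lowercase d=i keeps its data."""
    return f"\x1b_Ga=d,d=i,i={image_id}\x1b\\"


def flip_page(new_id, prev_id, path, send):
    """Show page `new_id` first, then hide `prev_id`, to avoid a flash."""
    if send(place_cmd(new_id)) != "OK":
        # The image was evicted (or never sent): transmit it again.
        send(transmit_and_place_cmd(new_id, path))
    if prev_id is not None:
        send(hide_cmd(prev_id))
```

Because the old page is only hidden after the new one is on screen, an evicted image costs one extra round trip rather than a visible blank frame.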

> 3. Query if an image exists in memory, and then send the filepath or ImageId depending on if it exists or not. This is the best of these three options, but just requires us to do unnecessary work when we could be getting this information for free in the responses.

> > Its performance will be the same as for shared memory as far as human perception goes.

> I would prefer to not have to bet on that. Unnecessarily sacrificing performance is just not something I want to do.

There is a reason premature optimization is considered the root of all evil. You will find if you actually benchmark this, that using shared memory will give you worse overall performance in the face of cache evictions.

> > You most definitely can parse all data received from the terminal and distinguish keypresses from escape codes. Terminal programs have been doing that for decades, to do it with maximum robustness you should use the kitty keyboard protocol, but even without it you can parse out all escape codes from keypresses and pasted text. There are many libraries that do it for you.

> I'm using one such library - crossterm (which does support the kitty keyboard protocol by default, if I'm not mistaken) - and if a user holds down on J to move through images, I'll get responses from the terminal that look like \e_JGi=1;OK\e\\. Would you suggest this is a bug in the library that I should look into?

These are graphics protocol responses. If you are dealing with graphics you need to process them. Or send q=2 in your graphics commands to suppress them.
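For illustration, a tiny command builder showing where the `q` key goes. The helper name `graphics_cmd` is hypothetical.

```python
def graphics_cmd(payload="", **keys):
    """Build an APC graphics command from control keys and an optional payload."""
    control = ",".join(f"{key}={value}" for key, value in keys.items())
    if payload:
        return f"\x1b_G{control};{payload}\x1b\\"
    return f"\x1b_G{control}\x1b\\"


# A placement request with all responses suppressed via q=2.
quiet_place = graphics_cmd(a="p", i=1, q=2)
```

Note the tradeoff: with `q=2` a failed placement is silent too, so the application can no longer learn from the response that an image was evicted.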

kovidgoyal, Jun 27 '25 02:06