
Support idempotent import

Open ellie opened this issue 1 year ago • 1 comments

Support running import multiple times and not importing the same data repeatedly

We will need some way of identifying whether a history entry is a duplicate. Not all importers support this: bash doesn't always, only zsh extended history provides timestamps, etc. Even in the best case, we won't be able to support this for all options.

We could adjust the importer/loader traits to allow for checking existing state:

https://github.com/atuinsh/atuin/blob/62f3807dcb4844f74b59c7bcfb81c9a914da7353/crates/atuin-client/src/import/mod.rs#L22-L33
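A rough sketch of what "checking existing state" could look like. Everything here is hypothetical: `Entry`, the `contains` method, `MemoryLoader`, and `import` are simplified stand-ins made up for illustration, not the actual traits or types in `atuin-client` (the real ones are async and use a richer history type).

```rust
use std::collections::HashSet;

/// Simplified stand-in for a history entry; NOT atuin's real history type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Entry {
    /// Unix timestamp in seconds, if the source format records one.
    pub timestamp: Option<i64>,
    pub command: String,
}

/// Hypothetical loader trait that can also report existing state.
pub trait Loader {
    /// Returns true if an identical entry has already been stored.
    fn contains(&self, entry: &Entry) -> bool;
    /// Stores a new entry.
    fn push(&mut self, entry: Entry);
}

/// In-memory loader, used only to make the sketch self-contained.
#[derive(Default)]
pub struct MemoryLoader {
    seen: HashSet<Entry>,
    pub stored: Vec<Entry>,
}

impl Loader for MemoryLoader {
    fn contains(&self, entry: &Entry) -> bool {
        self.seen.contains(entry)
    }

    fn push(&mut self, entry: Entry) {
        self.seen.insert(entry.clone());
        self.stored.push(entry);
    }
}

/// Pushes only the entries the loader does not already hold.
/// Returns how many entries were actually added.
pub fn import<L: Loader>(entries: Vec<Entry>, loader: &mut L) -> usize {
    let mut added = 0;
    for entry in entries {
        if !loader.contains(&entry) {
            loader.push(entry);
            added += 1;
        }
    }
    added
}
```

With this shape, running `import` twice over the same entries adds nothing the second time. Entries without a timestamp are compared on the command alone, so repeated runs of the same command collapse into one.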

Currently, importers are designed to be one-way, as they simply enumerate history for a different system to import.

It could also make sense to do the de-duping elsewhere, after it's been loaded from an importer.

Note that some importers have differing precision on timestamps, which may affect uniqueness checks.

To emphasize: we will never be able to guarantee full idempotency, as some older shell history is literally just a list of commands, unless users are ok with checking dupes purely on the command.
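For the "de-dupe elsewhere" option, a minimal sketch of a pass that runs after entries have been loaded, with the command-only comparison as an opt-in. The `Entry` type, the `Strategy` enum, and `dedup` are assumptions for illustration and do not exist in atuin.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a history entry; NOT atuin's real history type.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    /// Unix timestamp in seconds, if the source format records one.
    pub timestamp: Option<i64>,
    pub command: String,
}

/// Hypothetical de-duplication strategies.
#[derive(Debug, Clone, Copy)]
pub enum Strategy {
    /// Compare timestamp and command. Entries without a timestamp
    /// are never treated as duplicates, so none are lost.
    TimestampAndCommand,
    /// Compare the command text alone, ignoring timestamps.
    CommandOnly,
}

/// Returns the entries with duplicates removed, keeping first occurrences.
pub fn dedup(entries: Vec<Entry>, strategy: Strategy) -> Vec<Entry> {
    let mut seen: HashSet<(Option<i64>, String)> = HashSet::new();
    let mut out = Vec::new();
    for entry in entries {
        // `None` means "no usable key": the entry is always kept.
        let key = match (strategy, entry.timestamp) {
            (Strategy::TimestampAndCommand, Some(ts)) => {
                Some((Some(ts), entry.command.clone()))
            }
            (Strategy::TimestampAndCommand, None) => None,
            (Strategy::CommandOnly, _) => Some((None, entry.command.clone())),
        };
        let is_new = match key {
            Some(k) => seen.insert(k),
            None => true,
        };
        if is_new {
            out.push(entry);
        }
    }
    out
}
```

The trade-off is visible in the two strategies: the first keeps every timestamp-less entry and so cannot be idempotent for plain command lists, while the second is idempotent but drops legitimate repeats of the same command.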

Please react to this issue with 👍 if you'd like to see it implemented
+1/"any updates"/etc style comments will be deleted

ellie avatar Jul 17 '24 15:07 ellie

BTW, fish history includes the timestamp.

This is an excerpt I have from an old backup (2024-11)

- cmd: ls -l /Data/ /home/
  when: 1670674816
  paths:
    - /Data/
    - /home/

To avoid duplicates, I'd be OK with keeping a watermark per host (I'm not sure if you can import history from different hosts now), although in some very specific cases this might lead to gaps:

  • When restoring data from a backup and then from an older backup that has more data about the past, the second backup would be ignored based on the watermark.
    • Maybe you'd run into this if you keep "last K-day" history backups and don't import from the oldest to the most recent backup?
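A minimal sketch of the per-host watermark idea, including the gap described above. The `Entry` and `Watermarks` types are made up for illustration and assume every entry carries a host and a timestamp.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a history entry; NOT atuin's real history type.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub host: String,
    /// Unix timestamp in seconds.
    pub timestamp: i64,
    pub command: String,
}

/// Hypothetical store of the newest imported timestamp per host.
#[derive(Default)]
pub struct Watermarks {
    latest: HashMap<String, i64>,
}

impl Watermarks {
    /// Accepts only entries newer than the host's watermark as it stood
    /// before this batch, then advances the watermark.
    pub fn import(&mut self, batch: Vec<Entry>) -> Vec<Entry> {
        // Snapshot so that entries within one batch do not block each other.
        let before = self.latest.clone();
        let mut accepted = Vec::new();
        for entry in batch {
            let mark = before.get(&entry.host).copied().unwrap_or(i64::MIN);
            if entry.timestamp > mark {
                let current = self.latest.entry(entry.host.clone()).or_insert(i64::MIN);
                *current = (*current).max(entry.timestamp);
                accepted.push(entry);
            }
        }
        accepted
    }
}
```

Importing the same backup twice accepts nothing the second time. Importing a recent backup first and an older one afterwards also accepts nothing, even for entries that were never imported, which is the gap mentioned in the bullets.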

Regarding timestamp precision, I don't think it's an issue, but maybe some shells have changed their precision across different versions?

Even if that's the case, I don't think people can actually use their shell at more than 10 commands/s, so using a 100ms resolution seems enough to de-dup.
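As a sketch of the 100ms idea: round timestamps down to 100ms buckets and de-dup on (bucket, command). The function names and the nanosecond input are assumptions for illustration. Note that two readings of the same event only match if they fall in the same bucket, so a source that stores whole seconds would still need a coarser comparison.

```rust
use std::collections::HashSet;

/// Width of one de-dup bucket: 100ms expressed in nanoseconds.
const BUCKET_NANOS: i64 = 100_000_000;

/// Builds a de-dup key from a Unix timestamp in nanoseconds and a command.
/// Timestamps are rounded down to the start of their 100ms bucket.
pub fn dedup_key(timestamp_nanos: i64, command: &str) -> (i64, String) {
    (timestamp_nanos.div_euclid(BUCKET_NANOS), command.to_string())
}

/// Keeps the first entry seen for each (bucket, command) pair.
pub fn dedup(entries: Vec<(i64, String)>) -> Vec<(i64, String)> {
    let mut seen = HashSet::new();
    entries
        .into_iter()
        .filter(|(ts, cmd)| seen.insert(dedup_key(*ts, cmd)))
        .collect()
}
```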

Dietr1ch avatar Nov 12 '25 05:11 Dietr1ch