Support idempotent import
Support running an import multiple times without importing the same data repeatedly.
We will need some way of identifying whether a history entry is a duplicate. Not all importers can support this: bash doesn't always record timestamps, only zsh extended history provides them, etc. Even in the best case, we won't be able to support this for every importer.
We could adjust the importer/loader traits to allow for checking existing state:
https://github.com/atuinsh/atuin/blob/62f3807dcb4844f74b59c7bcfb81c9a914da7353/crates/atuin-client/src/import/mod.rs#L22-L33
Currently, importers are designed to be one-way, as they simply enumerate history for a different system to import.
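As a rough illustration of what "checking existing state" could look like, here is a minimal, synchronous sketch. Everything in it (`Entry`, `DedupLoader`, `MemoryLoader`, `push_if_new`) is hypothetical and simplified for discussion; it is not the actual atuin `Importer`/`Loader` API, which is async and uses its own `History` type.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a history entry (hypothetical, not atuin's `History`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Entry {
    /// Unix timestamp in nanoseconds; `None` when the source shell records none.
    pub timestamp_ns: Option<i64>,
    pub command: String,
}

/// Hypothetical loader that can be asked whether it already holds an entry.
pub trait DedupLoader {
    fn contains(&self, entry: &Entry) -> bool;
    fn push(&mut self, entry: Entry);

    /// Push the entry only if it is not already present.
    /// Returns `true` when the entry was actually imported.
    fn push_if_new(&mut self, entry: Entry) -> bool {
        if self.contains(&entry) {
            return false;
        }
        self.push(entry);
        true
    }
}

/// In-memory loader, used here only to exercise the trait.
#[derive(Default)]
pub struct MemoryLoader {
    seen: HashSet<Entry>,
}

impl DedupLoader for MemoryLoader {
    fn contains(&self, entry: &Entry) -> bool {
        self.seen.contains(entry)
    }

    fn push(&mut self, entry: Entry) {
        self.seen.insert(entry);
    }
}
```

With something like this, importers could stay one-way enumerators and the loader alone would decide whether an entry is new.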
It could also make sense to do the de-duping elsewhere, after it's been loaded from an importer.
Note that some importers have differing precision on timestamps, which may affect uniqueness.
To emphasize: we will never be able to guarantee full idempotency, as some older shell history is literally just a list of commands, unless users are OK with checking dupes purely on the command.
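A sketch of de-duping after loading, assuming a hypothetical `Entry` with an optional second-precision timestamp. Timed entries are matched on `(timestamp, command)`; entries without a timestamp follow an explicit policy (`UntimedPolicy` is an invented name), so the "check purely on the command" behaviour is opt-in.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified history entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    /// Unix timestamp in seconds; `None` for shells that record no timestamp.
    pub timestamp_secs: Option<i64>,
    pub command: String,
}

/// What to do with incoming entries that carry no timestamp.
#[derive(Debug, Clone, Copy)]
pub enum UntimedPolicy {
    /// Always import them (re-running the import duplicates them).
    KeepAll,
    /// Skip them if the same command text is already stored.
    MatchOnCommand,
}

/// Return the incoming entries that are not already in `existing`.
/// Entries are only compared against `existing`, not against each other,
/// so repeated commands within one history file are preserved.
pub fn filter_new(existing: &[Entry], incoming: Vec<Entry>, policy: UntimedPolicy) -> Vec<Entry> {
    let mut timed: HashSet<(i64, String)> = HashSet::new();
    let mut commands: HashSet<String> = HashSet::new();
    for e in existing {
        if let Some(ts) = e.timestamp_secs {
            timed.insert((ts, e.command.clone()));
        }
        commands.insert(e.command.clone());
    }

    incoming
        .into_iter()
        .filter(|e| match (e.timestamp_secs, policy) {
            (Some(ts), _) => !timed.contains(&(ts, e.command.clone())),
            (None, UntimedPolicy::KeepAll) => true,
            (None, UntimedPolicy::MatchOnCommand) => !commands.contains(&e.command),
        })
        .collect()
}
```

The trade-off with `MatchOnCommand` is that a command already stored from any source will suppress every untimed copy of it in later imports.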
+1/"any updates"/etc style comments will be deleted
BTW, fish history includes the timestamp.
This is an excerpt I have from an old backup (2024-11)
- cmd: ls -l /Data/ /home/
  when: 1670674816
  paths:
    - /Data/
    - /home/
To avoid duplicates, I'd be OK with keeping a watermark per host (I'm not sure if you can import history from different hosts now), although in some very specific cases this might lead to gaps:
- When restoring data from a backup and then from an older backup that has more data about the past, you'd ignore the second backup based on the watermark.
- Maybe you'll run into this if you keep "last K-day" history backups and don't import from oldest to most recent backup?
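To make the watermark idea and its gap concrete, here is a minimal sketch. `Entry`, `Watermarks`, and `import` are hypothetical names for illustration, and it assumes every entry has a second-precision timestamp and a hostname.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified history entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub hostname: String,
    pub timestamp_secs: i64,
    pub command: String,
}

/// Latest imported timestamp per host.
#[derive(Debug, Default)]
pub struct Watermarks {
    latest: HashMap<String, i64>,
}

impl Watermarks {
    /// Keep only entries strictly newer than their host's watermark,
    /// then advance the watermark to the newest accepted timestamp.
    pub fn import(&mut self, batch: Vec<Entry>) -> Vec<Entry> {
        // Compare against the watermarks as they were before this batch,
        // so the order of entries inside one batch does not matter.
        let before = self.latest.clone();
        let mut accepted = Vec::new();
        for e in batch {
            let is_new = before
                .get(&e.hostname)
                .map_or(true, |&mark| e.timestamp_secs > mark);
            if is_new {
                let mark = self
                    .latest
                    .entry(e.hostname.clone())
                    .or_insert(e.timestamp_secs);
                *mark = (*mark).max(e.timestamp_secs);
                accepted.push(e);
            }
        }
        accepted
    }
}
```

Importing a recent backup first and an older, longer one second drops everything in the older backup, which is exactly the gap described above. The strict `>` comparison also drops a later-imported command that shares its timestamp with the watermark.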
Regarding timestamp precision, I don't think it's an issue, but maybe some shells have changed their precision across different versions?
Even if that's the case, I don't think people can actually use their shell at more than 10 commands/s, so using a 100ms resolution seems enough to de-dup.
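For example, a de-dup key at 100 ms resolution could be sketched like this, assuming timestamps are stored as Unix nanoseconds (`BUCKET_NS` and `dedup_key` are made-up names):

```rust
/// Width of one de-dup bucket: 100 ms expressed in nanoseconds.
pub const BUCKET_NS: i64 = 100_000_000;

/// Build a de-dup key from a nanosecond Unix timestamp and the command text.
/// Timestamps that fall into the same 100 ms bucket produce the same key.
pub fn dedup_key(timestamp_ns: i64, command: &str) -> (i64, &str) {
    // div_euclid rounds toward negative infinity, so buckets stay
    // uniform even for timestamps before the epoch.
    (timestamp_ns.div_euclid(BUCKET_NS), command)
}
```

One caveat with fixed buckets: two timestamps a few milliseconds apart can still land on either side of a bucket boundary and get different keys.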