history import imports lines it shouldn't
My ~/.bash_history file looks like this:
#1650679383
git co master
#1650679389
git branch -D exclude-take-3
#1650679393
git pull
#1650985607
ls testing
And atuin imported all of the command times (#1650985607, for example) as commands. I'm honestly not exactly sure what part of my config tells bash to write those comments - I'm guessing it's HISTTIMEFORMAT? Here's the possibly-relevant section of my bashrc:
# append to history instead of overwriting
shopt -s histappend
# Save multi-line commands as one command
shopt -s cmdhist
# Before each bash prompt, write to history and read from it. This
# makes multiple terminals sync to history
# PROMPT_COMMAND='history -a; history -n'
# Huge history. Doesn't appear to slow things down, so why not?
HISTSIZE=500000
HISTFILESIZE=100000
# Avoid duplicate entries
HISTCONTROL="erasedups:ignoreboth"
# Don't record some commands
export HISTIGNORE="&:[ ]*:exit:ls:bg:fg:history:clear"
# Use standard ISO 8601 timestamp
# %F equivalent to %Y-%m-%d
# %T equivalent to %H:%M:%S (24-hours format)
HISTTIMEFORMAT='%F %T '
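For context: when HISTTIMEFORMAT is set, bash writes each entry's time to the history file as a comment line holding seconds since the Unix epoch, and the format string only controls how the `history` builtin displays it. A minimal Python sketch of what the first `#1650679383` line above decodes to (the helper name `render_history_time` is made up for illustration, and it renders in UTC so the output is reproducible, whereas bash would show local time):

```python
from datetime import datetime, timezone


def render_history_time(line: str) -> str:
    """Decode a '#<epoch seconds>' history line into 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    epoch_seconds = int(line.lstrip("#"))
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Spelled out instead of '%F %T' so it behaves the same on every platform.
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(render_history_time("#1650679383"))  # 2022-04-23 02:03:03
```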
So there's lots of imported commands like:
sqlite> select * from history where command regexp '#\d+$' limit 5;
76b773158aff4b4aaff2be7c4316976b|1650985004731089000|-1|-1|#1644242519|unknown|98099ebaa5c247cda923bba31c286cf5|lexeme:llimllib
12b1228a89d4441baaff77f8297b99c7|1650985002731116000|-1|-1|#1644242523|unknown|2106ff48b631471fa254ec9caafbdce8|lexeme:llimllib
139f80d3333a4678bb112e24c7a0bfc8|1650985000731131000|-1|-1|#1644242526|unknown|9d338cc1c9e94114843304fe4f8a1445|lexeme:llimllib
287b5f41f68a4a538a878a1e10edf7f3|1650984998731152000|-1|-1|#1644242816|unknown|73ddb792ea7349f3b30940db761d2c0d|lexeme:llimllib
cf5ad019bb59485989c59a2be289c10a|1650984996731168000|-1|-1|#1644243276|unknown|ff651a6299d247df9b863a20f9a2e14a|lexeme:llimllib
I deleted these rows with this command: delete from history where command regexp '^#\d+$';
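If you'd rather script that cleanup than type it into the sqlite shell, here is a rough Python sketch of the same delete. It runs against a toy in-memory table, not the real atuin database: the schema below is a stand-in with only a few columns, and `delete_timestamp_rows` is a hypothetical helper. Python's `sqlite3` module does not ship a REGEXP implementation, so the sketch registers one.

```python
import re
import sqlite3


def delete_timestamp_rows(conn: sqlite3.Connection) -> int:
    """Delete rows whose command is only '#<digits>'; return how many were removed."""
    # SQLite evaluates `value REGEXP pattern` by calling regexp(pattern, value),
    # which has to be supplied by the application.
    conn.create_function(
        "regexp",
        2,
        lambda pattern, value: value is not None and re.search(pattern, value) is not None,
    )
    cursor = conn.execute(r"DELETE FROM history WHERE command REGEXP '^#\d+$'")
    conn.commit()
    return cursor.rowcount


# Toy stand-in for the real database (assumed schema, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id TEXT, timestamp INTEGER, command TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [("a", 1, "#1644242519"), ("b", 2, "git pull"), ("c", 3, "echo #1234 again")],
)
removed = delete_timestamp_rows(conn)
remaining = [row[0] for row in conn.execute("SELECT command FROM history ORDER BY timestamp")]
print(removed, remaining)
```

Anchoring the pattern with `^` matters here: the command that merely contains `#1234` survives, while the bare timestamp row is removed.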
However, atuin also incorrectly imported any command that ended with a backslash, pulling in the next line's timestamp along with it. My ~/.bash_history has lines like:
dig +trace example.com\
#1644269003
which atuin imported as a line continuation. I'm not actually sure why my bash history has lines like that, but it does. Fortunately, there were only 4 of them in a 45,000 command history and none were vital, so I just deleted them with delete from history where command regexp '#\d+$' limit 5;. It's possible that there's no sensible way for atuin to avoid importing these lines this way, just reporting it since I was cleaning up the db.
Yeah, we've seen similar issues occur in zsh (#100).
For bash, we assume that the user does not have any timestamp support (as far as I understand, out of the box bash does not support timestamps?).
I guess we could theoretically support parsing some HISTTIMEFORMAT settings and process the file accordingly (we do something similar for zsh). In the general case it's impossible, though.
The bash manual says:
If the HISTTIMEFORMAT is set, the time stamp information associated with each history entry is written to the history file, marked with the history comment character. When the history file is read, lines beginning with the history comment character followed immediately by a digit are interpreted as timestamps for the following history entry.
Should the import process ignore lines beginning with the history comment character? (Is this configurable or always #? no idea!)
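To make the manual's rule concrete, here's a small Python sketch of the check it describes: a line counts as a timestamp if it begins with the history comment character followed immediately by a digit. The function name and the `comment_char` parameter are my own, included in case the character turns out to be configurable.

```python
def is_timestamp_line(line: str, comment_char: str = "#") -> bool:
    """True if line starts with comment_char followed immediately by an ASCII digit."""
    if not line.startswith(comment_char):
        return False
    next_char = line[len(comment_char):][:1]  # '' when nothing follows
    return next_char.isascii() and next_char.isdigit()


for sample in ["#1650679383", "# append to history", "git pull", "#1 is the loneliest number"]:
    print(repr(sample), is_timestamp_line(sample))
```

Note that the rule as quoted is fairly loose: the last sample would count as a timestamp line even though it is not purely numeric, so an importer might want something stricter.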
Also if this is not helpful and you just don't plan to deal with this, no worries and feel free to close the issue.
And finally, I'm really excited to have my history in sqlite, so thanks for the software!
> Should the import process ignore lines beginning with the history comment character? (Is this configurable or always #? no idea!)
I think in an ideal world, we would parse them. Although I can see this being difficult 😅
> Also if this is not helpful and you just don't plan to deal with this, no worries and feel free to close the issue.
I'd like us to deal with this at some point, but I'm not sure we'll get to it any time soon. I'll leave it open for now!
> And finally, I'm really excited to have my history in sqlite, so thanks for the software!
Aww yay, hope you like it!
'histchars'
Up to three characters which control history expansion, quick
substitution, and tokenization (*note History Interaction::). The
first character is the HISTORY EXPANSION character, that is, the
character which signifies the start of a history expansion,
normally '!'. The second character is the character which
signifies 'quick substitution' when seen as the first character on
a line, normally '^'. The optional third character is the
character which indicates that the remainder of the line is a
comment when found as the first character of a word, usually '#'.
The history comment character causes history substitution to be
skipped for the remaining words on the line. It does not
necessarily cause the shell parser to treat the rest of the line as
a comment.
so yes, the stupid history comment character can be changed, computers are fractally awful
I'll see if I can write a parser for this. If bash can do it, I'm sure we can too (even if it's a lil bit lossy)
I'm going to mention #167 here as a dupe, and also provide a sqlite one-liner to try and update commands with their timestamps. It seems to have worked for me, but your mileage may vary.
I also suspect that I've sort of broken things for myself as far as syncing is concerned until deletes are implemented...
UPDATE history
SET `timestamp` = COALESCE(
    (SELECT
        (CAST(LTRIM(b.command, '#') AS INTEGER) * 1000000000) AS newtimestamp
     FROM history b
     WHERE
        b.rowid = (history.rowid - 1)
        AND SUBSTR(b.command, 1, 1) = '#'
     ORDER BY b.rowid ASC LIMIT 1
    ),
    `timestamp`
);
DELETE FROM history WHERE SUBSTR(command, 1, 1) = '#';
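To see what those two statements do before pointing them at a real database, here's a small Python sketch that runs them on an in-memory table. The schema is a toy stand-in (only `timestamp` and `command`), and like the one-liner it assumes each `#<epoch>` row sits at the rowid immediately before the command it belongs to.

```python
import sqlite3

FIXUP_SQL = """
UPDATE history
SET timestamp = COALESCE(
    (SELECT CAST(LTRIM(b.command, '#') AS INTEGER) * 1000000000
     FROM history b
     WHERE b.rowid = history.rowid - 1
       AND SUBSTR(b.command, 1, 1) = '#'),
    timestamp
);
DELETE FROM history WHERE SUBSTR(command, 1, 1) = '#';
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (timestamp INTEGER, command TEXT)")
conn.executemany(
    "INSERT INTO history (timestamp, command) VALUES (?, ?)",
    [
        (0, "#1650679383"),
        (0, "git co master"),
        (0, "#1650679389"),
        (0, "git branch -D exclude-take-3"),
        (42, "git pull"),  # no timestamp row right before it: keeps its old value
    ],
)
conn.executescript(FIXUP_SQL)
rows = conn.execute("SELECT command, timestamp FROM history ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```

Two things to watch for: the multiplication converts epoch seconds to nanoseconds, and the final DELETE removes every command that starts with `#`, including any genuine comment lines you may have typed at the prompt.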
This is fixed by #747
great, thanks to @ellie and @cyqsimon !
> great, thanks to @ellie and @cyqsimon !
You're welcome.
> 'histchars' Up to three characters which control history expansion, quick substitution, and tokenization (*note History Interaction::). The first character is the HISTORY EXPANSION character, that is, the character which signifies the start of a history expansion, normally '!'. The second character is the character which signifies 'quick substitution' when seen as the first character on a line, normally '^'. The optional third character is the character which indicates that the remainder of the line is a comment when found as the first character of a word, usually '#'. The history comment character causes history substitution to be skipped for the remaining words on the line. It does not necessarily cause the shell parser to treat the rest of the line as a comment.
>
> so yes, the stupid history comment character can be changed, computers are fractally awful
Ah balls. Why would Bash do such a terrible thing?
Edit: although when I come to think of it, is this really related to the format of .bash_history?
However I do believe that my current implementation is sufficiently robust. At the moment, only lines that strictly match ^#\d+$ are considered timestamps; anything else is considered a command. So unless your command was literally something like #69420, it will not cause a misinterpretation (and even so, the only consequence is that #69420 will not get imported as a command).
I mean, okay. If you run #69420 and immediately unset HISTTIMEFORMAT so that #69420 was your last command with a timestamp, then run other commands later that get recorded without a timestamp, sure the current logic will fail to preserve ordering. But I think it's fair to say it's more of a "you issue" at this point.
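For anyone following along, the rule described above can be sketched in a few lines of Python. This is not the code from #747, just an illustration of the stated behavior: only lines that are entirely `#` plus digits are timestamps, and everything else is a command. The function name is hypothetical, and returning `None` for commands with no preceding timestamp is an assumption of this sketch.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Equivalent to ^#\d+$ with ASCII digits, applied via fullmatch.
TIMESTAMP_LINE = re.compile(r"#([0-9]+)")


def pair_commands(lines: Iterable[str]) -> List[Tuple[Optional[int], str]]:
    """Return (epoch_seconds_or_None, command) pairs; timestamp lines are consumed."""
    entries: List[Tuple[Optional[int], str]] = []
    pending: Optional[int] = None  # timestamp waiting for the next command line
    for line in lines:
        match = TIMESTAMP_LINE.fullmatch(line)
        if match:
            pending = int(match.group(1))
        else:
            entries.append((pending, line))
            pending = None
    return entries


sample = ["#1650679383", "git co master", "#69420", "echo hi", "ls"]
entries = pair_commands(sample)
for entry in entries:
    print(entry)
```

The sample shows the edge case mentioned above: a literal `#69420` command is swallowed as a timestamp and attached to the following command instead of being imported.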
> However, atuin also imported any command that ended with a backslash incorrectly, including the next lines' timestamp. My ~/.bash_history has lines like:
>
> dig +trace example.com\
> #1644269003
This is a little concerning. In this example, was the \ a literal?
Hopefully it was, because in #747 I assumed there is not a case of multi-line history in .bash_history. So I removed the logic that interprets \. If it wasn't a literal, then we are missing something here.
> is this really related to the format of .bash_history?
I think that if the third character of histchars were something other than #, then .bash_history would record comments starting with something other than #.
But agreed that if you're crazy enough to do that, atuin doesn't have to support you. Really just... don't do that to yourself.
> was the \ a literal?
I'm not sure, but I don't think that they always are. Here's an example from my ~/.bash_history of continuation characters in a subshell:
#1652445647
ENI=$(aws ecs describe-tasks --task $TASK --cluster $CLUSTER | \
jq -r '.tasks[0].attachments[0].details[] | \
select(.name == "networkInterfaceId").value')
here's an example of a simple command that definitely just has a continuation character:
#1656510510
sh -c 'curl -fLo "${XDG_DATA_HOME:-$HOME/.local/share}"/nvim/site/autoload/plug.vim --create-dirs \
https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim'
here's an example of continuation characters, but inside a quoted multiline filter string:
#1664588032
ffmpeg -ss 1:57 -to 2:00 -copyts -i /tmp/ytgif_cache/video_httpswwwyoutubecomwatchvnrqxmQruto.webm -filter_complex "\
[0:v] fps=10, \
scale=640:-1, \
split [a][b], \
[a] palettegen [p], \
[b][p] paletteuse, \
drawtext=borderw=1:bordercolor=black:fontcolor=white:fontsize=30:x=(w-text_w)/2:y=(h-text_h)-10:text=🔥 flames 🔥"
Maybe those are all literals - a subshell $( is a type of quote as far as bash is concerned, I guess?
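One possible way to handle entries like these without interpreting backslashes at all, sketched in Python: if every entry is preceded by a timestamp line (an assumption that only holds for history written with HISTTIMEFORMAT set), then everything between two timestamp lines can be treated as a single command. I don't know whether this is what #747 ended up doing; `group_entries` is a hypothetical helper, and the trailing `echo done` entry is made up for the demo.

```python
import re
from typing import Iterable, List, Optional, Tuple

TIMESTAMP_LINE = re.compile(r"#([0-9]+)")


def group_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Join all lines between two timestamp lines into one (epoch, command) entry."""
    entries: List[Tuple[int, str]] = []
    current_time: Optional[int] = None
    buffer: List[str] = []
    for line in lines:
        match = TIMESTAMP_LINE.fullmatch(line)
        if match:
            if current_time is not None and buffer:
                entries.append((current_time, "\n".join(buffer)))
            current_time = int(match.group(1))
            buffer = []
        elif current_time is not None:
            buffer.append(line)
        # Lines before the first timestamp are skipped in this sketch.
    if current_time is not None and buffer:
        entries.append((current_time, "\n".join(buffer)))
    return entries


history_lines = [
    "#1652445647",
    "ENI=$(aws ecs describe-tasks --task $TASK --cluster $CLUSTER | \\",
    "jq -r '.tasks[0].attachments[0].details[] | \\",
    "select(.name == \"networkInterfaceId\").value')",
    "#1656510510",
    "echo done",
]
grouped = group_entries(history_lines)
for when, command in grouped:
    print(when, repr(command))
```

The obvious weakness is a multi-line command that itself contains a line consisting only of `#` and digits, which would be split in two.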
See https://github.com/ellie/atuin/pull/747#discussion_r1158660221.