
Add :character-info command

Open wetheredge opened this issue 1 year ago • 7 comments

This is my take on Vim's get ascii (ga) / get utf8 (g8) commands. It works on graphemes rather than individual characters, so it correctly handles things like emoji with skin tone modifiers or characters with combining diacritics.

I'm open to opinions on what the output should look like. At the moment, it prints the decimal value only for single-byte characters in an ASCII-compatible encoding. For UTF-8, it also prints the decoded codepoints. (The decoder is based on fasterthanlime's article, since I could not find anything in std, already in helix, or on crates.io for that matter, and it's pretty concise.)

| Character | Encoding | Output |
| --- | --- | --- |
| a | UTF-8 | "a" (U+0061) Dec 97 Hex 61 |
| a | windows-1252 / extended ascii | "a" Dec 97 Hex 61 |
| newline | UTF-8 | "\n" (U+000a) Dec 10 Hex 0a |
| CRLF | UTF-8 | "\r\n" (U+000d U+000a) Hex 0d + 0a |
| é | UTF-8 | "é" (U+00e9) Hex c3 a9 |
| é | windows-1252 | "é" Hex e9 |
| ë | UTF-8 | "ë" (U+0065 U+0308) Hex 65 + cc 88 |
| 👍🏽 | UTF-8 | "👍🏽" (U+1f44d U+1f3fd) Hex f0 9f 91 8d + f0 9f 8f bd |
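
For readers who want to see how the UTF-8 rows could be derived, here is a minimal sketch using only std. It is not the code from this PR: `describe_utf8` is a hypothetical name, it assumes the grapheme has already been segmented and is valid UTF-8, and it only produces the part of the output that follows the quoted character.

```rust
/// Hypothetical helper (not the PR's code): describes the codepoints and
/// UTF-8 bytes of one grapheme cluster that has already been segmented.
fn describe_utf8(grapheme: &str) -> String {
    // One "U+xxxx" entry per codepoint in the cluster.
    let codepoints: Vec<String> = grapheme
        .chars()
        .map(|c| format!("U+{:04x}", c as u32))
        .collect();

    // Bytes of one codepoint are space-separated; codepoints are joined by " + ".
    let hex: Vec<String> = grapheme
        .chars()
        .map(|c| {
            let mut buf = [0u8; 4];
            c.encode_utf8(&mut buf)
                .bytes()
                .map(|b| format!("{:02x}", b))
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect();

    let mut out = format!("({})", codepoints.join(" "));
    // The decimal value is only shown when the whole grapheme is a single byte.
    if let [byte] = grapheme.as_bytes() {
        out.push_str(&format!(" Dec {}", byte));
    }
    out.push_str(&format!(" Hex {}", hex.join(" + ")));
    out
}
```

With this sketch, `describe_utf8("e\u{308}")` returns `(U+0065 U+0308) Hex 65 + cc 88`, matching the ë row.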

Remaining tasks:

  • [x] Improve the help text
  • [x] Replace Debug formatting of character with something more appropriate. Debug handles escaping non-printable & whitespace characters, but it also escapes the combining diaeresis on the ë…
    • Now the only escapes are \0, \t, \n, and \r
  • [x] Tests
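
The escaping rule from the second task could look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: `quote_grapheme` is a hypothetical name, and it simply passes every character through except the four listed escapes, so combining marks stay attached to their base character.

```rust
/// Hypothetical helper (not the PR's code): wraps a grapheme in double
/// quotes, escaping only \0, \t, \n, and \r.
fn quote_grapheme(grapheme: &str) -> String {
    let mut out = String::with_capacity(grapheme.len() + 2);
    out.push('"');
    for c in grapheme.chars() {
        match c {
            '\0' => out.push_str("\\0"),
            '\t' => out.push_str("\\t"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            // Everything else, including combining marks, is kept verbatim.
            other => out.push(other),
        }
    }
    out.push('"');
    out
}
```

Unlike Debug formatting, this leaves the combining diaeresis in `"e\u{308}"` untouched, so the terminal can still render it as a single ë.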

I'm not familiar with the codebase, so let me know if there's anything that could be more idiomatic. It does not have any tests yet, but it looks like most commands don't?

closes #3885

wetheredge, Sep 27 '22 20:09

Instead of a regular command which must be bound to a key, I think this would be appropriate as a typable command (helix-term/src/commands/typed.rs). We could follow Vim and call it :ascii (aliased to :as) or come up with a new name.

the-mikedavis, Sep 27 '22 21:09

Got it switched over to a typable command. I've named it :character (aliased to :char) since it should support any encoding, even if it's not ASCII compatible.

wetheredge, Sep 28 '22 01:09

Maybe something a bit more descriptive like :character-info?

sudormrfbin, Sep 28 '22 12:09

Rebased to fix conflicts. Let me know if I should write tests; otherwise it should be ready.

wetheredge, Oct 04 '22 03:10

I think this is looking great 😀

You can add an integration test like so:

// helix-term/tests/test/commands.rs
#[tokio::test(flavor = "multi_thread")]
async fn test_character_info() -> anyhow::Result<()> {
    test_key_sequence(
        &mut helpers::AppBuilder::new().build()?,
        Some("ih<esc>h:char<ret>"),
        Some(&|app| {
            assert_eq!(r#""h" (U+0068) Dec 104 Hex 68"#, app.editor.get_status().unwrap().0);
        }),
        false,
    )
    .await?;

    Ok(())
}

(you will want to rebase on latest master first since there are recent changes to the integration testing harness)

the-mikedavis, Oct 29 '22 22:10

Thanks for the pointer on the tests. I added that and a few more cases to fully cover the possible output formats.

wetheredge, Oct 31 '22 20:10

@archseer, sorry for the ping, ~~but is there any chance of this making it into 22.12?~~ I can rebase if that would be helpful.

Oops, too late there on my part 😅

wetheredge, Dec 07 '22 02:12

Should I resolve the conflict? Looks like it's just accepting both the test I added & a couple new ones from master.

wetheredge, Jan 18 '23 23:01

Sorry for the delay! Looks good to me, just needs a rebase :)

archseer, Feb 02 '23 19:02