annotate-snippets-rs

Source code with tabs did not highlight correctly

Open legendiguess opened this issue 5 years ago • 17 comments

Here is an example Slice where the t symbol in the source code has one space before it: image

That Slice gives us this output: image

Now I change the whitespace before t from a space to a tab (which is still one character, so the highlight range should not change): image. The highlight in the output is now wrong: image

legendiguess avatar Jan 19 '20 12:01 legendiguess

Yeah, I think annotate snippets should replace tab characters with a user configurable number of spaces, and take it into account when computing underlines. This is what I do in codespan at least!

brendanzab avatar Feb 18 '20 23:02 brendanzab
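A minimal sketch of the fixed-width replacement suggested above, assuming character-index ranges and one display column per non-tab character. The helper names (`expand_tabs_fixed`, `fixed_display_col`, `underline_fixed`) are hypothetical and are not part of the annotate-snippets or codespan APIs.

```rust
/// Replace every tab with `tab_width` spaces (hypothetical helper).
fn expand_tabs_fixed(line: &str, tab_width: usize) -> String {
    line.replace('\t', &" ".repeat(tab_width))
}

/// Display column at which the character at `char_idx` starts,
/// counting each tab as `tab_width` columns and anything else as one.
fn fixed_display_col(line: &str, char_idx: usize, tab_width: usize) -> usize {
    line.chars()
        .take(char_idx)
        .map(|c| if c == '\t' { tab_width } else { 1 })
        .sum()
}

/// Build the underline for the character range `start..end`,
/// shifted so that it lines up with the tab-expanded line.
fn underline_fixed(line: &str, start: usize, end: usize, tab_width: usize) -> String {
    let from = fixed_display_col(line, start, tab_width);
    let to = fixed_display_col(line, end, tab_width);
    format!("{}{}", " ".repeat(from), "^".repeat(to.saturating_sub(from)))
}
```

With a tab width of 4, the line `"\tt"` is printed as four spaces followed by `t`, and the underline for the range `1..2` starts at column 4 instead of column 1, so the marker lands under the `t`.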

agree! I am swamped a bit these days but will try to debug it! PRs welcomed!

zbraniecki avatar Mar 10 '20 01:03 zbraniecki

@legendiguess - can you retest against master? #27 got merged and I can't reproduce it anymore locally!

zbraniecki avatar Mar 27 '20 00:03 zbraniecki

@zbraniecki Just retested it. The problem is still there: image

Here is a gist with a minimal reproducible example: https://gist.github.com/legendiguess/ccb197dac4bc77e3ae7571befc8ffbe2

legendiguess avatar Mar 27 '20 03:03 legendiguess

Ah, I see. Now I understand the issue.

@brendanzab's solution is one option.

The other would be to only adjust the underline calculation to account for tabs in line, but I'm not sure if display of "\tt" is always 5 ascii characters.

zbraniecki avatar Mar 27 '20 03:03 zbraniecki

The other would be to only adjust the underline calculation to account for tabs in line, but I'm not sure if display of "\tt" is always 5 ascii characters.

\t moves the following characters to the next tab stop, so it adds from 1 to 4 ASCII characters, depending on the caret position relative to the last tab stop.

legendiguess avatar Mar 27 '20 04:03 legendiguess
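The tab-stop behaviour described above can be sketched as a column calculation. This is an illustration under stated assumptions, not what annotate-snippets, rustc, or codespan-reporting actually do: the function name `tab_stop_col` is hypothetical, offsets are character indices, and every non-tab character is assumed to occupy exactly one column.

```rust
/// Display column at which the character at `char_idx` starts,
/// where a tab advances to the next multiple of `tab_stop`.
fn tab_stop_col(line: &str, char_idx: usize, tab_stop: usize) -> usize {
    assert!(tab_stop > 0, "tab_stop must be non-zero");
    let mut col = 0;
    for c in line.chars().take(char_idx) {
        if c == '\t' {
            // Adds between 1 and `tab_stop` columns, depending on `col`.
            col += tab_stop - col % tab_stop;
        } else {
            col += 1;
        }
    }
    col
}
```

With a tab stop of 4, the `t` in `"\tt"`, `"ab\tt"`, and `"abc\tt"` starts at column 4 in all three cases, while in `"abcd\tt"` it starts at column 8. The width of a tab therefore depends on what precedes it, which is why counting it as a fixed number of characters misplaces the underline.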

Yeah, so it may be tricky to predict. Wondering how rustc handles that in error reporting now.

zbraniecki avatar Mar 27 '20 04:03 zbraniecki

Yeah, so it may be tricky to predict. Wondering how rustc handles that in error reporting now.

https://github.com/rust-lang/rust/blob/5e8897b7b51636f157630e6639b711d698e1d101/src/librustc_errors/styled_buffer.rs#L16

They don't handle \t in a fully "proper way"

legendiguess avatar Mar 27 '20 04:03 legendiguess

I do this using a custom writer in codespan-reporting: https://github.com/brendanzab/codespan/blob/a77e646944765dfb338185b0ba9504500afb7304/codespan-reporting/src/term/config.rs#L56-L84

brendanzab avatar Mar 27 '20 05:03 brendanzab

Oh, and importantly I also measure the length of strings taking the tab width into account: https://github.com/brendanzab/codespan/blob/a77e646944765dfb338185b0ba9504500afb7304/codespan-reporting/src/term/config.rs#L35-L45

brendanzab avatar Mar 27 '20 05:03 brendanzab

Here are the relevant tests (excuse the not-so-nice test setup!): https://github.com/brendanzab/codespan/blob/a77e646944765dfb338185b0ba9504500afb7304/codespan-reporting/tests/term.rs#L415-L542

brendanzab avatar Mar 27 '20 05:03 brendanzab

\t moves the following characters to the next tab stop, so it adds from 1 to 4 ASCII characters, depending on the caret position relative to the last tab stop.

Oh wait, I understand what you're saying now, @legendiguess - yes I don't handle this properly in codespan-reporting either 🤦‍♂

brendanzab avatar Mar 27 '20 05:03 brendanzab

https://github.com/rust-lang/annotate-snippets-rs/issues/58 may be a duplicate.

tisonkun avatar Jun 26 '23 05:06 tisonkun

a user configurable number of spaces

When I dug into this issue yesterday, it turned out to be quite subtle: even the POSIX expand command can produce "unexpected" results.

That said, a leading \t (only) is the easy case, while a tab in the middle of a line can be quite complex, especially when mixed with other whitespace. Generally, a tab means laying text out in a table according to the tabstop, so you need to translate the tab into spaces such that the resulting whitespace ends exactly at the tabstop. But the right behavior is far from clear, given that even expand handles " \t" and "\t " differently within one tabstop slot.

Concretely, with tabstop=4, " \t" is translated to four spaces while "\t " is translated to five spaces. I suspect it's because the space is treated either as part of the whitespace or as part of the next "column".

Generally, users only see the "final" result displayed on their screen and don't care what the underlying whitespace characters are. So I guess there is no consensus on how a tab should be formatted "properly". We can define our own rule, but it may still differ from what users "expect".

tisonkun avatar Jun 26 '23 05:06 tisonkun
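The four-versus-five-space difference follows from the usual tab-stop rule when a space and a tab are combined in different orders. The sketch below is a simplified stand-in for that rule, not a reimplementation of the expand utility; the function name `expand_tabs` is hypothetical, and every non-tab character is assumed to be one column wide.

```rust
/// Expand tabs to spaces so that text after each tab starts at the
/// next multiple of `tab_stop` (simplified tab-stop rule).
fn expand_tabs(line: &str, tab_stop: usize) -> String {
    assert!(tab_stop > 0, "tab_stop must be non-zero");
    let mut out = String::new();
    let mut col = 0;
    for c in line.chars() {
        if c == '\t' {
            // Pad up to the next tab stop: 1..=tab_stop spaces.
            let pad = tab_stop - col % tab_stop;
            out.push_str(&" ".repeat(pad));
            col += pad;
        } else {
            out.push(c);
            col += 1;
        }
    }
    out
}
```

With `tab_stop = 4`, `expand_tabs(" \tx", 4)` yields four spaces before `x`, because the leading space is absorbed into the first tab slot, while `expand_tabs("\t x", 4)` yields five, because the space comes after the tab has already reached column 4.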