Add a "display unprintable chars" toggle mode

Open thadguidry opened this issue 8 years ago • 22 comments

As mentioned in #1268 ... because I'm a purist and because I work with UTF-8 quite a bit and unprintable control characters are significant to my work...

I want to see in OpenRefine a toggle mode for DISPLAY UNPRINTABLE CHARS for our users (like the ACK and CR/LF below)...so that they show up and are highlighted with special coloring/font. This toggle mode should work perhaps on both datagrids & facets, and perhaps other dialogs or extension mechanisms, yet to be determined.

This is work to be done in the future as part of our UI Enhancement project(s).

[images: capture, capture]

We will want to use CSS styling for this and not actual JavaScript replacement characters. An example of how CSS styling in a cell could be used for the various unprintable control characters: http://mleibman.github.io/SlickGrid/examples/example8-alternative-display.html
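A minimal sketch of the CSS-styling idea, not taken from any OpenRefine branch or PR: the control characters stay in the cell text and are only wrapped in a span that a stylesheet can highlight. The function names, the `unprintable` class, and the `data-code` attribute are all hypothetical.

```typescript
// Hypothetical helper: escape the characters that would break HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap each C0 control character (U+0000 to U+001F) and DEL (U+007F) in a span.
// The character itself is kept, so the underlying value is not replaced.
function markUnprintable(cellText: string): string {
  return escapeHtml(cellText).replace(/[\u0000-\u001F\u007F]/g, (ch) => {
    const hex = ("0000" + ch.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
    return '<span class="unprintable" data-code="U+' + hex + '">' + ch + "</span>";
  });
}

// Assumed stylesheet rule, active only while a toggle class is set on the grid:
// .show-unprintable .unprintable::after { content: attr(data-code); color: orange; }
```

With this shape, the toggle only has to add or remove one class on the grid container; the cell markup stays the same in both modes.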

Another alternative might be https://developer.mozilla.org/en-US/docs/Web/SVG/Element/symbol and calling <use> whenever we need to show unprintable characters

thadguidry avatar Oct 23 '17 22:10 thadguidry

@ostephens @ettorerizza @wetneb Do you guys also like that orange-colored example? Does it have a good fit and finish for cell styling in OpenRefine? Of course we will work with a Google Designer for some of this later on, but for now I just wanted to show a better example of how it could look. Both Jacky and I prefer the coloring and style using graphic symbols, as example 2 shows, rather than the text-based control character symbols shown in example 1 and typically used in other programs, such as "SP", "ACK", "CR", "DEL", "TAB", etc.

VOTE: Do you agree that example 2 is closer to our liking and preference for the future UI toggle of "display all chars"?

thadguidry avatar Nov 09 '17 17:11 thadguidry

The colored symbols also seem to me both clearer and more aesthetic.

ettorerizza avatar Nov 09 '17 17:11 ettorerizza

To be contrary, I have to admit I prefer the text based control character symbols. But happy to go with the majority view!

ostephens avatar Nov 10 '17 09:11 ostephens

@ostephens Why do you like them more? Because they stand out more? Because they are a bit larger?

[images: capture, capture]

thadguidry avatar Nov 10 '17 14:11 thadguidry

I prefer them both because they are easier to read (larger text) and because I feel they are more explicit and easier to learn - I don't have to learn what each of the icons/graphics stands for.

On the latter point - it is easier to know the difference between SP and NBSP than to learn which one is represented by a filled-in square and which by an empty square.

ostephens avatar Nov 10 '17 15:11 ostephens

@ostephens interesting you mention you have to learn what the icons/graphics stand for. The icons and graphics are actually international and well defined. You will see them on Chinese and Ukrainian keyboards for instance. Look at your Tab key. But I think I understand...we're a bit of an internet society now and the historical ways of labeling don't want to budge. :)

It's a bit 'in your face'... but that's pretty useful for things that were hidden before and that make you want to beat your head into a wall when things don't work as intended :)

OK. We will go with just simple text highlighted labels instead of the graphic symbols.

thadguidry avatar Nov 10 '17 16:11 thadguidry

@thadguidry I agree some of the icons are clear/well understood (the tab is a good example). I also accept that the icons/graphics are defined internationally - so it may be that I just need to get used to them.

At the moment I think the weight of opinion (such as it is) is in favour of the graphic symbols - you and @ettorerizza vs me. Unless anyone else wants to support my view, then I think graphics win it tbh

ostephens avatar Nov 10 '17 17:11 ostephens

@ostephens No, I agree with you now. The \r\n for instance will be better represented; it's actually 2 separate control chars, and not just 1. Despite there being Unicode graphic forms that represent some of these control characters, not all of char(0) - char(31) are well represented in those graphic forms.

We'll use the Unicode abbreviations (NUL, DEL, etc.) from https://en.wikipedia.org/wiki/Control_character, except for HT, which should be replaced with TAB.
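As a rough sketch of that mapping (the names `C0_LABELS`, `controlLabel`, and `labelControls` are hypothetical, and the bracketed output is only a stand-in for whatever highlighted label the UI would render), each control character gets its own label, so a CR/LF pair shows up as two labels:

```typescript
// C0 control-character abbreviations in code-point order (U+0000 to U+001F).
// Index 9 uses "TAB" instead of the standard "HT", as proposed above.
const C0_LABELS: string[] = [
  "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
  "BS", "TAB", "LF", "VT", "FF", "CR", "SO", "SI",
  "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
  "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

// Return the label for a single control character, or undefined for anything else.
function controlLabel(ch: string): string | undefined {
  const code = ch.charCodeAt(0);
  if (code <= 0x1f) return C0_LABELS[code];
  if (code === 0x7f) return "DEL";
  return undefined;
}

// Replace every C0 control character and DEL with its bracketed label.
function labelControls(text: string): string {
  return text.replace(/[\u0000-\u001F\u007F]/g, (ch) => "[" + controlLabel(ch) + "]");
}

console.log(labelControls("a\r\nb")); // a[CR][LF]b
```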

thadguidry avatar Nov 10 '17 17:11 thadguidry

Maybe of interest - in this post the approach taken is to replace char(0)-char(32) with the equivalent Unicode control picture https://charlee.li/display-non-printable-characters.html
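A sketch of what that kind of replacement might look like (this is not code from the linked post; `toControlPictures` is a hypothetical name). It relies on the layout of the Unicode Control Pictures block, where U+2400 to U+241F mirror U+0000 to U+001F, U+2420 is the symbol for space, and U+2421 is the symbol for DEL:

```typescript
// Replace char(0)-char(32) and DEL with the matching Unicode control picture.
// Note that char(32) is the ordinary space, so every space becomes U+2420.
function toControlPictures(text: string): string {
  return text.replace(/[\u0000-\u0020\u007F]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return String.fromCharCode(code === 0x7f ? 0x2421 : 0x2400 + code);
  });
}
```

Unlike a CSS-only approach, this changes the displayed string itself, so it would have to be applied to a display copy and never to the stored cell value.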

ostephens avatar Mar 20 '18 23:03 ostephens

@ostephens Personally I'm not a fan of the newer diagonal flows in the post-2003 Unicode Control Pictures. I prefer the original horizontal flows just as they were in Bigelow 1993, scaled smaller with good fonts as in the examples I gave before, which use the LucidaSansUnicode font.

https://pdfs.semanticscholar.org/4a8e/2fa49b13a2bcabf7d0ebdf2b55b0816abaf8.pdf

But perhaps on second thought...to support those who are younger :) the diagonal flows might be better overall since they won't be confused with regular text. Regardless, my idea was to enforce some different CSS coloring for the control chars/pics.

thadguidry avatar Mar 21 '18 00:03 thadguidry

This question on StackOverflow shows the importance of this new feature. The file in the OpenRefine screenshot looks very clean compared to the original Excel file, which is actually full of tabs and spaces.

ettorerizza avatar Mar 26 '18 20:03 ettorerizza

Found a good article that contains the mappings of the 2- and 3-letter abbreviations that we can use to replace the control chars. The article also covers the history of the control chars and what they control :-) https://www.aivosto.com/articles/control-characters.html

It also contains translations of the abbreviations further down its page, but I think we can just rely on English only?

thadguidry avatar Feb 22 '20 20:02 thadguidry

@aghaSaad04 Instead of HT (horizontal tab), I'd use the more familiar 3-letter abbreviation TAB, but keep VT as the abbreviation for "vertical tab".

thadguidry avatar Mar 01 '20 18:03 thadguidry

Came across another complementary use case with a dataset I was helping someone with today. Unicode values that are True Null also do not display, and we should render them as NUL when the user toggles the Unprintable Chars mode. (This is different from the 'null' rendering that is not Unicode based (if I remember correctly from Owen's code on null display handling) and that we already have in All->View->Show/Hide 'null' values in cells.)

Notice none of the \u0000 values are displayed in the grid, but show up with .partition()

[image]

@ostephens I'd like to know your thoughts on this as well.

thadguidry avatar Mar 03 '20 22:03 thadguidry

Why is this labelled Priority: High? It seems like it would be a pretty rarely used feature.

tfmorris avatar Jun 13 '20 23:06 tfmorris

Don't know, ask Jacky who set it that way, but he's no longer around.

thadguidry avatar Jun 13 '20 23:06 thadguidry

@thadguidry Can you please assign this issue to me as well, since the other one is interrelated?

shriyasankhyan avatar Apr 02 '22 03:04 shriyasankhyan

@shriyasankhyan Sure, it's going to be really cool stuff for your portfolio if you are up for it. I hope your HTML and CSS and most importantly JavaScript skills can handle it :-) The old branch is https://github.com/OpenRefine/OpenRefine/tree/non-printable-characters and I'm not sure how it differs from the previous PR #2350 from @AghaSaad04, which you are welcome to check out locally (click the <> Code button in the top right of any GitHub PR), work on, and submit as a new PR of your own.

thadguidry avatar Apr 02 '22 04:04 thadguidry

Yes, surely :) I will look into the provided resources. Thanks again for assigning me this issue.

shriyasankhyan avatar Apr 02 '22 07:04 shriyasankhyan

Can anyone from the community please check this pull request? @antoine2711 @thadguidry @wetneb

shriyasankhyan avatar Apr 15 '22 10:04 shriyasankhyan

htmx has features that could make implementing this issue easier and more straightforward.

thadguidry avatar Aug 21 '23 10:08 thadguidry