DataFrames.jl

Unsigned Int displayed as Int

Open • lbilli opened this issue 3 years ago • 10 comments

I'm not sure if it is by design, but I was slightly surprised by the way UInts are displayed in a DataFrame compared to regular vectors, i.e. decimal vs. hexadecimal notation:

julia> using DataFrames

julia> v = UInt8[1,125,253]
3-element Vector{UInt8}:
 0x01
 0x7d
 0xfd

julia> DataFrame(:v => v)
3×1 DataFrame
 Row │ v     
     │ UInt8 
─────┼───────
   1 │     1
   2 │   125
   3 │   253

lbilli • Mar 25 '21 13:03

Yes, we have more such inconsistencies, especially for the Bool type. The question is: do you think it is problematic? Note that we display the eltype at the top.

CC @ronisbr

bkamins • Mar 25 '21 13:03

Certainly not a big deal and mostly about aesthetics, yet it made me do a double take at the header to make sure no unexpected conversion had happened behind the scenes.

Also, I'd argue that when dealing with bit patterns or masks, a lot is lost with decimal notation:

julia> v = [0x00ff, 0xff00]
2-element Vector{UInt16}:
 0x00ff
 0xff00

julia> DataFrame(:v => v)
2×1 DataFrame
 Row │ v      
     │ UInt16 
─────┼────────
   1 │    255
   2 │  65280
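If the bit patterns matter, one display-only workaround is to format the values yourself. Below is a minimal sketch in plain Julia Base. hexstrings is a hypothetical helper and is not part of DataFrames.jl:

```julia
# Hypothetical helper: render unsigned integers as zero-padded hex strings,
# e.g. for a display-only copy of a bitmask column.
# pad = 2 * sizeof(x) gives 2 hex digits per byte (UInt8 -> 2, UInt16 -> 4).
hexstrings(v::AbstractVector{<:Unsigned}) =
    ["0x" * string(x; base = 16, pad = 2 * sizeof(x)) for x in v]

masks = UInt16[0x00ff, 0xff00]
hex = hexstrings(masks)
println(hex)   # ["0x00ff", "0xff00"]

# The same masks in binary, one digit per bit
bits = [string(x; base = 2, pad = 8 * sizeof(x)) for x in masks]
println(bits)  # ["0000000011111111", "1111111100000000"]
```

Wrapping the result as DataFrame(:v => hexstrings(v)) would display the hex text. However, the column would then be a Vector{String}, so you should keep the original unsigned column for any computation.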

lbilli • Mar 25 '21 14:03

@ronisbr, do you remember why we decided to go this way (apart from handling Bool as a special case, which I think we can keep as is, i.e. printing true and false)?

bkamins • Mar 25 '21 14:03

I don't think we handle any special cases. After a very long discussion between you, @nalimilan, and me, we decided to be consistent with print. What we display is exactly what print produces, in all cases.

I think the only changes were for nothing and missing.
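To make the difference concrete, here is a minimal sketch in plain Julia Base (no DataFrames.jl involved) that compares print and show on the UInt8 values from the original report. The variable names are illustrative only:

```julia
v = UInt8[1, 125, 253]

# print writes the plain value; per this thread, DataFrames.jl's default
# text output is consistent with print.
printed = [sprint(print, x) for x in v]

# show writes a Julia-parsable form, which for UInt8 is hexadecimal,
# matching what the REPL displays for the vector's elements.
shown = [sprint(show, x) for x in v]

println(printed)  # ["1", "125", "253"]
println(shown)    # ["0x01", "0x7d", "0xfd"]

# The same print/show split applies to other types, e.g. Char
char_pair = (sprint(print, 'a'), sprint(show, 'a'))  # ("a", "'a'")
```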

ronisbr • Mar 25 '21 14:03

I forgot to mention something! If you want output close to what Julia shows by default in the REPL, you can switch the renderer to show:

julia> df = DataFrame(:v => UInt8[1, 125, 253]);

julia> show(df, renderer = :show)
3×1 DataFrame
 Row │ v
     │ UInt8
─────┼───────
   1 │  0x01
   2 │  0x7d
   3 │  0xfd

ronisbr • Mar 31 '21 12:03

@nalimilan, I guess that, especially given the last comment by @ronisbr, we can close this. OK?

bkamins • Mar 31 '21 19:03

Well, yeah, at least it works as intended. IIRC, during the long discussion we had when moving to PrettyTables, I advocated using show for most types except a few common special types (like strings). Anyway, we can change this after 1.0 if we want.

nalimilan • Apr 01 '21 13:04

Instead of hardcoding either one, I was wondering whether it has been considered to put some of these settings in a global variable that users can easily customize to their taste (similar to how options() works in R).

Besides renderer, users might also have preferences regarding nosubheader, show_row_number, and hlines, to name a few.

I guess an advanced user can already achieve this by carefully overriding Base.show, but that sounds rather cumbersome.

lbilli • Apr 01 '21 15:04

I would prefer to document how to do it. For years we have tried very hard to avoid global state in DataFrames.jl, because such state is error-prone and does not play well with multi-threading.
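One way to get user preferences without any package-level state is a small wrapper in the user's own code, for example in their startup.jl. The sketch below assumes that DataFrames.jl's show(df; kwargs...) accepts renderer = :show, as in the example earlier in this thread. MY_SHOW_KWARGS, my_show_kwargs and my_show are hypothetical names. Which other keywords (nosubheader, show_row_number, hlines, ...) are accepted depends on the DataFrames.jl and PrettyTables.jl versions in use.

```julia
# Hypothetical per-user display preferences. Only `renderer = :show` comes
# from this thread; other keyword names are version-dependent assumptions.
const MY_SHOW_KWARGS = (renderer = :show,)

# Merge the preferences with per-call keywords. For NamedTuples, `merge`
# keeps the rightmost value, so call-site keywords override the defaults.
my_show_kwargs(; kw...) = merge(MY_SHOW_KWARGS, values(kw))

# Hypothetical helper, meant to be called with a DataFrame once DataFrames.jl
# is loaded; it only forwards the merged keywords to `show`.
my_show(df; kw...) = show(df; my_show_kwargs(; kw...)...)
```

With DataFrames.jl loaded, my_show(df) would print using the preferred renderer, and my_show(df, renderer = :print) would override it for a single call. The preferences live in an immutable constant in the user's code rather than in mutable state inside DataFrames.jl.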

bkamins • Apr 01 '21 15:04

The very first PrettyTables.jl implementation I tried here in DataFrames.jl had a global state that you could modify. However, it caused a large slowdown in the time to print the first table (almost 3x slower).

ronisbr • Apr 01 '21 18:04