reading-friendly format for printing columns
Following the Zulip discussion about pr-str printing of columns, we suggest the following format for printing columns (this is basically @joinr's suggestion in that thread, with some minor changes).
#column[int64 [3] :x [1, 2, 3]]
This way, the information is wrapped in brackets, so editors can handle it gracefully in terms of indentation and structural editing. It may also allow us to make columns readable using tagged literals.
Some things to take into account:
- breaking long lines
- taking missing values into account
- respecting `*print-length*`
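To make the proposed shape concrete, here is a minimal sketch in Python (used only for illustration; it is not the library's implementation). The function name `format_column`, the `print_length` parameter, and the choice to print a missing value as `nil` are all assumptions, since how to show missing values is still open in this thread. Breaking long lines is not covered.

```python
def format_column(dtype, name, values, print_length=None):
    """Render a column on one bracketed line, e.g. #column[int64 [3] :x [1, 2, 3]]."""
    # Hypothetical stand-in for *print-length*: show at most this many items.
    shown = values if print_length is None else values[:print_length]
    # Assumption: a missing value (None here) is printed as nil.
    items = ["nil" if v is None else str(v) for v in shown]
    if print_length is not None and len(values) > print_length:
        # Mark the truncation, similar to how Clojure prints "..." at the limit.
        items.append("...")
    # The bracketed count is the full column length, not the number shown.
    return f"#column[{dtype} [{len(values)}] :{name} [{', '.join(items)}]]"


print(format_column("int64", "x", [1, 2, 3]))
print(format_column("int64", "x", [1, None, 3, 4], print_length=2))
```

Keeping the full element count in the header means a truncated print still tells the reader how large the column is.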
Hi @ashimapanjwani, shall we discuss this?
I imagine you may have some comments about this idea, following your recent exploration of other aspects of printing.
I think there are two unresolved issues so far:
- How to handle missing values.
- How to handle columns with 1000+ items, which are quite common.
So I think we have to leave large columns as is.
But a single-line, reader-safe representation is totally reasonable, and in fact I now think that literally everything in the dataset/datatype system aside from tensors and datasets should have a single-line, reader-safe representation. Thanks for being patient with this -- we will get there :-).
Let's split this up. First is the change in format for existing columns. Reader literals are another issue altogether. I also think that columns should display some number of leading elements, an ellipsis, then some number of trailing elements. It is just too useful to see the head/tail of the dataset quickly. The format used by tmdjs is the above format.
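The head/ellipsis/tail idea can be sketched as follows, again in Python purely as an illustration. The helper name `elide_middle` and the defaults of three leading and three trailing elements are assumptions; the thread does not fix how many elements to show.

```python
def elide_middle(values, n_head=3, n_tail=3):
    """Return printable items: the first n_head, an ellipsis, then the last n_tail."""
    if len(values) <= n_head + n_tail:
        # Short columns are shown in full; nothing to elide.
        return [str(v) for v in values]
    head = [str(v) for v in values[:n_head]]
    # Slice from an explicit start index so that n_tail=0 yields an empty tail.
    tail = [str(v) for v in values[len(values) - n_tail:]]
    return head + ["..."] + tail


column = list(range(1000))
print("[" + ", ".join(elide_middle(column, n_head=2, n_tail=2)) + "]")
```

With a fixed number of head and tail elements, the printed form stays on one line regardless of how many items the column holds.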
So first step is to change the column print format to the above suggestion.
This has stalled and no one has worked on it for a while. I do think the column print format could be better, but I don't think this is a pressing issue at this time. We can reopen when there is a PR that makes the column printing measurably better.