tech.ml.dataset

reading-friendly format for printing columns

Open daslu opened this issue 4 years ago • 4 comments

Following the Zulip discussion about pr-str printing of columns, we suggest the following format for printing columns (this is basically @joinr's suggestion in that thread, with some minor changes).

#column[int64 [3] :x  [1, 2, 3]]

This way, the information is wrapped in brackets, so editors can handle it gracefully in terms of indentation and structural editing. It also opens the possibility of making columns readable using tagged literals.
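To illustrate the tagged-literal idea, here is a minimal sketch that reads the proposed text with `clojure.edn/read-string` and a custom `:readers` entry. The function names `parse-column-literal` and `read-column` and the shape of the resulting map are assumptions made for illustration; none of this is the library's actual API.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical handler: turns the vector that follows the #column tag
;; into a plain map. A real implementation would build an actual column.
(defn parse-column-literal
  [[dtype [n-elems] col-name data]]
  {:datatype (keyword dtype)   ; the symbol int64 becomes :int64
   :n-elems  n-elems
   :name     col-name
   :data     data})

;; Hypothetical helper: reads one printed column with the handler above.
(defn read-column
  [s]
  (edn/read-string {:readers {'column parse-column-literal}} s))

(read-column "#column[int64 [3] :x  [1, 2, 3]]")
```

Commas count as whitespace to the reader, so the comma-separated element list needs no special handling.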

Some things to take into account:

  • breaking long lines
  • taking missing values into account
  • respecting *print-length*
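As a rough sketch of how two of these points might fit together, the function below builds the proposed string while honoring `*print-length*`. It assumes missing values are represented as `nil` in the data, which is an assumption of this sketch and not a statement about how the library stores them. The name `column-str` is hypothetical, and breaking long lines is not handled.

```clojure
(require '[clojure.string :as str])

;; Hypothetical formatter for the proposed single-line column format.
;; Missing values are assumed to be nil and print as "nil".
(defn column-str
  [dtype col-name data]
  (let [n     (count data)
        ;; Show at most *print-length* elements when it is bound.
        shown (if *print-length* (take *print-length* data) data)
        items (map pr-str shown)
        ;; Mark truncation the same way Clojure's own printer does.
        items (if (< (count shown) n) (concat items ["..."]) items)]
    (str "#column[" (name dtype) " [" n "] " (pr-str col-name)
         " [" (str/join ", " items) "]]")))

(column-str :int64 :x [1 2 3])

(binding [*print-length* 2]
  (column-str :int64 :x [1 nil 3 4 5]))
```

Note that a truncated string is no longer a faithful representation of the column, which is one reason reader-safety and truncation need to be considered together.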

daslu • Feb 02 '21 17:02

Hi @ashimapanjwani, shall we discuss this?

I imagine you may have some comments about this idea, following your recent exploration of other aspects of printing.

daslu • Feb 02 '21 17:02

I think there are two unresolved issues so far:

  1. How to handle missing values.
  2. How to handle columns with 1000+ items in them, which are quite common.

cnuernber • Feb 02 '21 17:02

So, I think we have to leave large columns as is.

But a single-line reader-safe representation is totally reasonable, and in fact I now think that literally everything in the dataset/datatype system aside from tensors and datasets should have a single-line reader-safe representation. Thanks for being patient with this -- we will get there :-).

cnuernber • Jul 15 '21 21:07

Let's split this up. First is the change in format for existing columns. Reader literals are another issue altogether. I also think that columns should display some number of beginning elements, an ellipsis, then some number of ending elements. It is just too useful to see the head/tail of the dataset quickly. The format of tmdjs is the above format.

So the first step is to change the column print format to the above suggestion.
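The head/ellipsis/tail idea could be sketched roughly as follows. The function name `head-tail` and the choice of how many elements to keep at each end are hypothetical; this only illustrates the element selection, not the full column printer.

```clojure
;; Hypothetical helper: keeps the first n-head and last n-tail elements
;; and puts an ellipsis symbol between them. Short inputs pass through whole.
(defn head-tail
  [n-head n-tail data]
  (let [v (vec data)]
    (if (<= (count v) (+ n-head n-tail))
      v
      (vec (concat (take n-head v)
                   [(symbol "...")]
                   (take-last n-tail v))))))

(pr-str (head-tail 2 2 (range 1 11)))
```

Using a symbol for the ellipsis makes it print without quotes, matching how Clojure's printer marks elements elided by `*print-length*`.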

cnuernber • Jan 22 '22 14:01

This has stalled and no one has worked on it for a while. I do think the column print format could be better, but I don't think this is a pressing issue at this time. We can reopen when there is a PR that makes the column printing measurably better.

cnuernber • Jan 05 '23 13:01