DataFrame pretty/ascii/markdown printer

I have the following printer for DataFrames which I can push to this repo if you want.

Given:

df := DataFrame 
		withRowNames: #(r1 r2)
		columnNames: #(c1 c2 c3 c4).

You can print it to Transcript with:

DataFrameMarkdownPrinter new
			stream: Transcript;
			dataFrame: df;
			write.

and you will get:

| #  | c1 | c2 | c3 | c4 |
|----|----|----|----|----|
| r1 | 11 | 12 | 13 | 14 |
| r2 | 21 | 22 | 23 | 24 |

This table looks fine in monospaced fonts, and you can also copy and paste it into a Markdown editor, where it will parse as a table.

Opinions?

Mar 29 '22 20:03 tinchodias

Ah, to complete the example, the df contents were filled with:

	1 to: df rowNames size do: [ :rowIndex |
		1 to: df columnNames size do: [ :columnIndex |
			df 
				at: rowIndex at: columnIndex 
				put: (rowIndex asString, columnIndex asString) ] ].

Mar 29 '22 20:03 tinchodias

One thing to improve: the cell contents are right-aligned in the ASCII output, but not once the table is rendered as Markdown:

| #  | c1 |   c2 |     c3 |       c4 |
|----|----|------|--------|----------|
| r1 |  o |   oo |    ooo |     oooo |
| r2 | oo | oooo | oooooo | oooooooo |

and rendered as:

#	c1	c2	c3	c4
r1	o	oo	ooo	oooo
r2	oo	oooo	oooooo	oooooooo

This can be fixed by placing ':'s in the separator row under the header (for example `---:` for right alignment), but I didn't invest the time to do it.
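A minimal sketch of that fix, not taken from the printer itself: it builds a right-aligned separator row for the table above. The `widths` array is a hypothetical stand-in for whatever column widths the printer has already computed; only standard stream and collection messages are used.

```smalltalk
| widths separator |
"Hypothetical column widths, read off the example table above"
widths := #(2 2 4 6 8).
separator := String streamContents: [ :stream |
	widths do: [ :width |
		stream nextPut: $|.
		"Each cell is padded with one space on both sides, so the row needs
		 width + 2 characters: width + 1 dashes plus a trailing colon.
		 The trailing colon is what marks the column as right-aligned."
		(width + 1) timesRepeat: [ stream nextPut: $- ].
		stream nextPut: $: ].
	stream nextPut: $| ].
Transcript show: separator; cr.
"Shows: |---:|---:|-----:|-------:|---------:|"
```

The row keeps the same length as the original separator, so the table still lines up in monospaced fonts.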

Mar 29 '22 20:03 tinchodias

Very interesting! I have not seen the code yet, but did you use http://rosettacode.org/wiki/Align_columns#Smalltalk to maybe allow different alignment options?

Could this be used simply as:

df printString
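For reference, a sketch of how the printer from the first comment might produce a String rather than writing to Transcript. It assumes that `DataFrameMarkdownPrinter` (the proposed class, only described in this thread) accepts any write stream in `stream:`, the same way it accepts Transcript; whether `printString` itself should delegate to it is left open here.

```smalltalk
| markdown |
"Assumption: the printer only sends ordinary stream messages to its stream,
 so an in-memory write stream can take the place of Transcript.
 df is the DataFrame built in the first comment."
markdown := String streamContents: [ :stream |
	DataFrameMarkdownPrinter new
		stream: stream;
		dataFrame: df;
		write ].
Transcript show: markdown; cr.
```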

Mar 29 '22 21:03 hernanmd

@hernanmd I wasn't aware of this code; it's good to know about it. I wrote mine from scratch. That code is not nice with blocks, but it has more options than mine.

Mar 29 '22 22:03 tinchodias

I ported the code as an exercise: https://gist.github.com/tinchodias/0f99be3cecbe3fc5ed93dea90c877fd5 (but there are far too many block closures!)

Mar 30 '22 01:03 tinchodias

Refactored into a small hierarchy of printers plus tests, in the attached .st file:

ColumnAlignedPrinter.st.zip

Mar 30 '22 04:03 tinchodias

But this was the implementation I was talking about originally: https://github.com/tinchodias/FFICallLogger/blob/master/FFICallLogger-UI/TFLMarkdownTablePrinter.class.st

Mar 30 '22 05:03 tinchodias

Thanks for sharing, @tinchodias. Without checking your version, I wrote some bits too :) My version is inspired by the Ruby implementation; however, Pharo 8 and 9 lack some methods that would be really handy in the base library. For example, a #zip: method is used in the Ruby version, as well as #transposed for a collection of collections.

| text fieldsByRow maxSize colWidths array i |
Transcript clear.
text := 'Given$a$text$file$of$many$lines,$where$fields$within$a$line$
are$delineated$by$a$single$''dollar''$character,$write$a$program
that$aligns$each$column$of$fields$by$ensuring$that$words$in$each$
column$are$separated$by$at$least$one$space.
Further,$allow$for$each$word$in$a$column$to$be$either$left$
justified,$right$justified,$or$center$justified$within$its$column.'.

maxSize := ((fieldsByRow := text lines collect: [ : l | l findTokens: '$' ]) detectMax: #size) size.
fieldsByRow do: [ : row | row addAll: (Array new: maxSize - row size withAll: '') ].

" Transpose fieldsByRow "
array := Array2D rows: fieldsByRow anyOne size columns: fieldsByRow size.
1 to: fieldsByRow size do: [: column |
 1 to: fieldsByRow anyOne size do: [: row |
  array at: row at: column put: ((fieldsByRow at: column) at: row)]].

" Calculate max field width per column "
colWidths := ((1 to: array numberOfRows) collect: [ :r | (array atRow: r) collect: #size ]) collect: #max.
{ #padLeftTo: . #padRightTo: } 
 collect: [ : jSel |
  ((fieldsByRow collect: [ : row | 
  | gen |
  gen := Generator on: [ : g | colWidths do: [ : k | g yield: k ] ].
  row collect: [ : w | { w . gen next } ] ])
   collect: [ : line | line collect: [ : pair | pair first perform: jSel with: pair second ] ]) ]
 thenDo: [ : lineArray | 
  lineArray do: [ : e | Transcript show: (e joinUsing: ' '); cr ].
  Transcript cr ]

Mar 31 '22 03:03 hernanmd

Can you describe transposed and zip:?

Mar 31 '22 07:03 Ducasse

@Ducasse zip is similar to the one in Python and Ruby. For the simple case of collections with the same size, it would be like building a new collection of associations:

{ 'China' . 'India' . 'Indonesia' } 
	with: { 'Virus' . 'Soda' . 'Rata' }
	collect: [ :a :b | a -> b ]

 "{'China'->'Virus'. 'India'->'Soda'. 'Indonesia'->'Rata'}"

but instead of specifying the block:

{ 'China' . 'India' . 'Indonesia' } zip: { 'Virus' . 'Soda' . 'Rata' }

And padding is needed when collection sizes are different.
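A small sketch of both operations using only messages that already exist in Pharo (`with:collect:`, `collect:`, `,`), since `zip:` and `transposed` themselves are the missing methods. The nil padding and the pairing into two-element arrays are illustrative choices, not an agreed design; `words` is deliberately one element shorter to show the padding.

```smalltalk
| countries words size pad zipped rows transposed |
countries := { 'China' . 'India' . 'Indonesia' }.
words := { 'Virus' . 'Soda' }.

"Pad the shorter collection with nils so that both have the same size"
size := countries size max: words size.
pad := [ :collection | collection , (Array new: size - collection size) ].

"zip: pair up the elements found at the same index"
zipped := (pad value: countries)
	with: (pad value: words)
	collect: [ :a :b | { a . b } ].
"zipped holds the pairs ('China' 'Virus'), ('India' 'Soda') and ('Indonesia' nil)"

"transposed: the i-th new row collects the i-th element of every original row.
 Assumes all rows have the same size."
rows := #( #(1 2 3) #(4 5 6) ).
transposed := (1 to: rows first size)
	collect: [ :i | rows collect: [ :row | row at: i ] ].
"transposed holds the rows (1 4), (2 5) and (3 6)"
```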

Mar 31 '22 21:03 hernanmd

I think it would be great to have those options in DataFrame, don’t hesitate to do a PR :)

Feb 14 '23 10:02 jecisc

This was already resolved by @Joshua-Dias-Barreto, who added the methods toMarkdown and toHtml.

@jecisc I think we can close this issue. Do you agree?

Apr 23 '24 10:04 olekscode
