DataFrame pretty/ascii/markdown printer
I have the following printer for DataFrames which I can push to this repo if you want.
Given:
df := DataFrame
    withRowNames: #(r1 r2)
    columnNames: #(c1 c2 c3 c4).
You can print it to Transcript with:
DataFrameMarkdownPrinter new
    stream: Transcript;
    dataFrame: df;
    write.
and you will get:
| # | c1 | c2 | c3 | c4 |
|----|----|----|----|----|
| r1 | 11 | 12 | 13 | 14 |
| r2 | 21 | 22 | 23 | 24 |
This table reads well in a monospaced font, and you can also copy and paste it into a Markdown editor, where it will parse as a table.
Opinions?
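To make the idea concrete, here is a minimal sketch of how such a table could be written. It is not the code of DataFrameMarkdownPrinter: plain Arrays stand in for the DataFrame, and the variable names are made up for illustration. It assumes String>>padLeftTo: is available, as in the Pharo images discussed in this thread.

```smalltalk
"Sketch only: plain Arrays stand in for the DataFrame (an assumption)."
| rows widths markdown |
rows := #( #('#' 'c1' 'c2') #('r1' '11' '12') #('r2' '21' '22') ).

"Widest cell in each column"
widths := (1 to: rows first size) collect: [ :columnIndex |
    rows inject: 0 into: [ :widest :row | widest max: (row at: columnIndex) size ] ].

markdown := String streamContents: [ :stream |
    rows doWithIndex: [ :row :rowIndex |
        stream nextPut: $|.
        "Pad each cell on the left so the text is right-aligned"
        row with: widths do: [ :cell :width |
            stream space; nextPutAll: (cell padLeftTo: width); space; nextPut: $| ].
        stream cr.
        "Markdown needs a separator row right after the header"
        rowIndex = 1 ifTrue: [
            stream nextPut: $|.
            widths do: [ :width | stream next: width + 2 put: $-; nextPut: $| ].
            stream cr ] ] ].
markdown
```

Printing the result in a Playground should show the header row, the dashed separator row, and one line per data row.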
Ah, to complete the example, the df contents were filled with:
1 to: df rowNames size do: [ :rowIndex |
    1 to: df columnNames size do: [ :columnIndex |
        df
            at: rowIndex at: columnIndex
            put: (rowIndex asString, columnIndex asString) ] ].
One thing to improve: cell contents are right-aligned in the ASCII output, but not once the Markdown is rendered:
|  # | c1 |   c2 |     c3 |       c4 |
|----|----|------|--------|----------|
| r1 |  o |   oo |    ooo |     oooo |
| r2 | oo | oooo | oooooo | oooooooo |
and rendered as:
| # | c1 | c2 | c3 | c4 |
|---|---|---|---|---|
| r1 | o | oo | ooo | oooo |
| r2 | oo | oooo | oooooo | oooooooo |
This can be fixed by placing ':' characters in the header separator row, but I didn't invest the time to do it.
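For reference, a separator cell that ends with a colon (such as ---:) marks its column as right-aligned in GitHub-flavored Markdown. A minimal sketch of building such a row follows; the widths Array is a made-up stand-in for whatever column widths the printer computes, not part of its API.

```smalltalk
"Sketch only: 'widths' holds hypothetical content widths, one per column."
| widths separator |
widths := #(2 2 4 6 8).
separator := String streamContents: [ :stream |
    stream nextPut: $|.
    widths do: [ :width |
        "width + 1 dashes plus a trailing colon fill the padded cell
         and mark the column as right-aligned"
        stream
            next: width + 1 put: $-;
            nextPut: $:;
            nextPut: $| ] ].
separator
"'|---:|---:|-----:|-------:|---------:|'"
```

The row keeps the same overall width as the plain dashed row, so the ASCII rendering still lines up in a monospaced font.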
Very interesting! I have not seen the code yet, but did you use http://rosettacode.org/wiki/Align_columns#Smalltalk to maybe allow different alignment options?
Could this be used simply as:
df printString
@hernanmd I wasn't aware of this code; it's good to know about it. I wrote mine from scratch. This code is not nice with blocks, but it has more options than mine.
I ported the code, as an exercise: https://gist.github.com/tinchodias/0f99be3cecbe3fc5ed93dea90c877fd5 But there are far too many block closures!
But this was the implementation I was talking about originally: https://github.com/tinchodias/FFICallLogger/blob/master/FFICallLogger-UI/TFLMarkdownTablePrinter.class.st
Thanks for sharing, @tinchodias. Without checking your version, I wrote some bits too :) My version is inspired by the Ruby implementation; however, Pharo 8 and 9 lack some methods that would be really handy in the base library. For example, the Ruby version uses a #zip: method, and #transposed for a collection of collections.
| text fieldsByRow maxSize colWidths array |
Transcript clear.
text := 'Given$a$text$file$of$many$lines,$where$fields$within$a$line$
are$delineated$by$a$single$''dollar''$character,$write$a$program
that$aligns$each$column$of$fields$by$ensuring$that$words$in$each$
column$are$separated$by$at$least$one$space.
Further,$allow$for$each$word$in$a$column$to$be$either$left$
justified,$right$justified,$or$center$justified$within$its$column.'.
maxSize := ((fieldsByRow := text lines collect: [ :l | l findTokens: '$' ]) detectMax: #size) size.
"Pad short rows with empty strings so every row has maxSize fields"
fieldsByRow do: [ :row | row addAll: (Array new: maxSize - row size withAll: '') ].
" Transpose fieldsByRow "
array := Array2D rows: fieldsByRow anyOne size columns: fieldsByRow size.
1 to: fieldsByRow size do: [ :column |
    1 to: fieldsByRow anyOne size do: [ :row |
        array at: row at: column put: ((fieldsByRow at: column) at: row) ] ].
" Calculate max field width per column "
colWidths := ((1 to: array numberOfRows) collect: [ :r | (array atRow: r) collect: #size ]) collect: #max.
"Print the text twice: with #padLeftTo: (right-justified), then with #padRightTo: (left-justified)"
{ #padLeftTo: . #padRightTo: }
    collect: [ :jSel |
        ((fieldsByRow collect: [ :row |
            | gen |
            "Pair each word with the width of its column"
            gen := Generator on: [ :g | colWidths do: [ :k | g yield: k ] ].
            row collect: [ :w | { w . gen next } ] ])
                collect: [ :line | line collect: [ :pair | pair first perform: jSel with: pair second ] ]) ]
    thenDo: [ :lineArray |
        lineArray do: [ :e | Transcript show: (e joinUsing: ' '); cr ].
        Transcript cr ]
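The manual Array2D transposition in the snippet above is the part a #transposed method would replace. A minimal sketch of the operation on a collection of collections, assuming every row has the same size (the variable names are made up for illustration):

```smalltalk
"Sketch only: transpose an Array of equally sized Arrays."
| rows transposed |
rows := #( #('a' 'b' 'c') #('d' 'e' 'f') ).
"The n-th result row collects the n-th element of every input row"
transposed := (1 to: rows first size) collect: [ :columnIndex |
    rows collect: [ :row | row at: columnIndex ] ].
transposed
"#(#('a' 'd') #('b' 'e') #('c' 'f'))"
```

With the rows transposed this way, the width of each column is simply the maximum size within each resulting row.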
Can you describe transposed? And zip:?
@Ducasse zip: is similar to zip in Python and Ruby. For the simple case of collections with the same size, it would be like building a new collection of associations:
{ 'China' . 'India' . 'Indonesia' }
    with: { 'Virus' . 'Soda' . 'Rata' }
    collect: [ :a :b | a -> b ]
"{'China'->'Virus'. 'India'->'Soda'. 'Indonesia'->'Rata'}"
but instead of specifying the block:
{ 'China' . 'India' . 'Indonesia' } zip: { 'Virus' . 'Soda' . 'Rata' }
Padding is needed when the collections differ in size.
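A minimal sketch of the padded case, written as a block because no #zip: method exists in the base image according to this thread. The block name and the choice of padding value are assumptions made for illustration only.

```smalltalk
"Sketch only: a hypothetical zip that pads the shorter collection."
| zip |
zip := [ :first :second :padding |
    (1 to: (first size max: second size)) collect: [ :index |
        "Fall back to the padding value past the end of either collection"
        (first at: index ifAbsent: [ padding ])
            -> (second at: index ifAbsent: [ padding ]) ] ].
zip
    value: #('China' 'India' 'Indonesia')
    value: #('Virus' 'Soda')
    value: ''
"{'China'->'Virus'. 'India'->'Soda'. 'Indonesia'->''}"
```

Whether to pad with an empty string, with nil, or to truncate to the shorter collection (as Python's zip does) is a design choice a real #zip: would have to make.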
I think it would be great to have those options in DataFrame, don’t hesitate to do a PR :)
This was already resolved by @Joshua-Dias-Barreto, who added the methods toMarkdown and toHtml.
@jecisc I think we can close this issue. Do you agree?