hledger [WIP] tab-separated text as --output-format

Open schoettl opened this issue 2 years ago • 27 comments

I'd like to add a new output format, especially for the register command: tab-separated text

Use-case: I often use hledger to generate register reports which must be formatted nicely. For that, I use text processing software (e.g. LibreOffice). To get a nice format, I have to adapt the font size and the output width (-w) to match the page width and avoid line breaks. Also, only monospace fonts work with the text format. To remove the account column, I have to apply an ugly sed script.

Suggestion: I suggest a tab-separated text format that puts tab characters between date, description, account, amount, and total/average as the column separator. The output could also be pasted directly into spreadsheet software.

Alternatives considered: CSV output would be an alternative for my use-case, but it's much easier to do

hledger register -Otxt-tab-separated | xclip -i

… and then paste it directly into a LibreOffice Writer template.

Would maintainers accept such a PR?

Oct 16 '21 20:10 schoettl

We support reading csv, tsv and ssv, so I see no harm in adding (well-formed, standard-compliant) tsv as another output format. I would use that tsv output format name for consistency.

Are we ok with it being available only on a few commands? Supporting it consistently on all commands is a big effort.

Oct 31 '21 17:10 simonmichael

That said, I bet there are utilities to easily convert csv to tsv.

Oct 31 '21 17:10 simonmichael

tsv sounds like csv with tabs and for that I would use a converter tool, indeed.

My desired output format is actually only the output of plain hledger register but with tabs instead of space-padding. I would not add any extra data columns like csv. Therefore I'm not sure if tsv is an expressive name for that.

I'm also fine with adding that output format only to a few commands. It makes sense especially for register, activity, and maybe balance and more.

Nov 01 '21 08:11 schoettl

Here is a csv to tsv converter:

$ cat csv2tsv
#!/usr/bin/env python
import csv, sys
csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))

I'm a bit hesitant to add a whole new formatting option (with accompanying alignment annoyances) for such a simple change. Can you take a look at the output of the following and tell me if it meets your needs?

$ hledger register --output-format=csv | csv2tsv

Another alternative is just to use sed, though I'm still not sure what the parameters you want for alignment are.

$ hledger register | sed -e "s/   */\t/g"

Nov 09 '21 00:11 Xitian9

Thanks for the snippets, @Xitian9. For me, converting csv to tsv is not the problem.

For me it would be very useful to have normal register output but with tabs instead of space padding (see top post). I thought it might be useful for other people making printed/published reports with hledger; that's why I made this PR.

Nov 09 '21 09:11 schoettl

Edit: Sorry, didn't read carefully enough. Your second snippet actually works quite well for my sample register output.

At least as long as:

1. there are not more than 3 spaces in the description and
2. the 3 spaces between the last two columns are guaranteed.
3. Also, I don't know how the spaces between the columns change for longer descriptions and accounts.
4. Also, a tab character between the first two columns would be nice.

I'm not sure about these pre-conditions; that's why I think a built-in solution might be better.

Here is a screenshot demonstrating the use-case:

[screenshot]

Nov 09 '21 09:11 schoettl

The sed script only requires two spaces between each field, and that is guaranteed between all fields except between the date and the description. You are correct that this will choke when there are two or more consecutive spaces in a description.

Try this:

hledger register | sed -e "s/ /\t/" -e "s/   */\t/g"

I believe this addresses points 2 and 4. Point 1 remains an issue, though I'm not sure how often it would arise in practice. Point 3 is a general issue with tabs, and one of the reasons they are best avoided for actual text alignment: every piece of software handles them differently, and what works for one person will completely break another's workflow.
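
For readers without GNU sed, here is a minimal Python sketch of what the two sed expressions above do to a single line. The sample lines are made up for illustration and are not real hledger output; the function name is hypothetical.

```python
import re

def spaces_to_tabs(line: str) -> str:
    # First sed expression: replace only the first space
    # (the single space between date and description).
    line = line.replace(" ", "\t", 1)
    # Second sed expression: replace every run of two or more spaces with one tab.
    return re.sub(r" {2,}", "\t", line)

# Hypothetical register line (assumed layout, not actual hledger output).
sample = "2021-11-08 groceries             expenses:food           10.00         10.00"
fields = spaces_to_tabs(sample).split("\t")

# Point 1: a description with two consecutive spaces is split into two fields.
broken = "2021-11-08 bank  transfer        assets:bank             10.00         10.00"
broken_fields = spaces_to_tabs(broken).split("\t")
```

The first line yields five fields, the second yields six, which is the failure mode described for descriptions containing consecutive spaces.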

Nov 09 '21 10:11 Xitian9

By the way, have you tried hledger-web? It sounds like it may fill the need of what you're trying to do.

Nov 09 '21 10:11 Xitian9

By the way, have you tried hledger-web? It sounds like it may fill the need of what you're trying to do.

Thanks, but it doesn't provide the power of the command-line hledger:

I cannot easily generate custom reports by script/Makefile. I would have to copy paste search queries and results by hand, right?
I cannot use options like --invert which I need for some reports.
When copying from /register, it yields this, the total in the next line:

2021-11-08 	fcdc6aec 	ce:e0:3b7c5f45, 83:e0:3b7c5f45 	0
	0
2021-11-06 	c140d73e 	fa:53f9679b, ce:9ec9dcff 	0
	0

Nov 09 '21 11:11 schoettl

Thanks, I can probably use your second sed script when I have fixed occurrences of two consecutive spaces in the bank transaction titles.

I believe this addresses points 2 and 4. Point 1 remains an issue, though I'm not sure how often it would arise in practice. Point 3 is a general issue with tabs, and one of the reasons they are best avoided for actual text alignment: every piece of software handles them differently, and what works for one person will completely break another's workflow.

Aren't Points 1 and 3 a reason for implementing this right in hledger? I mean, all (GUI) text processing software knows the concept of tab stops. And text processing software is a good tool to use when making final reports. I have used it for a few years now, for a 20-page report for a classical music festival.

Addressing Point 3 specifically, my idea was to reinterpret the -w n,m option for -O tsv as description-max-width = n and account-max-width = m.
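
A rough sketch of that idea, in Python rather than hledger's Haskell: each register row is joined with tabs, and the description and account are cut to the two maximum widths. The row type, function names, and the convention that 0 means "no limit" are assumptions made for illustration; none of this is existing hledger behaviour.

```python
from typing import NamedTuple

class RegisterRow(NamedTuple):
    # Hypothetical row type; hledger's real data model is richer than this.
    date: str
    description: str
    account: str
    amount: str
    total: str

def truncate(text: str, max_width: int) -> str:
    # Cut text to at most max_width characters; 0 (assumed) means unlimited.
    if max_width <= 0 or len(text) <= max_width:
        return text
    return text[:max_width]

def to_tab_separated(row: RegisterRow, desc_width: int = 0, acct_width: int = 0) -> str:
    cells = (
        row.date,
        truncate(row.description, desc_width),
        truncate(row.account, acct_width),
        row.amount,
        row.total,
    )
    # A tab inside a cell would shift the columns, so replace it with a space.
    return "\t".join(cell.replace("\t", " ") for cell in cells)

row = RegisterRow("2021-11-08", "festival tickets", "income:tickets:online", "-120.00", "-120.00")
line = to_tab_separated(row, desc_width=8, acct_width=14)
```

Because the columns are separated by exactly one tab, consecutive spaces inside a description no longer matter, and the tab stops in the text processor decide the visual alignment.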

Nov 09 '21 11:11 schoettl

I agree with you overall, but I just can't shake the feeling that this is not quite the right way to go about it.

The way I see it, we have the following output types:

plain text: for quick-and-dirty human-readable output when you don't care about fancy formatting
csv: as a data exchange format with other tools
json: for serialising
sql: for sql

My understanding is that you want tab-separated report output so that you can copy-paste it into libreoffice documents, and then format it nicely for print reports. This seems more like a workaround for a lack of a proper templating engine than an actual use case in itself. I just feel that there's a better solution lurking around here somewhere, and focussing on finding that will give better results than adding a new output format.

Nov 09 '21 11:11 Xitian9

There might be something useful in here: https://unix.stackexchange.com/questions/170199/is-there-a-standalone-tool-which-will-write-reports-from-csv-data-files

Nov 09 '21 11:11 Xitian9

I'm a little unsure as well, whether the proposed format is generally useful enough and generalizable enough to be worth adding as a baked in format. It might be, I just don't have enough experience with your use case to say yet..

Nov 09 '21 17:11 simonmichael

@Xitian9 Regarding other tooling / template software: I already considered using org-mode with babel code blocks and plain hledger output, and generating PDF reports via org-mode LaTeX export. But I find report formatting very cumbersome in LaTeX.

My impression is that a specialized report generation software for hledger is actually missing.

Thanks for the link, but that tool only converts one CSV file to one ODF. My report contains many different hledger reports.

I understand that both of you are hesitating. I just want to summarize the current options for making reports:

plain text: for quick-and-dirty human-readable output when you don't care about fancy formatting => not sufficient for proper reports. There are work-arounds (e.g. sed) that fail in some cases.
csv: not very useful for making plain paper reports. Too many steps to get to reports: selecting columns, changing the separator. Too much programming/scripting or too many manual steps for amateurs.
json: not at all useful for amateurs making reports
sql: not at all useful for amateurs making reports
hledger-web: limited: cannot do everything that hledger can do; cannot copy proper tab-separated register output to clipboard (see above)

So, how do people make real reports with hledger?

"Freaks" like us may do with org-mode, LaTeX, plain-text…

But shouldn't hledger also aim at users who make reports with normal text processing software like MS Word and LibreOffice? And then, how should they make reports? So far, I have no good way. Only work-arounds like replacing spaces with tabs using custom sed scripts, or replacing 20 spaces with nothing to shorten the line, and fiddling around with the -w n,m option. I think a tab-separated text output could fill the gap perfectly.

But please let me know, how do you produce actual printable reports?

Nov 09 '21 19:11 schoettl

Great question, which I agree we want better answers for. It probably deserves its own issue, mail list / chat room discussion, and web page (maybe this one). I'll just add pandoc to that list, here's a related example (in essence: a script collects hledger data values, plugs them into a markdown template with envsubst, and pandoc renders pretty documents).
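
The template step of such a pipeline can be sketched in Python with string.Template, which uses the same $NAME placeholders as envsubst. The template text, variable names, and values below are invented for illustration; in a real script the values would be collected from hledger commands.

```python
from string import Template

# Hypothetical markdown template with envsubst-style placeholders.
template = Template(
    "# Report for $PERIOD\n"
    "\n"
    "Total income: $INCOME\n"
    "\n"
    "Total expenses: $EXPENSES\n"
)

# Placeholder values (assumed); a real script would fill these from hledger output.
values = {
    "PERIOD": "2021",
    "INCOME": "1200.00 EUR",
    "EXPENSES": "950.00 EUR",
}

# substitute() raises KeyError if a placeholder has no value,
# which catches typos in the template early.
markdown = template.substitute(values)
```

The resulting markdown could then be written to a file and rendered by pandoc, for example with `pandoc report.md -o report.pdf`.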

Nov 09 '21 19:11 simonmichael

PS don't forget html output.

Nov 09 '21 19:11 simonmichael

PS don't forget html output.

(but not for register and balance)

Nov 09 '21 19:11 schoettl

Wow, you're right, we really should support these output formats more consistently.

Nov 09 '21 20:11 simonmichael

Yes, maybe pasting an HTML table into LibreOffice might be an alternative to the current solution.
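
Until html output is available for register, one workaround might be to convert the CSV output into an HTML table. The following is a minimal sketch under that assumption; the function name and the sample CSV (including its column names) are made up for illustration.

```python
import csv
import html
import io

def csv_to_html_table(csv_text: str) -> str:
    # Parse the CSV text; the first row is treated as the header.
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    # html.escape protects characters such as < and & in descriptions.
    lines.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Hypothetical CSV input (assumed columns, not verified against hledger).
sample_csv = 'date,description,amount\n2021-11-08,"Bread & butter",10.00\n'
table = csv_to_html_table(sample_csv)
```

Only the columns present in the CSV end up in the table, so dropping the account column would mean filtering the rows before rendering.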

Nov 09 '21 20:11 schoettl

I've written an output filter script:

hledger-output-filter - Transform hledger's register and print output.

usage: hledger-output-filter [options]

options:
  -t tab-separated instead of space-separated; useful for use in text
     processing software with tabstops.
     requirement: descriptions must not contain two consecutive whitespace!
  -c omit all comments
  -d DESCRIPTION_WIDTH
     shorten description to n characters.
     requires -t.
  -a ACCOUNT_WIDTH
     shorten account name to n characters.
     requires -t.
  -s a single date, i.e. "2021-11-19" instead of "2021-11-19=2021-11-20".
  -h print help message.

https://github.com/schoettl/hledger-contrib/blob/master/hledger-output-filter.sh

I find it useful but it's a hack. I had to fix dozens of double whitespace in my descriptions.

Nov 19 '21 21:11 schoettl

Nice! I often do the same to prototype a feature. On the upside, you cleaned up your data a bit..

Nov 21 '21 02:11 simonmichael

The next level of robustness would be a hledger script or two, which can use hledger's parsers, and mimic the code in Register.hs/Print.hs. (hledger-register-pretty-tsv.hs, hledger-print-pretty-tsv or some such..)

Nov 21 '21 02:11 simonmichael

(I'll try out your script. PS, nice https://github.com/schoettl/hledger-contrib repo, we should link it somewhere.)

Nov 21 '21 02:11 simonmichael

That bash script looks very nice. I have been writing many recently in almost exactly that style, but I see some new things to learn from yours.

I'm not seeing much effect though. I don't notice any change to register or print output, eg from hledger -f examples/sample.journal print | hledger-output-filter.sh. Adding -t gives

line 50: declare: -g: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]

because I'm on a mac, probably.

Nov 21 '21 02:11 simonmichael

PPS:

  -s a single date, i.e. "2021-11-19" instead of "2021-11-19=2021-11-20".

Yet another case of secondary dates getting in the way. I'm starting to really detest this feature!

Nov 21 '21 02:11 simonmichael

I'm not seeing much effect though. I don't notice any change to register or print output, eg from hledger -f examples/sample.journal print | hledger-output-filter.sh

Right, without options, the filter is the identity (id) and doesn't change anything.

line 50: declare: -g: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]

declare -gr var=val would be a global read-only variable but Mac has such an old version of Bash that this doesn't work. Maybe global instead of declare would work on Mac.

Yet another case of secondary dates getting in the way. I'm starting to really detest this feature!

Yes, I guess the secondary date is mostly not needed in register output. But in principle, it's valuable information.

The next level of robustness would be a hledger script or two, which can use hledger's parsers, and mimic the code in Register.hs/Print.hs.

But hledger doesn't have a parser for register output, does it?

Nov 21 '21 08:11 schoettl

But hledger doesn't have a parser for register output, does it?

No, I meant the script can use hledger-lib to parse files just as the builtin commands do.

Nov 21 '21 08:11 simonmichael

A useful discussion, that did not result in a PR; closing.

Apr 06 '23 05:04 simonmichael
