hledger 1.40 renamed the total column in CSV export
In 1.34, exporting the results of the balance command to CSV gives a total column:
$ hledger -f all.journal bal -M '^income:.*:salary' --invert --transpose -O csv -e today | head -n 1
"account","income:uber:salary","total"
In 1.40, running the same command gives:
$ hledger -f all.journal bal -M '^income:.*:salary' --invert --transpose -O csv -e today | head -n 1
"account","income:uber:salary","Total:"
The last column heading changed from "total" to "Total:". This seems like a regression.
Hi @zmanji, I believe it was intentional, part of a number of cleanups and consistency improvements for tabular reports across different output formats. Why does it seem like a regression?
I had many scripts processing the CSVs for graphing purposes. Renaming the column broke scripts that were accessing the 'total' column. If this is intentional, then this is fine by me; feel free to close this.
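One way for such scripts to survive heading changes like this is to normalise the header before looking up a column, instead of matching the exact text. A minimal sketch, assuming a POSIX shell with awk, and assuming the only variations are letter case and a trailing colon; the function name total_col_index is hypothetical, and heading fields are assumed to contain no embedded commas:

```shell
# Hypothetical helper: print the 1-based index of the total column in a CSV
# header read from stdin, accepting "total", "Total" and "Total:" alike.
# Exits with status 1 if the header line has no total column.
total_col_index() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = tolower($i)
        gsub(/"/, "", h)     # drop the surrounding double quotes
        sub(/\r$/, "", h)    # tolerate CRLF line endings
        sub(/:$/, "", h)     # drop a trailing colon, as in "Total:"
        if (h == "total") { print i; exit 0 }
      }
      exit 1                 # header has no total column
    }
  '
}
```

Piping either of the two header lines shown at the top of this thread into total_col_index prints 3, so a script that looks up the column number this way keeps working across 1.34 and 1.40.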
Sorry about that.
I tend to favour simple lowercase to start with, but usually, over time and with more real-world users, capitalisation and punctuation tend to win. (Probably the "account" heading here should be capitalised also. The colon might be debatable for CSV.)
That's possibly my fault. I will look into this. The totalRowHeading values in Commands.Balance all use uppercase "Total". Shall I adapt "account" accordingly?
hledger/test/balance/layout.test uses "account" and "Total:" in these cases, so the output seems to be intended.
Maybe we could adapt the capitalization to the style of the account names? Say, if a majority of account names start with an uppercase letter, then "account" and "total" should do so as well.
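To make the proposed heuristic concrete, here is a minimal sketch of how the heading case could be chosen. It is only an illustration of the proposal, not hledger behaviour; it assumes account names arrive one per line on stdin (for example from hledger accounts), and the function name choose_total_heading is hypothetical:

```shell
# Hypothetical: choose the case of the "total" heading from the account
# names read on stdin, one per line. "Total" is chosen only when more names
# start with an uppercase letter than with a lowercase one; ties and empty
# input fall back to lowercase "total".
choose_total_heading() {
  awk '
    /^[A-Z]/ { upper++ }   # name starts with an uppercase letter
    /^[a-z]/ { lower++ }   # name starts with a lowercase letter
    END { print ((upper + 0) > (lower + 0) ? "Total" : "total") }
  '
}
```

For example, printf 'Assets:Bank\nIncome:Salary\nexpenses:food\n' | choose_total_heading prints Total. The sketch also shows the drawback of the idea: the heading text would change whenever the balance of account-name styles in a journal changes.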
On Tue, 15 Oct 2024, thielema wrote: "That's possibly my fault. I will look into this."
The change came with commit 574115e00157ed98dee32dd657a54b558d517e06.
This may not be top priority, but when we are tweaking headings and need a policy, I would probably:
- Capitalise first words consistently (not necessarily all words in a multi-word heading)
- Consider using a colon as seems best for each case. That's more of a presentation detail, so maybe colons make less sense in CSV?
I think it's better to have a simple fixed rule rather than adapt headings to data.
My vote would be for:
- Capitalization across all headings in viewing formats (like text and HTML).
  - This looks more professional in reports which are to be read, especially if they are to be read by others (e.g., accountants).
  - I'd be okay with capitalization in the CSV as well, but I realize that some (including myself) prefer lowercase headings in CSVs, and snake-case (e.g., account_name) is a convention / best practice for CSVs in many fields. So I'd prefer using lowercase in data formats like CSV (and possibly JSON).
- Removing the ":" in headings like "Total:" in all reports, regardless of whether it's a viewing format or a data format.
This sounds good to me too. But when people drag CSV into a spreadsheet, wouldn't they like to see the same presentation-ready capitalised headings that they'd see in text or HTML output?
(The FODS format is more specialised for that use case, but works only for OpenOffice/LibreOffice users.)
Regarding "when people are dragging CSV into a spreadsheet wouldn't they like to see the same presentation-ready capitalised headings":
Given that CSV can be used both for sharing with accountants (meaning it needs to look professional) and for data processing, it falls in the middle.
However, given that spreadsheet software generally offers a way to convert text to title case (e.g. "Format > Case > Title Case"), I don't think it should matter very much.
The same can be said about sed -i '1s/.*/\L&/' file.csv (GNU sed), which changes the column names to lowercase.
The question is which is a better default. I'd suggest that lowercase is the better default. I find that when I want to share a CSV, I tend to do a bit of cleaning up before sharing it with others. When doing that, changing the column names to title case would be part of the cleaning up. I don't expect the CSV that is output by hledger to be shareable without some cleaning up.
Given that, I think considering CSV as a data processing format first and as a presentation format second would be the right order. But I think it would be fine either way given the ease of "convert to title case" or a simple sed command.
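Note that lowercasing the whole first line, as the sed command above does, also lowercases any account names in the header row, such as those produced by --transpose. A narrower workaround rewrites only the total heading back to its 1.34 form. A minimal sketch, assuming the heading appears exactly as "Total:" in the quoted form shown at the top of this thread; the function name restore_total_heading is hypothetical:

```shell
# Hypothetical helper: rewrite the 1.40-style "Total:" heading back to the
# 1.34-style "total" on the first (header) line only. Data rows and the
# other headings, such as account names, pass through unchanged.
restore_total_heading() {
  sed '1s/"Total:"/"total"/'
}
```

It can sit at the end of an existing pipeline, e.g. hledger -f all.journal bal ... -O csv | restore_total_heading, which keeps downstream scripts unchanged while the heading policy is being decided.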
OP's report is an example of where someone's data processing was disrupted:
I had many scripts processing the csvs for graphing purposes. Renaming the column broke scripts that were accessing the 'total' column.
With this in mind, I agree with @the-solipsist that ideally we should omit colons and other presentation punctuation from "data" formats like CSV, and possibly capitalisation as well, unless it's too much of a headache to implement and maintain these variations and keep them consistent. I don't recall whether that would be troublesome; @thielema might have thoughts.
(Yes I'm flip-flopping a bit between the presentation and data processing use cases.)
Hi, coming back to this -
Since the 1.40 release notes mentioned this ("In balance commands' html and csv output, "Total:" and "Net:" headings are now capitalised consistently."), I think this is not a regression but more of a wish than a bug, and I have labelled it so.
I guess our primary wish is for a consistent policy for table column headings (and maybe other kinds of report headings) that devs can follow and users can expect.
At this point we'd need to review all reports and output formats to clarify the current status. Then we could revisit the open questions:
- a. no capitalisation, b. first word capitalised, or c. all words capitalised?
- a. no colon, or b. colon?
- a. same heading text style across all output formats, or b. different for "machine oriented" vs "human oriented" formats? (related: https://hledger.org/1.50/hledger.html#amount-parseability)
- a. changes in heading text are a breaking change, or b. heading text is cosmetic and not a stable API?