allow comma in tag values

Open simonmichael opened this issue 11 years ago • 16 comments

Currently tag values can't contain a comma, as it's used to separate multiple tags on a line. We could disallow multiple tags per line, or allow commas within single/double-quotes (as in commodity symbols).

simonmichael avatar Sep 14 '14 15:09 simonmichael

We need to pick a design and either implement it or note the limitation in the manual.

simonmichael avatar Oct 10 '15 04:10 simonmichael

Why not use the semicolon (;) to string tags together, freeing the comma for use within a tag value? The semicolon already separates tags from the posting/transaction, so it could be reused to signal the start of the next tag.

cfellinger avatar Jun 09 '17 16:06 cfellinger

Wouldn't that leave us in the same situation, just with a different character? Sometimes you want to write ; in your tags.

simonmichael avatar Jun 09 '17 17:06 simonmichael

On Fri, Jun 09, 2017 at 10:28:25AM -0700, Simon Michael wrote:

Wouldn't that leave us in the same situation, just with a different character?

Yes, but with a less commonly used character, perhaps. Actually, I thought the semicolon wasn't allowed anyway, but ..

Sometimes you want to write ; in your tags.

yes, and colon, and that one isn't allowed at all in a tag value, or is it?

But what about doubling the semicolon if it's really meant to be there, just like the double space after the account name in a posting?

The same could be done with the comma, but I would prefer the semicolon anyhow: it is less likely to be used in a tag and is already used to start tags (okay, comments). On the other hand, if doubling is an option, the comma would have the advantage of leaving current usage as is.

I'm sorry, but I haven't used hledger enough to have strong opinions, just venting alternatives. And for the record, I'm no fan of doubling, but I do like to be able to use the comma in extended comments/tags.

-- groetjes, carel

cfellinger avatar Jun 09 '17 17:06 cfellinger

yes, and colon, and that one isn't allowed at all in a tag value, or is it?

That's a good question; I think it is. We could probably use more tests in tests/journal/tags.test.

simonmichael avatar Jun 09 '17 18:06 simonmichael

We could just stop supporting multiple tags in one line. If you have multiple tags, use multiple comment lines. I rarely have multiple tags, myself. I know some people probably use tags much more than I do.

; tag1: a, b
; tag2: c

simonmichael avatar Jun 09 '17 18:06 simonmichael

Or we could do the quoting thing, which is consistent with commodity symbols. ("If the commodity contains numbers, spaces or non-word punctuation it must be enclosed in double quotes.")

   ; tag1: "a, b", tag2: c  # tag1's value contains comma, quotes needed
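
To make the quoting idea concrete, here is a minimal Python sketch of splitting a comment into tags on commas that fall outside double quotes. It is not hledger's parser; `split_tags` is a hypothetical helper, and it assumes the comment text has already had its leading `;` removed and that a tag is anything of the form `name: value`.

```python
def split_tags(comment):
    """Split 'name: value, name: value' on commas outside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in comment:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted value
            current.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(current))  # unquoted comma ends this tag
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    tags = {}
    for part in parts:
        name, sep, value = part.partition(":")
        if not sep:
            continue  # no colon, so this piece is not a tag
        value = value.strip()
        # Strip one pair of enclosing double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        tags[name.strip()] = value
    return tags


print(split_tags('tag1: "a, b", tag2: c'))
```

Unquoted values keep working as before, which is the backwards-compatible part; only values that need a comma have to be quoted.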

simonmichael avatar Jun 09 '17 18:06 simonmichael

Ledger only allows one tag per line. (With a value. It allows multiple valueless tags per line via :tag1:tag2: syntax).

If we restrict tags to one per line, we also need to decide whether the tag name must be the first thing in the comment line, or if it's allowed to have non-tag content before it, as we currently allow. So eg:

  ; this comment contains one tag, "tag1" (a word followed by colon) tag1: rest of line, is: tag1's; value

or the more restrictive (and easier for other implementors):

  ; tag1: tag's name must be at start of comment line
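
For comparison, the restrictive variant is easy for other implementors to parse. The following Python sketch is only an illustration under assumptions: `parse_tag_line` is a hypothetical name, and the tag-name pattern (a run of characters without spaces or colons) is a guess, not hledger's actual rule.

```python
import re

# Assumed form: optional indent, ';', then 'name:' right away, then the value.
TAG_AT_START = re.compile(r"^\s*;\s*([^\s:]+):\s*(.*)$")


def parse_tag_line(line):
    """Return (name, value) if the comment line starts with a tag, else None."""
    m = TAG_AT_START.match(line)
    if m is None:
        return None
    # Everything after the first colon is the value, commas and semicolons included.
    return m.group(1), m.group(2).rstrip()


print(parse_tag_line("  ; tag1: a, b; is: part of the value"))
print(parse_tag_line("  ; some prose before tag1: value"))
```

With this rule the rest of the line belongs to the one tag, so no separator character needs to be reserved.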

simonmichael avatar Jun 09 '17 21:06 simonmichael

I agree it makes sense to only allow one tag per line and only at the start of the comment line. If you have multiple tags in the same comment line, it becomes hard to see at a glance which part belongs to which tag, which makes the journal harder to scan. The same applies when there's a comment and a tag in the same comment line.

lukasbestle avatar Jan 08 '23 17:01 lukasbestle

I'd prefer the backwards-compatible quoting approach:

Or we could do the quoting thing, which is consistent with commodity symbols. ("If the commodity contains numbers, spaces or non-word punctuation it must be enclosed in double quotes.")

   ; tag1: "a, b", tag2: c  # tag1's value contains comma, quotes needed

Suddenly breaking journals that have many tags per line is not elegant.

nobodyinperson avatar Jan 08 '23 21:01 nobodyinperson

I see the appeal of a backwards-compatible solution; however, there are two issues with this one in particular:

  • It means that quotes can no longer be used in a tag value, at least not at the beginning and end. To keep quote support, there would need to be a way to escape quotes (e.g. tag1: "a, b and \"c\"").
  • The quote syntax would be incompatible with ledger. Multiple tags per line separated by comma are already incompatible, but also a simple case like one tag on its own line that contains a comma in its value would require different syntax for ledger and hledger.

lukasbestle avatar Jan 08 '23 21:01 lukasbestle

Thanks for the new input on this.

Allowing only one tag per line, and requiring them to be at start of comment, is probably the simplest and most general. It would feel noisy and a bit unpleasant if you are using multiple tag names without values, having to put each word on a new line. (Especially if you are already used to writing these on one line.)

Allowing tag values to be enclosed in quotes seems lowest impact, but it would still break existing tags which contain double quotes within them. It looks a bit noisier and harder to parse visually. More so if you also support escaped quotes.

This is not causing me pain, and I'm not against a change here. But since it would break people's journals, it needs strong motivation to be worth the trouble, e.g. gathering support and showing some kind of majority/consensus for it on the mail list.

simonmichael avatar Jan 11 '23 04:01 simonmichael

Re: escaping quotes in quoted tag values:

How about supporting more than one quoting scheme, e.g. à la Python:

"a string"
'a string'
"""a string"""
'''a string'''

With this approach I think no escaping of quote characters should be necessary as in that case a different quoting scheme can be used. This would at least help reduce the complexity of implementing an escaping scheme, both for hledger and the user (escaping is always ugh...)

Still doesn't fix the 'my tag looks like this: ; myTag: "I like quotes"'-problem, though...

nobodyinperson avatar Jan 11 '23 08:01 nobodyinperson

Without escaping, what would happen if a tag value contains both single and double quotes? For some external data like bank transfer purpose strings, we cannot assume their format. So IMO there needs to be a standard way to transform an arbitrary string into a valid tag value (no matter which characters it contains).

A possible idea could be to use the CSV solution: A quote is escaped with two quotes. Also not pretty, but it's already commonly used.
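
A rough Python sketch of the CSV-style rule, assuming double quotes as the only delimiter: `quote_value` and `unquote_value` are hypothetical helpers, not part of hledger. Any string can be turned into a quoted value and recovered unchanged.

```python
def quote_value(value):
    """Wrap value in double quotes, doubling any double quotes inside it."""
    return '"' + value.replace('"', '""') + '"'


def unquote_value(quoted):
    """Reverse quote_value: strip the outer quotes and undouble inner ones."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted value")
    return quoted[1:-1].replace('""', '"')


original = 'a, b and "c" with \'single\' quotes'
quoted = quote_value(original)
print(quoted)
print(unquote_value(quoted) == original)
```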

lukasbestle avatar Jan 11 '23 11:01 lukasbestle

Without escaping, what would happen if a tag value contains both single and double quotes?

Both single and double quotes are not a problem with the Python-style quoting:

""" a string ' with single " and double quotes """
''' a string ' with single " and double quotes '''

The only problem is for comments containing both ''' and """ (or more quotes):

"""   a string with three double quotes (""") AND three single quotes (''') won't work without escaping """ # 👈 broken
'''   a string with three double quotes (""") AND three single quotes (''') won't work without escaping ''' # 👈 broken

I'd consider this a very niche problem. hledger's quoting logic for importing from external sources could be:

if comment contains both '''+ and """+:
  mangle one of the two runs: insert a unicode 'empty character' (https://emptycharacter.com/) between its quotes, or remove the quotes
  use the quoting scheme whose quotes were mangled
else if comment contains ''':
  use three double quotes """
else if comment contains """:
  use three single quotes '''
else if comment contains both ' and ":
  use three double quotes """
else if comment contains ':
  use double quotes "
else if comment contains ":
  use single quotes '

Mangling ''' and """ when both occur in one comment is IMO an acceptable trade-off to avoid implementing escaping.
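
In the spirit of the logic above, a small Python sketch of picking a delimiter. `choose_quotes` is a hypothetical helper, and the extra check on the first and last character is an added assumption, so that a value ending in a quote is never wrapped in the same kind of quote.

```python
def choose_quotes(value):
    """Return a quote delimiter that is safe for value, or None if none fits."""
    for delim in ('"', "'", '"""', "'''"):
        q = delim[0]
        # The delimiter must not occur inside the value, and the value must
        # not start or end with the same quote character (ambiguous edges).
        if delim not in value and value[:1] != q and value[-1:] != q:
            return delim
    return None  # caller has to mangle or escape


for v in ["plain", 'say "hi"', "it's", "it's \"x\"", "a ''' b \"\"\" c"]:
    print(repr(v), "->", choose_quotes(v))
```

A result of `None` corresponds to the niche case where the value would have to be mangled or escaped.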

nobodyinperson avatar Jan 12 '23 09:01 nobodyinperson