
expand: support base#number arithmetic notation

Open · rob-myers opened this issue 6 years ago · 7 comments

gosh doesn't support this notation: for example, echo $(( 2#1010 )) outputs 0 instead of 10.
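For reference, the expected result comes from reading the digits after # as positional digits in base 2:

$$
2\#1010 = 1\cdot 2^3 + 0\cdot 2^2 + 1\cdot 2^1 + 0\cdot 2^0 = 8 + 2 = 10
$$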

Could the # be parsed as an arithmetic binary operator for LangBash?

rob-myers, Dec 18 '18

You're right, the interpreter doesn't support this kind of number. That's definitely a bug, and we can fix it without changing the syntax package.

A year and a half ago, when I first came across this numerical notation in bash and mksh, I simply modified the parser to allow # characters within numbers. But that's not a great solution: it's lazy parsing that leaves work to the user, and it also accepts incorrect expressions like foo#123 or 1#2#3.

I think the parser should definitely parse this as a node somehow. We already abuse BinaryArithm for comma operators, so using it here seems fine too.

mvdan, Dec 19 '18

I'm actually re-evaluating this right now. From man bash:

Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base. If base# is omitted, then base 10 is used. When specifying n, the digits greater than 9 are represented by the lowercase letters, the uppercase letters, @, and _, in that order. If base is less than or equal to 36, lowercase and uppercase letters may be used interchangeably to represent numbers between 10 and 35.
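To make the digit order in that quote concrete, here is a minimal Go sketch that evaluates a single base#n literal. The names digitValue and evalBaseNumber are hypothetical (not part of the sh module); the sketch assumes Go 1.18+ for strings.Cut, requires the base to be plain decimal digits, and does not check for int64 overflow.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// digitValue maps one digit character to its value in the order given by
// man bash: 0-9, a-z (10-35), A-Z (36-61), @ (62), _ (63). For bases <= 36,
// uppercase letters are treated like their lowercase counterparts.
func digitValue(c byte, base int) (int, bool) {
	var v int
	switch {
	case c >= '0' && c <= '9':
		v = int(c - '0')
	case c >= 'a' && c <= 'z':
		v = int(c-'a') + 10
	case c >= 'A' && c <= 'Z':
		if base <= 36 {
			v = int(c-'A') + 10 // case-insensitive for small bases
		} else {
			v = int(c-'A') + 36
		}
	case c == '@':
		v = 62
	case c == '_':
		v = 63
	default:
		return 0, false
	}
	return v, v < base // a digit must be smaller than the base
}

// evalBaseNumber evaluates a literal of the form base#n, where base is a
// decimal number between 2 and 64. Overflow is not detected in this sketch.
func evalBaseNumber(lit string) (int64, error) {
	baseStr, digits, found := strings.Cut(lit, "#")
	if !found {
		return 0, errors.New("missing #")
	}
	if baseStr == "" || strings.TrimLeft(baseStr, "0123456789") != "" {
		return 0, fmt.Errorf("base %q is not a decimal number", baseStr)
	}
	base, err := strconv.Atoi(baseStr)
	if err != nil || base < 2 || base > 64 {
		return 0, fmt.Errorf("invalid base %q", baseStr)
	}
	if digits == "" {
		return 0, errors.New("missing digits after #")
	}
	var n int64
	for i := 0; i < len(digits); i++ {
		d, ok := digitValue(digits[i], base)
		if !ok {
			return 0, fmt.Errorf("invalid digit %q for base %d", digits[i], base)
		}
		n = n*int64(base) + int64(d)
	}
	return n, nil
}

func main() {
	for _, lit := range []string{"2#1010", "8#1234", "16#ff", "64#_", "3#2#1"} {
		n, err := evalBaseNumber(lit)
		fmt.Println(lit, n, err)
	}
}
```

Note that splitting on the first # means an input like 3#2#1 fails naturally, because the second # is not a valid digit.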

So should the syntax nodes also tell you about 0xNNN and 0NNN? If we support BASE#NNN, it seems to me that we should be consistent.

I'm starting to think that, as far as the nodes go, we can keep representing these numbers as literals. An interpreter or analyzer can then simply have logic like:

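if strings.Contains(lit, "#") {
    // base#number, e.g. 2#1010
} else if strings.HasPrefix(lit, "0x") || strings.HasPrefix(lit, "0X") {
    // 0xNNN (hexadecimal)
} else if strings.HasPrefix(lit, "0") {
    // 0NNN (octal)
} else {
    // NNN (decimal)
}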

Of course, the expand package would implement this, so one could reuse it too.
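Filling in the branches of that if-chain, here is a minimal Go sketch of how an expander might evaluate an already-expanded literal. evalLiteral and parseDigits are hypothetical names, not the expand package's actual API. To stay short it leans on strconv.ParseInt, so base#n is only covered for bases 2 to 36 (bash allows up to 64); it assumes Go 1.18+ for strings.Cut.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDigits wraps strconv.ParseInt but rejects an empty string or a
// leading sign, which ParseInt would otherwise accept (e.g. "0x-1").
func parseDigits(s string, base int) (int64, error) {
	if s == "" || s[0] == '+' || s[0] == '-' {
		return 0, fmt.Errorf("invalid number %q", s)
	}
	return strconv.ParseInt(s, base, 64)
}

// evalLiteral classifies a literal the same way as the if-chain above.
// Limitation: base#n only supports bases 2-36, the range strconv.ParseInt
// handles; bash's extra digits for bases 37-64 are not implemented here.
func evalLiteral(lit string) (int64, error) {
	switch {
	case strings.Contains(lit, "#"):
		// base#number: the base itself is written in decimal.
		base, digits, _ := strings.Cut(lit, "#")
		b, err := parseDigits(base, 10)
		if err != nil || b < 2 || b > 36 {
			return 0, fmt.Errorf("unsupported base in %q", lit)
		}
		return parseDigits(digits, int(b))
	case strings.HasPrefix(lit, "0x"), strings.HasPrefix(lit, "0X"):
		// 0xNNN: hexadecimal.
		return parseDigits(lit[2:], 16)
	case strings.HasPrefix(lit, "0"):
		// 0NNN: octal; a lone "0" is just zero.
		return parseDigits(lit, 8)
	default:
		// NNN: decimal.
		return parseDigits(lit, 10)
	}
}

func main() {
	for _, lit := range []string{"2#1010", "0x1F", "017", "42", "3#2#1"} {
		n, err := evalLiteral(lit)
		fmt.Println(lit, n, err)
	}
}
```

strconv.ParseInt treats letters case-insensitively for bases up to 36, which happens to match the bash rule quoted earlier for that range.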

We'd still have to fix the parser to error on invalid input like 3#2#1, and the interpreter to expand these number literals properly.

This is similar to how the syntax tree in Go itself handles literals - see https://golang.org/pkg/go/ast/#BasicLit. The string there can take one of many standard forms, and then one can easily "evaluate" that number via a public API.
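To illustrate the go/ast analogy: a Go integer literal parses into an *ast.BasicLit whose Value field keeps the text exactly as written (10, 012, 0x1F, 0b1010, ...), and the standard go/constant package can evaluate that text. This is a small sketch; basicLitValue is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/constant"
	"go/parser"
	"go/token"
)

// basicLitValue parses src as a Go expression, checks that it is an integer
// *ast.BasicLit, and evaluates the literal text with go/constant.
func basicLitValue(src string) (int64, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return 0, err
	}
	lit, ok := expr.(*ast.BasicLit)
	if !ok || lit.Kind != token.INT {
		return 0, fmt.Errorf("%q is not an integer literal", src)
	}
	// lit.Value is still the raw text, e.g. "0x1F"; the node does not
	// record which base was used.
	v := constant.MakeFromLiteral(lit.Value, lit.Kind, 0)
	n, exact := constant.Int64Val(v)
	if !exact {
		return 0, fmt.Errorf("%q does not fit in int64", src)
	}
	return n, nil
}

func main() {
	for _, src := range []string{"10", "012", "0x1F", "0b1010"} {
		n, err := basicLitValue(src)
		fmt.Println(src, n, err)
	}
}
```

A sign is not part of a Go literal, so an input like -5 parses as a unary expression and is rejected by the type assertion.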

mvdan, Jan 05 '19

Fair enough.

There is a difference, though: base#nnn takes two arguments, whereas none of the other forms do, including those in BasicLit. This additional argument could contain a parameter (or arithmetic) expansion, i.e. one may want to select the base dynamically.

rob-myers, Jan 06 '19

This additional argument could contain a parameter (or arithmetic) expansion i.e. one may want to select the base dynamically.

I presume you mean input like $(print-base)#1234. Does it matter, though? The logic I described above applies once all regular shell expansions have happened. So it would go like this:

$(print-base)#1234 -> 8#1234 (expansion)
8#1234 -> 668 (arithmetic evaluation)
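The arithmetic-evaluation step in that example is ordinary positional notation in base 8:

$$
8\#1234 = 1\cdot 8^3 + 2\cdot 8^2 + 3\cdot 8^1 + 4\cdot 8^0 = 512 + 128 + 24 + 4 = 668
$$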

For now, I'll just make the parser reject invalid numbers like 0xxx and 3#2#1. I might change my mind about adding a new node; for now, I'm fairly sure that it's not worth it.

Edit: And I also need to make the expand package (and thus gosh) interpret those numbers properly.
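As an illustration of the kind of check that could reject inputs like 0xxx and 3#2#1, here is a minimal Go sketch using a regular expression over the literal forms quoted from man bash. validNumber and checkNumber are hypothetical names, not the syntax package's actual code, and the check is purely about shape: 2#9 passes even though 9 is not a base-2 digit, leaving that to evaluation.

```go
package main

import (
	"fmt"
	"regexp"
)

// validNumber accepts hex (0x/0X), octal (leading 0), decimal, and
// base#digits with a decimal base from 2 to 64. It does not check that
// each digit is smaller than the base.
var validNumber = regexp.MustCompile(
	`^(?:0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*|(?:[2-9]|[1-5][0-9]|6[0-4])#[0-9a-zA-Z@_]+)$`)

// checkNumber returns an error for literals that can never be valid numbers.
func checkNumber(lit string) error {
	if !validNumber.MatchString(lit) {
		return fmt.Errorf("invalid arithmetic number %q", lit)
	}
	return nil
}

func main() {
	for _, lit := range []string{"2#1010", "0x1F", "0xxx", "3#2#1", "foo#123"} {
		fmt.Println(lit, checkNumber(lit))
	}
}
```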

mvdan, Jan 07 '19

@mvdan I've built an interpreter on top of your parser and implemented all your operations. In keeping with your static approach, I only use your parser in the following situations:

  • Interactively, i.e. when reading from a tty.
  • Running or sourcing a script, or non-interactive bash.
  • In eval.
  • In exec.

I'm trying to avoid having to parse inside every literal. Admittedly my use case is very specialised, but I thought I'd explain why I'm pestering you.

rob-myers, Jan 08 '19

You're not pestering me :)

My point above is that interpreters need to do some arithmetic expression parsing one way or another. For example, if you consider my $(print-base)#1234 example above, the static single-pass parser can't do anything useful at all. Or imagine $(echo 8#1234) as an arithmetic expression.

It's only after all the expansions that the interpreter has to parse the resulting string as a number. This is why I pasted the if contains(lit, "#") ... example in my earlier comment.

mvdan, Jan 19 '19

This won't change the node types in the syntax package, so I'm deferring it past the 3.0 release for now. That release should have been out a while ago.

mvdan, May 18 '19