mito icon indicating copy to clipboard operation
mito copied to clipboard

Add support for & in formulas

Open marthacryan opened this issue 2 years ago • 1 comments

So, we want to concat with &.

  • There are two options:
  1. Convert the & to a + sign, and hope things work
  2. Convert this to a function call to the CONCAT funciton

(1) is the easiest (by far, it's literally a find and replace). But (2) is the only way to match excel's behavior (assuming CONCAT is right).

Note that the matching of behavior can be see in the example =1 & "a" in excel.

(2) is challenging because it requires reasoning about what the _arguements to CONCAT should be. Some examples:

  • =1 & "a" -> CONCAT(1, "a")
  • =1 & "a" & "b" -> CONCAT(1, "a", "b") or CONCAT(1, CONCAT("a", "b"))
  • =SUM(1, 2) & "a" -> CONCAT(SUM(1, 2), "a")
  • =1 & "a" & SUM(1, 2) -> CONCAT(1, "a", SUM(1, 2))

The most production-ready algorithm is some recursive tree parsing bs... IDK what this is off the top of my head.

One observation is that CONCAT is associative, so we can just concat in the order we find them -- we can just do one item at a time and then concat the result with the next item (e.g if there are two CONCATs

But this still requires reasoning about like "idenfities" and "terms" in the formula (and doing so in an iterative or recursive manner) -- this is more infrastructure than we want to build currently.

If I were to do this, I would:

  1. Finish identifying all terms in the formula (like we do with matches on column headers currently)
  2. This would include: operators and constants (I think this is all that's missing)
  3. Then with the list of "tagged matches" I would turn this into a basic AST
  4. Then, I would change the parser to use this AST -- while making all the tests still run
  5. Then, I would just compile the & operator to a CONCAT function call -- nothing different here

Notably, the AST gives us a lot of other good things. It's the right data structure for this stuff.

For approach (1), if we try casting the values on either side of + to strings -- this still requires us to reason about "what is on the other side". So it doesn't really make things easier.

We could just do (1) and have it only work for strings -- and then try and improve the error messages?

Taken from our retreat pair programming session.

marthacryan avatar Jan 31 '24 17:01 marthacryan

From Nate -- I think it's time for a proper AST. We have enough testing / good infrastructure that we will appreciate what we get out of this. Also, it's probably not that much work - we have almost everything except constants and operators.

marthacryan avatar Jan 31 '24 17:01 marthacryan