
Lift Constant on Regex Parameter to IsMatch

Open • jack-work opened this issue • 8 comments

The IsMatch function only accepts a regular expression that is a constant. Values known only at runtime are rejected. There is no reason for this to be the case. In Match, the contents of the regular expression shape the function's return type, but that is not true of IsMatch, which always returns a true/false value. This PR lifts the restriction.

jack-work, Jul 20 '23 04:07

We should discuss. There are lots of good reasons for restricting this to a constant: a) we can ensure consistency across backends by parsing the RE in the compiler, and b) we can provide compile-time feedback on RE correctness. I would also like to hear the compelling scenarios that require this. I know other languages do it, but I've been working with REs for decades, and the few times I've resorted to this, it has made a mess that was impossible to maintain. REs are hard enough without making them dynamic.

gregli-msft, Jul 21 '23 01:07

Here is an example of where my team has been wanting this functionality. We have Power App components where we validate input using regular expressions. Sometimes we want to make the regular expression conditional so that we can more easily reuse the component for various scenarios. For example, in an email address component we sometimes (depending on values in other fields in the app) want users to enter an internal email address, and sometimes we allow external email addresses. Another example is our client identifiers: sometimes they need to be 4 characters and sometimes 8 characters. It does not seem unreasonable to us for IsMatch to accept a conditional validation expression, so that we can ensure data integrity across different scenarios in the same application.

RayLally, Sep 21 '23 17:09

FYI, a recent release has caused all of our applications to start throwing "Regular expression must be a constant value." errors again. Since this error keeps getting reintroduced, it may be in everyone's best interests to review this option again or to find some other way to fix this problem permanently.

RayLally, Apr 16 '25 13:04

Totally hear you. But we're kind of headed in the opposite direction. In order to make regular expressions in Power Fx behave the same across C#, JavaScript, and PCRE2 (as a reference, and what Excel uses), we've introduced a compile-time check that limits the language to an extended subset of any one of these implementations. The compiler is the single consistent choke point across all Power Fx implementations, and it would be difficult to reimplement the check and maintain consistency if we were to do this in other languages like JavaScript too.

That said, I can imagine offering a dynamic facility for simple regular expressions that could be validated at runtime, for example with no capture groups. Can you provide some examples of the dynamic regular expressions you'd like to use?

gregli-msft, Apr 16 '25 16:04

Thanks for the information, Greg! I really appreciate it. We created a component library where we use regular expressions to validate inputs. For example, we have a component called "CompTextInput" that has a Text Input control, a Label control, and a custom property called "ValidationExpression" that is set in the Power App based on what we want to allow in the Text Input. If the input fails the validation check (If(!IsMatch(Trim(TxtInputGeneric.Text), CompTextInput.ValidationExpression), ...)), the Label is displayed with a message telling the user why the input is invalid. We use this same component for almost all text inputs in all of our Power Apps; we just change the regex passed into the ValidationExpression property (checking for whole numbers, integers, email addresses, special characters, etc., whatever the requirements are for that field in that Power App). This functionality has been working mostly as expected for years, but it broke in July of 2023 and again this week (see the cases below).

TrackingID#2306160040007655 TrackingID#2504160040006871

I am sure I have done similar types of validation using JavaScript, and "Contains" and "IsMatch" in the program below work fine with a variable, without any errors, in C#. So I am confused about why this keeps breaking in Power Fx.

    using System;
    using System.Text.RegularExpressions;

    namespace ConsoleApp2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var foo = "bar"; // pattern held in a variable, not a literal
                if (!Environment.MachineName.Contains(foo))
                {
                    Console.WriteLine(Environment.MachineName);
                }
                Console.ReadLine();
                // Variable pattern, case-insensitive, with a 500 ms match timeout
                if (!Regex.IsMatch(Environment.MachineName, foo, RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(500)))
                {
                    Console.WriteLine(Environment.MachineName);
                }
                Console.ReadLine();
            }
        }
    }
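For comparison with the C# program, here is a rough TypeScript model of the CompTextInput check described in this comment, with the pattern supplied as a runtime string. The function name showValidationLabel is hypothetical. Wrapping the pattern in ^(?:...)$ is an assumption modelled on IsMatch matching the complete text by default. This is a sketch, not how Canvas apps evaluate IsMatch.

```typescript
// Hypothetical model of:
//   If(!IsMatch(Trim(TxtInputGeneric.Text), CompTextInput.ValidationExpression), ...)
// Assumption: the whole trimmed text must match, so the pattern is wrapped in ^(?:...)$.
function showValidationLabel(text: string, validationExpression: string): boolean {
  // The pattern is only known at runtime, so it is compiled on every call, and a
  // malformed pattern surfaces here as a SyntaxError rather than at authoring time.
  const whole = new RegExp(`^(?:${validationExpression})$`);
  return !whole.test(text.trim());
}

console.log(showValidationLabel(" 123456789 ", "^\\d{9}$")); // false: valid, label hidden
console.log(showValidationLabel("12345", "^\\d{9}$"));       // true: invalid, label shown
```

This flexibility is exactly what the compile-time check gives up. Nothing about the pattern can be verified until the call runs.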

RayLally, Apr 16 '25 17:04

Thanks! Yes, JavaScript and C#, and most other languages, can handle dynamic regular expressions. Power Fx has an additional problem: we'd like to have the same semantics no matter which implementation is used. Besides being consistent and being one language to learn, this allows us to, for example, move validation business logic written in a Dataverse function into Power Apps model-driven apps for low-latency client execution and running offline.

For example, if we didn't do anything, Match( "ab", "(|.)*" ) would return different results on C# and JavaScript. Take a look through the tests at https://github.com/microsoft/Power-Fx/blob/main/src/tests/Microsoft.PowerFx.Core.Tests.Shared/ExpressionTestCases/MatchFunctions_CaptureQuant.txt and you'll see ".net != node" comments, and I didn't mark them all. This is what we're trying to avoid with the compile-time check. We could also block dynamic regular expressions at runtime, but that would require implementing the compile-time check in JavaScript as well, and two separate implementations would drift out of sync.
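To make the cross-engine difference concrete, here is what a JavaScript engine does with that pattern, shown as a small TypeScript snippet. ECMAScript treats a * iteration that matches the empty string as a failure, so each iteration abandons the empty alternative and falls through to ".". The .NET result is not reproduced here. The linked test file records where .NET and Node disagree.

```typescript
// JavaScript's answer for Match("ab", "(|.)*"), modelled with String.prototype.match.
// Empty iterations of "*" are rejected, so the loop consumes "a" and then "b";
// capture group 1 keeps the value from the last successful iteration.
const jsMatch = "ab".match(/(|.)*/);
console.log(jsMatch?.[0], jsMatch?.[1]); // ab b
```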

But, all that said, perhaps we can reduce the dynamic surface area to something we could validate at runtime. Perhaps we ban groups with zero quantifiers and don't support capture groups at all. That's why I'd like to see your actual regular expressions, as much as you are willing to share, so we can get a sense of how much power we need to expose for your scenarios. I can imagine that most validation scenarios, except perhaps email, could be handled without so much power.
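As a rough illustration of a runtime check for a reduced surface area, here is a hypothetical validator in TypeScript. Its rules are assumptions for the sketch, not Power Fx's: the pattern must compile in JavaScript's stricter Unicode ("u") mode, and only non-capturing (?:...) groups are allowed. It does not attempt the rule about quantified groups.

```typescript
// Hypothetical "simple regex" gate. Returns null if accepted, otherwise a reason.
function checkSimplePattern(pattern: string): string | null {
  try {
    new RegExp(pattern, "u"); // throws SyntaxError for invalid Unicode-mode syntax
  } catch (e) {
    return `invalid syntax: ${(e as Error).message}`;
  }
  let inClass = false; // true while scanning inside [...]
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. \( or \]
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue; // parentheses inside a class are literal characters
    }
    if (ch === "[") {
      inClass = true;
    } else if (ch === "(" && pattern.slice(i + 1, i + 3) !== "?:") {
      return `group at index ${i} is not a non-capturing (?:...) group`;
    }
  }
  return null;
}

console.log(checkSimplePattern("^\\w{8}$"));                   // null (accepted)
console.log(checkSimplePattern("^[0-9]*(\\.[0-9]{0,2})?$"));   // rejected: capture group
console.log(checkSimplePattern("^[0-9]*(?:\\.[0-9]{0,2})?$")); // null (accepted)
```

Most of the validation patterns listed later in this thread either have no groups at all or could use (?:...) instead of a capture group, so a gate like this would accept them with small edits.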

gregli-msft, Apr 16 '25 18:04

Below are some of our most common regular expressions. These are more or less constants, in that they are set once and don't change in the application itself, but the component currently throws an error because it considers the property value itself not to be a constant.

That said, there have been times when we do pass a variable to the property when it depends on other values in the application. For example, certain types of clients have 4-character ID numbers while all others have 8 characters, so the validation expression property is set to something like this, or a variable containing the appropriate expression is passed into the property: If(CompToggle.Output=true, "^\w{4}$", "^\w{8}$")
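Here is a TypeScript sketch of that conditional case in which every pattern stays a fixed constant and only the choice between them is dynamic. That is a closed set that could still be checked ahead of time. The names isValidClientId, SHORT_CLIENT_ID and LONG_CLIENT_ID are hypothetical, and this is not a Power Fx API.

```typescript
// Hypothetical helper: choose between two fixed, precompiled patterns instead of
// passing a pattern string around. Mirrors If(CompToggle.Output=true, "^\w{4}$", "^\w{8}$").
const SHORT_CLIENT_ID = /^\w{4}$/; // certain client types: 4 characters
const LONG_CLIENT_ID = /^\w{8}$/;  // all other clients: 8 characters

function isValidClientId(id: string, isShortIdClient: boolean): boolean {
  const pattern = isShortIdClient ? SHORT_CLIENT_ID : LONG_CLIENT_ID;
  // No "g" flag, so test() keeps no lastIndex state between calls.
  return pattern.test(id.trim());
}

console.log(isValidClientId("AB12", true));      // true
console.log(isValidClientId("AB12", false));     // false
console.log(isValidClientId("AB12CD34", false)); // true
```

Because both patterns are literals, each is compiled once when the module loads, and a typo in either would be reported immediately rather than when a user first hits that branch.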

Here are the most common ones we use:

No special characters: ^[\w\s.]+$

Phone number: (+\d{1,3}\s?)?(((\d{3})\s?)|(\d{3})(\s|-?))(\d{3}(\s|-?))(\d{4})(\s?(([E|e]xt[:|.|]?)|x|X)(\s?\d+))?

Valid email address: ^([a-zA-Z0-9_-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$

Legal (IRS) Entity Name: ^[A-Za-z0-9-&'_-\s]*$

9 digit Federal ID Number with no dash: ^\d{9}$

8 Character alphanumeric ID: ^\w{8}$

Amount with 2 decimals: ^[0-9]*(.[0-9]{0,2})?$

RayLally, Apr 16 '25 19:04

Thank you for the examples! That is super helpful.

I've got some ideas about how we might address your scenarios that I'll run down with the team in the weeks ahead. If we do something, it probably won't happen until we add regular expressions for Power Fx V1 in Canvas apps, which hasn't been done yet. We'll let you know.

We have returned Canvas apps to the logic they previously had for IsMatch and constant detection. As of version 3.25042.4 or later your apps should work as they previously did.

To be clear, we never intended for this to work. :) It doesn't work in Match and MatchAll, because those functions call TryGetConstantValue instead of just checking IsConstant on the argument, something you'd expect to be consistent across all three. The documentation for all three functions states that the regular expression needs to be a constant. You've exposed a bug in how we deal with constants, in particular in component properties, which we'll also be looking at.

A few notes on your REs:

  • The . in ^[0-9]*(.[0-9]{0,2})?$ and a few of your other REs will match any character (except a newline), not just a literal period, which I think was intended. Use \. to match a period.
  • The | in [E|e] will match a literal |; the alternation operator isn't needed within a character class. In Power Fx V1 and modern JavaScript this is disallowed, and the pipe needs to be escaped if a literal | is what is intended.
  • Using a + at the front of a group isn't supported by Power Fx V1 regular expressions; you'll need to escape it as \+. I'm not sure how this is working for you now, as JavaScript doesn't support this. Some regular expression dialects do support it, but there has been a push by JavaScript, and now by Power Fx, to remove some ambiguity in the language.
  • I'm not sure what the _-\s at the end of ^[A-Za-z0-9-&'_-\s]*$ will do. With the JavaScript u or v flags (modern JavaScript), this is disallowed, since \s is a class of characters and it doesn't make sense to use it as the end of a range. Without these flags, which is how Canvas runs today, it is accepted. Power Fx V1 also disallows it.
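The four notes above can be reproduced in any JavaScript engine. Here they are as a TypeScript sketch, using the patterns exactly as they appear in this thread. The escaped variants are suggested fixes, and the compiles helper is hypothetical. The v flag is left out because older engines do not support it.

```typescript
// Returns true if the pattern compiles with the given flags.
function compiles(pattern: string, flags = ""): boolean {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch {
    return false; // SyntaxError: invalid pattern for these flags
  }
}

// 1. An unescaped "." matches any character, not just a decimal point.
const amountAsWritten = /^[0-9]*(.[0-9]{0,2})?$/;
const amountEscaped = /^[0-9]*(\.[0-9]{0,2})?$/;
console.log(amountAsWritten.test("12x34")); // true
console.log(amountEscaped.test("12x34"));   // false
console.log(amountEscaped.test("12.34"));   // true

// 2. Inside a character class "|" is a literal, so [E|e] also matches "|".
console.log(/^[E|e]xt$/.test("|xt")); // true
console.log(/^[Ee]xt$/.test("|xt"));  // false

// 3. A "+" with nothing before it to repeat is a syntax error in JavaScript.
console.log(compiles("(+\\d{1,3}\\s?)?"));   // false
console.log(compiles("(\\+\\d{1,3}\\s?)?")); // true

// 4. "_-\s" is tolerated without the u flag but rejected with it.
const entityName = "^[A-Za-z0-9-&'_-\\s]*$";
console.log(compiles(entityName));      // true
console.log(compiles(entityName, "u")); // false
```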

gregli-msft, Apr 18 '25 21:04