CSharpVerbalExpressions

VerbalExpression Performance Question

Open ophirmi opened this issue 11 years ago • 13 comments

Hi all, I ran the following test to check how well verbal expression performs compared to a regular expression:

    [Test]
    public void TestingIfWeHaveAValidURL()
    {
        var testMe = "https://www.google.com";

        var swVerb = new Stopwatch();
        swVerb.Start();
        verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();


        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex( @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$" );
        Assert.IsTrue( regex.IsMatch( testMe ) );
        swRegex.Stop();
        //Verb: 133 ms Regex: 4 ms
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }

I ran it a couple of times: the verbal expression takes about 130 ms, while the regular expression takes about 5 ms.

I got the same results in other tests. I'm considering using verbal expressions in my indexing project, and this time gap is too big for that. What do you think?

Thanks

ophirmi avatar Aug 20 '13 14:08 ophirmi

I'm guessing the extra time comes from the chained methods each building an individual expression, instead of just one big expression like in the second test. It would be interesting to hear other thoughts, though.

jwood803 avatar Aug 20 '13 14:08 jwood803

Changing to the following test gives almost the same results:

    [Test]
    public void TestingIfWeHaveAValidURL()
    {
        var testMe = "https://www.google.com";

        var swVerb = new Stopwatch();
        verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();

        swVerb.Start();
        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex( @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$" );
        Assert.IsTrue( regex.IsMatch( testMe ) );
        swRegex.Stop();
        //Verb: 133 ms Regex: 4 ms
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }

As I understand it the chaining gets translated to the same regular expression here:

    public bool IsMatch(string toTest)
    {
        return PatternRegex.IsMatch(toTest);
    }

So why the big time difference? Thanks :)

ophirmi avatar Aug 20 '13 15:08 ophirmi

Perhaps the extra time comes from it still having to chain everything together before it can get to the IsMatch() method?

I might run it under the profiler when I get a chance and see if I can find anything.

jwood803 avatar Aug 20 '13 15:08 jwood803

If I understand correctly, you are timing how long it takes a verbex to initialize and be asserted versus a regex, and I don't see the point of that.

If I were to benchmark verbex vs. regex performance, I would have both already initialized, run them against a very large search input (a large text file) a couple of hundred times, and then compare the averages.

How does this sound?

alexpeta avatar Aug 20 '13 15:08 alexpeta

Same big gap (135ms to 1ms) in the following example:

    [Test]
    public void Then_VerbalExpressionsEmail_DoesMatchEmail()
    {
        verbEx.StartOfLine().Then(CommonRegex.Email);

        var swVer = new Stopwatch();            
        swVer.Start();
        var isMatchVer = verbEx.IsMatch("[email protected]");
        Assert.IsTrue(isMatchVer, "Should match email address");
        swVer.Stop();

        Regex regex = verbEx.ToRegex();
        var swRegex = new Stopwatch();
        swRegex.Start();
        var isMatch = regex.IsMatch("[email protected]");
        Assert.IsTrue(isMatch, "Should match email address");
        swRegex.Stop();
        //Ver: 121 ms,    Regex: 0 ms  
        Console.Write( "Ver: {0}   Regex: {1}", swVer.ElapsedMilliseconds, swRegex.ElapsedMilliseconds );
    }

And here it's the same regular expression.

ophirmi avatar Aug 20 '13 15:08 ophirmi

alexpeta: I'll try that, thanks.

ophirmi avatar Aug 20 '13 15:08 ophirmi

It's just the initialization of Regex the first time. Try doing the regex timing first and then the verbex and you get exactly the opposite result:

    [Test]
    public void TestingIfWeHaveAValidURLTiming()
    {
        var testMe = "https://www.google.com";

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex(@"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$");
        Assert.IsTrue(regex.IsMatch(testMe));
        swRegex.Stop();

        var swVerb = new Stopwatch();
        verbEx =
            VerbalExpressions.DefaultExpression.StartOfLine()
                             .Then("http")
                             .Maybe("s")
                             .Then("://")
                             .Maybe("www.")
                             .AnythingBut(" ")
                             .EndOfLine();

        swVerb.Start();
        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        //Verb: 12   Regex: 161
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }
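
One way to keep that one-time cost out of the numbers is to run each matcher once before starting the stopwatch and then average over many calls. The following is a minimal sketch of that idea, not a rigorous benchmark; `WarmBenchmark` and `AverageMilliseconds` are hypothetical names, and the URL pattern is only an illustrative one.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class WarmBenchmark
{
    // Hypothetical helper: calls the matcher once untimed so that first-use
    // costs (JIT compilation, regex engine initialization) are excluded,
    // then returns the average time per call in milliseconds.
    public static double AverageMilliseconds(Func<string, bool> matcher, string input, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        matcher(input); // warm-up call, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            matcher(input);
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        // Illustrative pattern; construct it outside the timed region.
        var regex = new Regex(@"^(http)(s)?(://)(www\.)?([^\ ]*)$", RegexOptions.Multiline);

        double avg = AverageMilliseconds(s => regex.IsMatch(s), "https://www.google.com", 1000);
        Console.WriteLine("Regex average per call: {0} ms", avg);
    }
}
```

The same helper could time the verbal expression by passing `s => verbEx.Test(s)`, so that both sides are measured after their first-use cost has already been paid.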

psoholt avatar Aug 20 '13 15:08 psoholt

Hi, I built this simple console application to test regex vs verbal expressions run time, when going over a file of 300,000 urls and testing each url by regex or verbal expression:

    private const string urlListFilePath = @"Data\urlList.txt";
    private const int testReturnTimes = 1;
    private static string[] urls;

    static void Main(string[] args)
    {
        urls = File.ReadAllLines(urlListFilePath);

        long totalTime = 0;
        for (int j = 0; j < testReturnTimes; j++)
        {
            totalTime += TestRegex();
        }
        //10 times test - 260 ms
        //20 times test - 250 ms
        long avgTimeRegex = totalTime / testReturnTimes;

        totalTime = 0;
        for(int i=0;i<testReturnTimes;i++)
        {
            totalTime += TestVer();
        }
        //10 times test - 4000 ms
        //20 times test -3900 ms
        long avgTimeVer = totalTime / testReturnTimes;
    }

    public static long TestVer()
    {
        var verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();

        int urlsCount = 0;
        var swVerb = new Stopwatch();

        swVerb.Start();
        foreach (var url in urls)
        {
            if (verbEx.Test(url)) urlsCount++;
        }
        swVerb.Stop();

        return swVerb.ElapsedMilliseconds;
    }

    public static long TestRegex()
    {
        var regex = new Regex( @"^(http)(s)?(://)(www\.)?([^\ ]*)$", RegexOptions.Multiline );
        int urlsCount = 0;
        var swRegex = new Stopwatch();

        swRegex.Start();
        foreach (var url in urls)
        {
            if (regex.IsMatch(url)) urlsCount++;
        }
        swRegex.Stop();

        return swRegex.ElapsedMilliseconds;
    }

I averaged over 10 test runs and over 20, and the results are the same: the regular expression takes about 250 ms while the verbal expression takes about 4000 ms.

What do you think? Thanks

ophirmi avatar Aug 21 '13 13:08 ophirmi

To make my previous post clearer: the regular expression averages about 250 ms per run over 300K URLs, while the verbal expression averages about 4000 ms per run. The results are the same for 100 runs.

ophirmi avatar Aug 21 '13 15:08 ophirmi

I totally forgot to profile it the other night, but I'll remember to do so tonight to see what it looks like from that perspective. I like messing with performance stuff, so thanks for posing these questions! :]

jwood803 avatar Aug 21 '13 16:08 jwood803

Let me know if I'm totally wrong with this, everyone.

I did a small memory profile, and it seems to indicate that the biggest offender may be the Test method. Looking at it, it calls the PatternRegex.IsMatch() method, and the PatternRegex property seems to new up a Regex object each time it is read. I wonder if creating a new object on the heap on every call could cause the performance numbers you were encountering?
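
To illustrate the suspicion, here is a minimal sketch with two hypothetical classes (`RebuildingMatcher` and `CachingMatcher`, neither taken from the library): one constructs a new Regex on every property read, the other builds it once and reuses it. The `BuildCount` counter is an assumption added only to make the difference visible.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a class whose property rebuilds the Regex on every read.
public class RebuildingMatcher
{
    private readonly string _pattern;
    public int BuildCount { get; private set; }

    public RebuildingMatcher(string pattern) { _pattern = pattern; }

    // Every read parses the pattern again and allocates a new Regex.
    private Regex PatternRegex
    {
        get { BuildCount++; return new Regex(_pattern, RegexOptions.Multiline); }
    }

    public bool IsMatch(string toTest) { return PatternRegex.IsMatch(toTest); }
}

// Same interface, but the Regex is built on first use and then reused.
public class CachingMatcher
{
    private readonly string _pattern;
    private Regex _cached;
    public int BuildCount { get; private set; }

    public CachingMatcher(string pattern) { _pattern = pattern; }

    private Regex PatternRegex
    {
        get
        {
            if (_cached == null)
            {
                BuildCount++;
                _cached = new Regex(_pattern, RegexOptions.Multiline);
            }
            return _cached;
        }
    }

    public bool IsMatch(string toTest) { return PatternRegex.IsMatch(toTest); }
}

public static class MatcherDemo
{
    public static void Main()
    {
        var rebuilding = new RebuildingMatcher(@"^https?://");
        var caching = new CachingMatcher(@"^https?://");
        for (int i = 0; i < 1000; i++)
        {
            rebuilding.IsMatch("https://www.google.com");
            caching.IsMatch("https://www.google.com");
        }
        Console.WriteLine("Rebuilding: {0} builds, Caching: {1} builds",
            rebuilding.BuildCount, caching.BuildCount);
    }
}
```

In a real builder the cached instance would also need to be discarded whenever the pattern or options change; the sketch leaves that out because the pattern here is fixed at construction.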

jwood803 avatar Aug 24 '13 20:08 jwood803

Interesting, I'll have to look into this...

psoholt avatar Aug 27 '13 07:08 psoholt

Hello?
Maybe someone can change IsMatch to return this:

    return Regex.IsMatch(toTest, RegexString, _modifiers);

and Capture to this:

    var match = Regex.Match(toTest, RegexString, _modifiers);
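
A minimal sketch of what that change could look like, assuming a wrapper that holds the pattern string and the options. `StaticCallMatcher` and the `Capture` signature are hypothetical; only the names `RegexString` and `_modifiers` come from the suggestion above. The static `Regex.IsMatch` and `Regex.Match` overloads go through the regex engine's internal cache (sized by `Regex.CacheSize`), so repeated calls with the same pattern and options should not need to re-parse it each time.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper, not the library's actual class.
public class StaticCallMatcher
{
    private readonly RegexOptions _modifiers;
    public string RegexString { get; private set; }

    public StaticCallMatcher(string regexString, RegexOptions modifiers)
    {
        RegexString = regexString;
        _modifiers = modifiers;
    }

    // Uses the static overload instead of constructing a Regex per call.
    public bool IsMatch(string toTest)
    {
        return Regex.IsMatch(toTest, RegexString, _modifiers);
    }

    // Assumed signature: returns the text of the numbered group,
    // or null when the input does not match at all.
    public string Capture(string toTest, int groupIndex)
    {
        var match = Regex.Match(toTest, RegexString, _modifiers);
        return match.Success ? match.Groups[groupIndex].Value : null;
    }
}

public static class StaticCallDemo
{
    public static void Main()
    {
        var matcher = new StaticCallMatcher(
            @"^(http)(s)?(://)(www\.)?([^\ ]*)$", RegexOptions.Multiline);

        Console.WriteLine(matcher.IsMatch("https://www.google.com"));
        Console.WriteLine(matcher.Capture("https://www.google.com", 5));
    }
}
```

Whether this is faster than holding on to a single Regex instance would still need to be measured; the static cache is shared across the process and has a limited size.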

Yousefjb avatar Jan 21 '16 04:01 Yousefjb