opm-common icon indicating copy to clipboard operation
opm-common copied to clipboard

Go back to using fnmatch() instead of lots of regexes.

Open atgeirr opened this issue 10 months ago • 14 comments

This cuts a lot of time off Norne for me, and is noticable even on SPE9.

Norne summaries:

Number of MPI processes:         6
Threads per MPI process:         1
Setup time:                      9.91 s
  Deck input:                    3.42 s
Number of timesteps:           344
Simulation time:               213.35 s
  Assembly time:                34.56 s (Wasted: 0.0 s; 0.0%)
    Well assembly:               2.37 s (Wasted: 0.0 s; 0.0%)
  Linear solve time:            80.08 s (Wasted: 0.0 s; 0.0%)
    Linear setup:               38.98 s (Wasted: 0.0 s; 0.0%)
  Props/update time:            31.29 s (Wasted: 0.0 s; 0.0%)
  Pre/post step:                45.60 s (Wasted: 0.0 s; 0.0%)
  Output write time:            15.62 s
Overall Linearizations:       1525      (Wasted:     0; 0.0%)
Overall Newton Iterations:    1181      (Wasted:     0; 0.0%)
Overall Linear Iterations:    2325      (Wasted:     0; 0.0%)

Number of MPI processes:         6
Threads per MPI process:         1
Setup time:                      6.96 s
  Deck input:                    3.27 s
Number of timesteps:           344
Simulation time:               190.98 s
  Assembly time:                33.83 s (Wasted: 0.0 s; 0.0%)
    Well assembly:               2.31 s (Wasted: 0.0 s; 0.0%)
  Linear solve time:            77.33 s (Wasted: 0.0 s; 0.0%)
    Linear setup:               37.92 s (Wasted: 0.0 s; 0.0%)
  Props/update time:            30.60 s (Wasted: 0.0 s; 0.0%)
  Pre/post step:                28.06 s (Wasted: 0.0 s; 0.0%)
  Output write time:            15.14 s
Overall Linearizations:       1525      (Wasted:     0; 0.0%)
Overall Newton Iterations:    1181      (Wasted:     0; 0.0%)
Overall Linear Iterations:    2325      (Wasted:     0; 0.0%)

Ignoring random variations in runtimes there is still a significant difference to pre/post.

atgeirr avatar Mar 25 '24 13:03 atgeirr

jenkins build this please

atgeirr avatar Mar 25 '24 13:03 atgeirr

benchmark please

atgeirr avatar Mar 25 '24 13:03 atgeirr

I don't think we should do this, even if the benchmarks are promising. Fnmatch() is Posix-only and this makes our code a lot less portable.

bska avatar Mar 25 '24 13:03 bska

Fnmatch() is Posix-only and this makes our code a lot less portable.

Not a lot, I would argue, as posix is quite pervasive, but I agree it would be better to avoid. It could be that regex is horrible on clang/libc++ and that is why I see so great differences, but in any case we should replace the current function here with something much faster. Maybe a simplified FreeBSD or GPLv3+ licensed implementation of fnmatch() exists?

atgeirr avatar Mar 25 '24 14:03 atgeirr

Maybe a simplified FreeBSD or GPLv3+ licensed implementation of fnmatch() exists?

https://github.com/lattera/freebsd/blob/master/lib/libc/gen/fnmatch.c

Turned out that was a very short hop away... That at least may be a starting point. I think we can drop the flags that are particular to file and path matching, simplifying the code.

atgeirr avatar Mar 25 '24 14:03 atgeirr

Fnmatch() is Posix-only and this makes our code a lot less portable.

Not a lot, I would argue, as posix is quite pervasive, but I agree it would be better to avoid.

There's not a lot of justification, but the discussion in PR #2925 does suggest that portability is the main reason for using regular expressions here, especially for downstream clients which need to support Windows–e.g., ResInsight.

Maybe a simplified FreeBSD or GPLv3+ licensed implementation of fnmatch() exists?

Sure, but I'd prefer that we look closer into what's incurring undue costs in the current implementation. If it is mainly the construction of the std::regex objects, then one alternative would be to rewrite loops of the form

for (const auto& e : collection) {
    if (shmatch(e, patt)) {
        use(e);
    }
}

into something along the lines of

const shmatcher m { patt };
for (const auto& e : collection) {
    if (m.matches(e)) {
        use(e);
    }
}

instead. In other words, to construct the pattern/matcher once instead of on each call.

bska avatar Mar 25 '24 15:03 bska

Benchmark result overview:

Test Configuration Relative
opm-git OPM Benchmark: drogon - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: drogon - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: punqs3 - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: punqs3 - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: smeaheia - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: smeaheia - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: spe10_model_1 - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: spe10_model_1 - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_extra - Threads: 1 - FOIT (Total Oil Injection At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_extra - Threads: 8 - FOIT (Total Oil Injection At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_norne - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_norne - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 - FOPT (Total Oil Production At End Of Run) 1
opm-git OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 - FOPT (Total Oil Production At End Of Run) 1
  • Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement. *

View result details @ https://www.ytelses.com/opm/?page=result&id=2416

ytelses avatar Mar 25 '24 23:03 ytelses

The benchmark were not properly executed, any clue @blattms ?

atgeirr avatar Mar 26 '24 11:03 atgeirr

#include <chrono>
#include <cstring>
#include <iostream>
#include <regex>
#include <string>

#include <fnmatch.h>


bool shmatch(const std::string& pattern, const std::string& symbol)
{
    // Shell patterns should implicitly be interpreted as anchored at beginning
    // and end.
    std::string re_pattern = "^" + pattern + "$";

    {
        // Shell wildcard '*' should be regular expression arbitrary character '.'
        // repeated arbitrary number of times '*'
        std::regex re("\\*");
        re_pattern = std::regex_replace(re_pattern, re, ".*");
    }

    {
        // Shell wildcard '?' should be regular expression one arbitrary character
        // '.'
        std::regex re("\\?");
        re_pattern = std::regex_replace(re_pattern, re, ".");
    }

    std::regex regexp(re_pattern);
    return std::regex_search(symbol, regexp);
}

struct shmatcher
{
  shmatcher(const std::string& pattern)
  {
      // Shell patterns should implicitly be interpreted as anchored at beginning
      // and end.
      std::string re_pattern = "^" + pattern + "$";

      {
          // Shell wildcard '*' should be regular expression arbitrary character '.'
          // repeated arbitrary number of times '*'
          std::regex re("\\*");
          re_pattern = std::regex_replace(re_pattern, re, ".*");
      }

      {
          // Shell wildcard '?' should be regular expression one arbitrary character
          // '.'
          std::regex re("\\?");
          re_pattern = std::regex_replace(re_pattern, re, ".");
      }

      regex_ = std::regex(re_pattern);
  }

  bool match(const std::string& symbol)
  {
    return std::regex_search(symbol, regex_);
  }

  std::regex regex_;
};


void naive_regex(int num_iter)
{
    const auto startTime = std::chrono::system_clock::now();
    for (int i = 0; i < num_iter; ++i) {
        shmatch("NAME.*", "NAME");
    }
    std::cout << "Naive regex: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count()  / 1000.0 << std::endl;
}

void regex_struct(int num_iter)
{
    shmatcher matcher{"NAME.*"};

    const auto startTime = std::chrono::system_clock::now();
    for (int i = 0; i < num_iter; ++i) {
        matcher.match("NAME");
    }
    std::cout << "Struct regex: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count()  / 1000.0 << std::endl;
  }

void use_fnmatch(int num_iter)
{
    const auto startTime = std::chrono::system_clock::now();
    int count = 0;
    for (int i = 0; i < num_iter; ++i) {
       count += fnmatch("NAME.*", "NAME", 0);
    }
    std::cout << "Fnmatch: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - startTime).count()  / 1000.0 << " (" << count << ")" << std::endl;
}


int main(int argc, char** argv)
{
  if (argc < 2) {
      std::cerr << "need number of iterations" << std::endl;
      return 1;
  }
  int num_iter = std::atoi(argv[1]);

  naive_regex(num_iter);
  regex_struct(num_iter);
  use_fnmatch(num_iter);

  return 0;
}

./shmatch 50000

Naive regex: 0.218
Struct regex: 0.013
Fnmatch: 0.001

So approximately reduced from 100x to 10x.

akva2 avatar Apr 08 '24 11:04 akva2

benchmark please

akva2 avatar May 31 '24 06:05 akva2

Can we rewrite the function without using regex and fnmatch() for both performance and portability issue?

GitPaean avatar May 31 '24 09:05 GitPaean

Can we rewrite the function without using regex and fnmatch() for both performance and portability issue?

I suppose. That would effectively mean writing our own fnmatch() though.

bska avatar May 31 '24 10:05 bska

I suppose. That would effectively mean writing our own fnmatch() though.

I think if the performance and portability both have priorities (especially if we do not need all the functionalities), we should do it, but you might have other concerns regarding software policy.

The fnmatch() function does not look very complicated either, https://github.com/gcc-mirror/gcc/blob/master/libiberty/fnmatch.c .

GitPaean avatar May 31 '24 10:05 GitPaean

Benchmark result overview:

Test Configuration Relative
opm-git OPM Benchmark: drogon - Threads: 1 0.992
opm-git OPM Benchmark: drogon - Threads: 8 1.221
opm-git OPM Benchmark: punqs3 - Threads: 1 0.994
opm-git OPM Benchmark: punqs3 - Threads: 8 1.032
opm-git OPM Benchmark: smeaheia - Threads: 1 0.994
opm-git OPM Benchmark: smeaheia - Threads: 8 1.139
opm-git OPM Benchmark: spe10_model_1 - Threads: 1 1.009
opm-git OPM Benchmark: spe10_model_1 - Threads: 8 1.004
opm-git OPM Benchmark: flow_mpi_extra - Threads: 1 0.99
opm-git OPM Benchmark: flow_mpi_extra - Threads: 8 1.089
opm-git OPM Benchmark: flow_mpi_norne - Threads: 1 0.992
opm-git OPM Benchmark: flow_mpi_norne - Threads: 8 1.062
opm-git OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 1 0.981
opm-git OPM Benchmark: flow_mpi_norne_4c_msw - Threads: 8 1.019
  • Speed-up = Total time master / Total time pull request. Above 1.0 is an improvement. *

View result details @ https://www.ytelses.com/opm/?page=result&id=2492

ytelses avatar May 31 '24 15:05 ytelses