compile-time-regular-expressions

Unable to capture multi-line regex groups

Open nyckmaia opened this issue 4 years ago • 1 comments
Hi,

I'm trying to capture 5 fields/groups inside an XML file. I named the fields a, b, c, d and e. I loaded the XML file into a std::string variable called fileBuffer (example below).

std::optional<std::string_view> testFunc(const std::string& fileBuffer) noexcept {
    using namespace ctre::literals;
    auto [whole, a, b, c, d, e] = ctre::search<R"(<b>(.*?)<\/b><\/a><\/text>[\n]+.*<b>(.*?)<\/b>.*[\n].*[\n].*<b>(.*)<\/b>.*[\n].*[\n].*<b>(.*)<\/b>.*[\n].*[>](.*)<\/text>)">(fileBuffer);

    if (whole) {  // <-- THIS IS ALWAYS 'FALSE'. WHY??
        std::cout << a << std::endl;
        std::cout << b << std::endl;
        std::cout << c << std::endl;
        std::cout << d << std::endl;
        std::cout << e << std::endl;
    }
    
    return std::nullopt;
}

On Ubuntu 20.04 environment

The code compiles and runs without any error or warning, but the if condition should be true. My setup:

  • Linux Ubuntu 20.04 x64
  • G++ 9.3
  • ctre 3.4.1

On Windows 10 environment

I tried to run the same project in a completely different setup:

  • Windows 10 x64
  • MinGW G++ 10.3
  • MinGW GDB 10.3
  • Qt Creator IDE 4.15.1
  • ctre 3.4.1

The code compiles without any error, but when I run it I get a runtime error: SIGSEGV

Here is the XML content that is stored in the fileBuffer variable:

<text top="285" left="54" width="134" height="20" font="0"><a href="javascript:;"><b>FIBRINOGÊNIO</b></a></text>
<text top="285" left="674" width="33" height="20" font="1"><b>177</b></text>
<text top="285" left="708" width="11" height="20" font="2"> </text>
<text top="285" left="725" width="56" height="20" font="1"><b>mg/dL</b></text>
<text top="285" left="835" width="13" height="24" font="3"> </text>
<text top="310" left="54" width="157" height="14" font="4"><b>Valor de Referência:</b></text>
<text top="310" left="232" width="118" height="14" font="5">200 à 393 mg/dL</text>

You can test my regex by copying and pasting both the XML and the regex into the regex101 website. It should output 5 captured fields/groups.

Questions

  1. On the Ubuntu environment, why is the if condition always false when the regex101 website gives me the correct 5 outputs? Could you help me fix it?

  2. As you can see, the MinGW GCC version is newer than the native Ubuntu GCC. So why do I get a SIGSEGV with MinGW 10.3 and not with GCC 9.3?

nyckmaia avatar Oct 26 '21 00:10 nyckmaia

  1. I don't have GCC 9.3 available. I tried it on Compiler Explorer, but it times out. When I try it with a newer version, it returns the correct values.

  2. The SIGSEGV is probably because you are running out of stack, as you are using a lot of greedy matches. Consider changing them into lazy ones: .* -> .*?

hanickadot avatar Jan 13 '22 09:01 hanickadot
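
For anyone reading this later, here is a minimal sketch of the greedy versus lazy difference behind the `.*` -> `.*?` suggestion. It deliberately uses std::regex instead of ctre so that it builds with only the standard library. That swap is an assumption made for illustration: the two engines are implemented differently, so this shows only what each quantifier captures and is not meant to reproduce the SIGSEGV or ctre's stack usage. The names first_group and sample_line are hypothetical and do not come from the issue.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first capture group of `pattern` found in
// `text`, or an empty string when the pattern does not match.
std::string first_group(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar is the default
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return {};
}

// Hypothetical sample: two <b> elements on a single line, loosely modelled on
// the XML in the question.
const std::string sample_line = "<b>177</b> <b>mg/dL</b>";
```

With this helper, first_group(sample_line, R"(<b>(.*)</b>)") captures everything up to the last closing tag on the line, while first_group(sample_line, R"(<b>(.*?)</b>)") stops at the first closing tag and yields only 177. A greedy `.*` first consumes as much as it can and then backtracks, which is why long runs of greedy groups can become expensive.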