ack3

ack gets stuck with a nested regex

Open satanson opened this issue 5 years ago • 7 comments

To extract C++ function definitions, I constructed a nested regex with the following structure:

two_colon_separated_namespace [~]?identifier  nested_parentheses nested_braces

The concrete regex is hard to read, but here it is as well:

(.*?((?::{2})?(?:\b[A-Za-z_]\w*(?::{2}))*[~]?[A-Za-z_]\w*)(?:\s)*(?:\([^()]*([^()]*\((?-1)*\)[^()]*|[^()]*\([^()]*\)[^()]*)*[^()]*\))(?:\s)*(?:{[^{}]*([^{}]*{(?-1)*}[^{}]*|[^{}]*{[^{}]*}[^{}]*)*[^{}]*}))

When I run ack with this regex on a C++ file, it gets stuck forever.

ack-standalone '(.*?((?::{2})?(?:\b[A-Za-z_]\w*(?::{2}))*[~]?[A-Za-z_]\w*)(?:\s)*(?:\([^()]*([^()]*\((?-1)*\)[^()]*|[^()]*\([^()]*\)[^()]*)*[^()]*\))(?:\s)*(?:{[^{}]*([^{}]*{(?-1)*}[^{}]*|[^{}]*{[^{}]*}[^{}]*)*[^{}]*}))' RocksDBPrimaryIndex4.cpp

The content of RocksDBPrimaryIndex4.cpp is as follows:

RocksDBPrimaryIndex::RocksDBPrimaryIndex(arangodb::LogicalCollection& collection,
                                         arangodb::velocypack::Slice & info)
    : RocksDBIndex(
          IndexId::primary(), collection, StaticStrings::IndexNamePrimary,
          std::vector(
              {{arangodb::basics::AttributeName(StaticStrings::KeyString, false)}}),
          true, false, RocksDBColumnFamily::primary(),
          basics::VelocyPackHelper::stringUInt64(info, StaticStrings::ObjectId),
          basics::VelocyPackHelper::stringUInt64(info, StaticStrings::TempObjectId),
          static_cast<RocksDBCollection*>(collection.getPhysical())->cacheEnabled()),
      _isRunningInCluster(ServerState::instance()->isRunningInCluster()) {
  TRI_ASSERT(_cf == RocksDBColumnFamily::primary());
  TRI_ASSERT(objectId() != 0);
}

Ag can handle this case.

Perl version: v5.32.0
ack version: da42136a95d4465aeadf391e00d53c6ae62c7b69

satanson avatar Nov 29 '20 02:11 satanson

I see the same with ack3 on Perl 5.030.

Oddly, the RegEx given does not hang in Perl without ack, but does not match the proffered file either.

 $  perl -nlE  'say $1 if m{(.*?((?::{2})?(?:\b[A-Za-z_]\w*(?::{2}))*[~]?[A-Za-z_]\w*)(?:\s)*(?:\([^()]*([^()]*\((?-1)*\)[^()]*|[^()]*\([^()]*\)[^()]*)*[^()]*\))(?:\s)*(?:{[^{}]*([^{}]*{(?-1)*}[^{}]*|[^{}]*{[^{}]*}[^{}]*)*[^{}]*}))}' RocksDBPrimaryIndex4.cpp
$

So while the report is reproducible, it's not yet a full test case.

n1vux avatar Nov 29 '20 04:11 n1vux

@n1vux It's very weird. Global match and substitution get stuck with a nested regex; maybe possessive mode is used. However, an error result is better than getting stuck.

satanson avatar Nov 29 '20 04:11 satanson

I'm poking at this, but the first thing I would do is get rid of the .*? at the beginning, which will always match, and then get rid of the capturing parentheses at the beginning that aren't getting used.

petdance avatar Nov 29 '20 22:11 petdance

My ag doesn't hang, but neither does it find the match that you're wanting.

petdance avatar Nov 29 '20 22:11 petdance

No match is better than a hang.

satanson avatar Nov 30 '20 01:11 satanson

Not if it's supposed to match. Is your regex actually correct?

petdance avatar Nov 30 '20 03:11 petdance

@petdance The regex is correct, both syntactically and semantically. At worst, ack should return quickly rather than getting stuck forever. In fact, global match and substitution with a nested regex in Perl itself can also get stuck if the target text has unpaired <>(){}, so I think there is a bug in Perl, and the nested regex is handled in a possessive way with no backtracking.

satanson avatar Nov 30 '20 04:11 satanson