arbitrary: string matching regex

Open wizzard0 opened this issue 3 years ago • 3 comments

🚀 Feature Request

Test with strings matching a given regex

Motivation

fast-check doesn't do white-box testing the way afl-fuzz and similar tools do, but for testing parsers etc. we can still go a bit deeper by throwing known-valid strings that would pass the "first layer of defenses".

See genex and regexp-enumerator (though we probably need one that can generate a string from a given seed rather than just iterate through the whole universe of possible strings).

Example

  • test url parser with pattern like "https://[a-z0-9-.]+/.*" and not just random strings
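To make the seeded-generation idea concrete for this pattern, here is a minimal hand-written sketch in plain JavaScript. It is not how fast-check, genex, or regexp-enumerator work. The helper names (`mulberry32`, `pick`, `repeat`, `generateUrl`), the length caps, and the restriction of `.` to printable ASCII are all assumptions made for illustration.

```javascript
// Small deterministic PRNG (mulberry32); any seedable generator would do here.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

const HOST_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789-.';
// Assumption: printable ASCII stands in for `.`, which really accepts far more.
const PATH_CHARS = Array.from({ length: 95 }, (_, i) =>
  String.fromCharCode(0x20 + i)
).join('');

// Pick one character of `chars` using the seeded generator.
function pick(rand, chars) {
  return chars[Math.floor(rand() * chars.length)];
}

// Build a string of between `min` and `max` characters drawn from `chars`.
function repeat(rand, chars, min, max) {
  const length = min + Math.floor(rand() * (max - min + 1));
  let out = '';
  for (let i = 0; i < length; i++) out += pick(rand, chars);
  return out;
}

// Hypothetical helper: one string matching "https://[a-z0-9-.]+/.*" per seed.
function generateUrl(seed) {
  const rand = mulberry32(seed);
  const host = repeat(rand, HOST_CHARS, 1, 12); // `+` needs at least one character
  const path = repeat(rand, PATH_CHARS, 0, 20); // `.*` may be empty
  return `https://${host}/${path}`;
}

console.log(generateUrl(42));
```

The same seed always yields the same string, which is what makes a failing case replayable.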

Actually, one can go even deeper and build generators for context-free grammars (CFGs), not just regexes, using invertible parsers like nearley, but we have to stop somewhere :)

Edit: will help with #484 by iterating thru possibilities faster :)

wizzard0 avatar May 28 '22 10:05 wizzard0

Thanks for the suggestion, it will clearly push #484 a step further. Let's see how it goes for the simple case of #484 and iterate over it to try something way more powerful as you suggested 🤔

The main challenges would probably be the performance of string generation and, possibly, the ability to shrink the generated strings properly.
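To illustrate why shrinking is the tricky part: a shrunk value has to keep matching the regex while still reproducing the failure. The sketch below is a naive greedy shrinker in plain JavaScript, not fast-check's shrinking mechanism. `shrinkMatching`, the anchored form of the pattern, and the `stillFails` predicate are hypothetical names and choices made for illustration.

```javascript
// Greedy shrinker: repeatedly delete one character, keeping the result only
// if it still matches the pattern AND still makes the property fail.
function shrinkMatching(value, pattern, stillFails) {
  let current = value;
  let improved = true;
  while (improved) {
    improved = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i) + current.slice(i + 1);
      if (pattern.test(candidate) && stillFails(candidate)) {
        current = candidate; // smaller counterexample found, restart the scan
        improved = true;
        break;
      }
    }
  }
  return current;
}

// No `g` flag, so `test` stays stateless across calls.
const shrinkPattern = /^https:\/\/[a-z0-9.-]+\/.*$/;
// Hypothetical bug: the parser under test chokes on a double dot.
const stillFails = (url) => url.includes('..');

console.log(
  shrinkMatching('https://a..example.com/some/path', shrinkPattern, stillFails)
);
```

Each pass re-runs the regex on every candidate, so the cost grows quickly with string length. That is one reason performance and shrinking are tied together here.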

For the moment, for the regex you passed, the best way would be to combine a stringOf generating the a-z0-9-. characters with a fullUnicodeString for the .*, put them in a tuple, and then map over it to prepend the https:// prefix.

dubzzz avatar May 30 '22 08:05 dubzzz

Here is the current approach to building this kind of value with fast-check:

import fc from 'fast-check';

// One character out of [a-z0-9-.]
const alphaNumericCharacterArb = fc.mapToConstant(
  { num: 26, build: v => String.fromCharCode(v + 0x61) }, // a-z
  { num: 10, build: v => String.fromCharCode(v + 0x30) }, // 0-9
  { num: 1, build: () => '-' }, // -
  { num: 1, build: () => '.' }, // .
);
const urlArb = fc.record({
  // minLength: 1 mirrors the `+` quantifier of the regex
  domain: fc.stringOf(alphaNumericCharacterArb, { minLength: 1 }),
  // broader than `.*`: may also contain line terminators
  path: fc.fullUnicodeString(),
}).map(opts => `https://${opts.domain}/${opts.path}`);

Please note that there is a built-in arbitrary for URLs: see webUrl.

dubzzz avatar May 31 '22 07:05 dubzzz

Another example is:

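import fc from 'fast-check';

// Host string: IPv4 with optional port, IPv6 with optional port, or a bare port.
const hostArb = fc.oneof(
  fc
    .record({
      ip: fc.ipV4(),
      port: fc.option(fc.integer({ min: 1, max: 65535 })), // null when absent
    })
    .map(({ ip, port }) => `${ip}${port ? `:${port}` : ""}`),
  fc
    .record({
      ip: fc.ipV6(),
      port: fc.option(fc.integer({ min: 1, max: 65535 })), // null when absent
    })
    .map(({ ip, port }) => {
      if (port) {
        return `[${ip}]:${port}`; // brackets are only needed alongside a port
      }
      return ip;
    }),
  fc.integer({ min: 1, max: 65535 }).map((port) => `:${port}`)
);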

which generates for me either an IPv4 address with an optional port, an IPv6 address with an optional port, or a raw port.

I've enjoyed the fc.record({}).map(()=>{}) pattern recently for easily creating one-off string arbitraries

jasikpark avatar Sep 12 '22 18:09 jasikpark

fc.stringMatching(/[\w.-]*/) would be more concise and likely more efficient than the fc.stringOf(fc.char().filter((c) => /[\w.-]/.test(c))) I'm currently using.

fast-check could parse the regex with https://github.com/fent/ret.js, map its AST to an fc.Arbitrary<number[]> of code points, and then build the string with String.fromCodePoint.
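A rough sketch of that AST-to-code-points idea, in plain JavaScript without fast-check or ret.js. The node shapes below are hypothetical and are not ret.js's actual token format. The AST for /[\w.-]*/ is written by hand instead of parsed, and `toCodePoints` stands in for what would be an fc.Arbitrary<number[]> in a real integration.

```javascript
// Hypothetical AST node shapes (NOT ret.js's token format):
//   { type: 'char', codePoint }            a literal character
//   { type: 'set', ranges: [[from, to]] }  a class as inclusive code-point ranges
//   { type: 'seq', children }              concatenation
//   { type: 'repeat', child, min, max }    bounded repetition
function randomInt(rand, min, max) {
  return min + Math.floor(rand() * (max - min + 1));
}

// Walk the AST and return one random array of code points that matches it.
function toCodePoints(node, rand = Math.random) {
  switch (node.type) {
    case 'char':
      return [node.codePoint];
    case 'set': {
      // Picks a range first, then a code point in it (biased toward small ranges).
      const [from, to] = node.ranges[randomInt(rand, 0, node.ranges.length - 1)];
      return [randomInt(rand, from, to)];
    }
    case 'seq':
      return node.children.flatMap((child) => toCodePoints(child, rand));
    case 'repeat': {
      const count = randomInt(rand, node.min, node.max);
      const out = [];
      for (let i = 0; i < count; i++) out.push(...toCodePoints(node.child, rand));
      return out;
    }
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const range = (from, to) => [from.codePointAt(0), to.codePointAt(0)];

// Hand-written AST for /[\w.-]*/; `*` is unbounded, capped at 10 for this sketch.
const wordDotDash = {
  type: 'repeat',
  min: 0,
  max: 10,
  child: {
    type: 'set',
    ranges: [range('a', 'z'), range('A', 'Z'), range('0', '9'),
             range('_', '_'), range('.', '.'), range('-', '-')],
  },
};

const sample = String.fromCodePoint(...toCodePoints(wordDotDash));
console.log(JSON.stringify(sample));
```

Working on code points rather than UTF-16 units keeps characters outside the Basic Multilingual Plane intact until the final String.fromCodePoint call.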

wojpawlik avatar Apr 14 '23 19:04 wojpawlik

Good news, I'm currently working on a first version of a stringMatching(Regex) helper. So far, I don't handle contextual parts such as ^, $ or \b. This feature will probably come as a separate package, as it needs to pull in an external dependency, so I prefer to keep it apart for the moment.

In the meantime, depending on how complex blob formats are, I may provide a built-in arbitrary for strings matching a blob as part of fast-check, while the regex one will stay apart (for now).

dubzzz avatar May 22 '23 06:05 dubzzz