FerretDB Support `$regex`'s `x` option

Cover with great tests and fuzzing.

May 11 '22 18:05 AlekSi

Hi, I was studying the codebase a little yesterday and I'm interested in this issue. Would you mind if I took it on?

May 23 '22 11:05 noisersup

Sure. Please make sure that you add good tests for that. I will try to help with fuzzing as soon as possible :)

May 23 '22 12:05 AlekSi

There isn't much about free-spacing mode on the internet. As far as I found, there are also small differences between engines, and some of them don't implement it at all...

My approach is to pass every regex that has the "x" flag to a function that translates the multi-line, free-spaced, commented pattern into a string the Go regexp package understands.

However, I'm not sure whether my conditions correspond to the intended behavior.

Conditions I'm not sure about:

- Spaces between [ ] shouldn't be removed from the string.
- Spaces between { } shouldn't be removed either (`a{10}` matches text where `a` is repeated 10 times in a row, but `a{1 0}` just matches the literal text "a{1 0}").

I have tried to find an answer in the MongoDB source code, but I'm still not sure. If anybody knows the answer, I would be more than happy to hear it. If something is not clear, let me know and I'll do my best to clarify.

Also, I understand there's a lot of work to do on this project, so I'll implement it this way for now and keep searching.

May 27 '22 14:05 noisersup

@noisersup would you please join Slack? https://join.slack.com/t/ferretdb/shared_invite/zt-zqe9hj8g-ZcMG3~5Cs5u9uuOPnZB8~A

May 27 '22 14:05 seeforschauer

@seeforschauer I'm already there!

May 27 '22 14:05 noisersup

> Conditions that I'm not sure

In that case, we use integration tests to check that we do the same thing as MongoDB. Please add a test there. See there for an overview of our testing.

May 27 '22 14:05 AlekSi

Preprocessing the regexp pattern as described above would require full-fledged regex parsing to avoid issues with [] and {}.

The Go standard regexp/syntax package is not a good fit here, as it wasn't really made for external users. It creates an AST that is good for compilation by the regexp package, but otherwise it's hard to work with. For instance, it's hard to convert it back to the original string (or use it to construct a new, preprocessed string).

I used this package to do regexp parsing in a couple of linters: https://github.com/quasilyte/regex/tree/master/syntax

It supports most of the PCRE syntax as well, which can be handy since MongoDB uses this dialect.

Jul 05 '23 12:07 quasilyte
