regexp-tree icon indicating copy to clipboard operation
regexp-tree copied to clipboard

`optimize` favors shortness over readability

Open fregante opened this issue 4 years ago • 3 comments

I use regexp-tree via eslint-plugin-unicorn/better-regex and found that some optimizations lead to less-readable output.

Similarly to #135, I'd argue that readability and shortness might not always overlap. I found this specific example, outputted by regexp-tree

/^\/\/\*\??/

Can you guess what the expected match looks like without spending a couple seconds on the zigzag sequence?

This was written as:

/^[/][/][*][?]?/

Which IMHO is slightly more scannable/readable.

My guess is that this is caused by charClassToSingleChar, which makes sense for [\d] -> \d but worsens what I just showed.

fregante avatar May 04 '20 15:05 fregante

@fregante, thanks for the report. Yes, I see how optimizer may lead to less readable outputs, however, the resulting regexp object in this case is still would be more optimized from the regexp engine perspective (handling simple sequence should be faster in general, than handling a sequence of character classes).

To support your use case specifically, optimizer has both, whitelist and blacklist. You can blacklist the charClassToSingleChar in case.

DmitrySoshnikov avatar May 09 '20 04:05 DmitrySoshnikov

Ideally though at linting time I get both [\d] -> \d and \/ -> [/] (I’d like to see this specifically) because they optimize for readability.

Then at build/minification time you could apply any changes that conflict with readability but improve speed and size.

The first would be called by eslint and the second one would be called by Webpack, terser etc.

Since optimize is already customizable this would probably mean:

  • splitting charClassToSingleChar into 2 rules to allow \/ -> [/]
  • introducing config groups a-la “eslint:recommended”, for example whitelist: ['optimize:minimize'] and whitelist: ['optimize:readability']

fregante avatar May 09 '20 10:05 fregante

Another everyday example:

/http:\/\/[^/]+\/pull\/commits/gi

👆 Readable

/ht{2}p:\/{2}[^/]+\/pul{2}\/com{2}its/gi

👆 Nonsense, longer

Strangely this only happens if I include that [^/]+\/. If it's not there, it's not "optimized"


Tested on RunKit

var regexpTree = require("regexp-tree")
regexpTree.optimize(/http:\/\/[^/]+\/pull\/commits/gi).toRegExp()

fregante avatar Jun 12 '20 21:06 fregante