purescript-strings icon indicating copy to clipboard operation
purescript-strings copied to clipboard

Bug or at least unexpected side effect of replaceAll

Open cscalfani opened this issue 5 months ago • 6 comments

In Javascript, the replace function when the first parameter is a Regex, there are special character strings for replacements, e.g. $'.

The problem is that the PureScript replaceAll relies on the JavaScript replace using a Regex with the global flag.

This means that replacement strings are simple string replacements, but instead are interpreted.

Suggested fix:

export const replaceAll = p => r => s => s.replace(new RegExp(p.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&"), "g"), r.replace(/\$/g, "$$$$")); // eslint-disable-line no-useless-escape

We're replacing the $ with $$ globally in the replacement string.

cscalfani avatar Jul 09 '25 16:07 cscalfani

I think we should probably be using the standard replaceAll method if at all possible, but I think we should be using the replacement function approach instead to avoid needing to escape anything. Something like s.replaceAll(p, () => r).

natefaubion avatar Jul 09 '25 17:07 natefaubion

I wasn't aware of that feature... good god I hate JS.

I like this suggestion a lot better than mine 😄 which means the code would be:

export const replaceAll = p => r => s => s.replace(new RegExp(p.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&"), "g"), () => r); // eslint-disable-line no-useless-escape

cscalfani avatar Jul 09 '25 18:07 cscalfani

The thing you're trying to do is now available as a built-in called RegExp.escape, though unfortunately any engine that has that probably already has replaceAll, which should be used when available.

I suggest just walking the string yourself with something like this:

export const replaceAll = "".replaceAll
  ? p => r => s => s.replaceAll(p, r)
  : p => r => s => {
    var pLength = Math.max(1, p.length);
    var matchIndices = [];
    var matchIndex = s.indexOf(p);
    while (matchIndex !== -1) {
      matchIndices.push(matchIndex);
      matchIndex = s.indexOf(p, matchIndex + pLength);
    }

    var restIndex = 0;
    var result = "";
    for (var i = 0; i < matchIndices.length; ++i) {
      result += s.slice(restIndex, matchIndices[i]) + r;
      restIndex = matchIndices[i] + p.length;
    }

    if (restIndex < s.length) {
      return result + s.slice(restIndex);
    }

    return result;
  };

michaelficarra avatar Jul 09 '25 21:07 michaelficarra

The current implementation already does a RegExp escape, so it's not clear to me what switching that up does. I think the path of least resistance is using the replacer function.

natefaubion avatar Jul 09 '25 21:07 natefaubion

It's like 60% faster, equivalent to the native replaceAll in my tests. See https://jsbench.me/pdmcwiblhn/1.

Image

Also then you don't have to worry that you've actually escaped all the things you need to escape.

michaelficarra avatar Jul 09 '25 22:07 michaelficarra

RegExp.escape is not support in Node yet.

I originally rewrote replaceAll using RegExp.escape and my tests failed because the tests run in Node. I'm running Node version 22.15.0.

cscalfani avatar Jul 09 '25 23:07 cscalfani