Regex101
CRLF testing
Test strings on regex101 currently only "support" LF line endings - there is no way to test whether my \r\n pattern will match the string correctly in a text containing CRLF line endings. It would be good to have some option - a switch or something - to tell regex101 that I want my test string to be treated as if it used CRLF.
As you can see in a related issue https://github.com/firasdib/Regex101/issues/257, the browser automatically converts \r to \n, so there is no way to test them in the text fields.
I suppose the only way to support this is to allow writing the input string as an escaped string.
That sounds good!
I don't think I will support this, I'm afraid; it's too much work for too little reward. It's good practice regardless to use something like \r?\n when matching line endings.
How does this work? When you run the regex, you just take the string from the textarea and pass it to a function, right? If there was a switch to "treat line endings as if the string was on Windows (CRLF)", you could just do a quick search & replace on the string before passing it to the actual regex function. Or is it more complicated than that?
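The search-and-replace idea can be sketched in TypeScript. This is a hypothetical illustration, not regex101's code: toCrlf and runTest are invented names, and the sketch assumes the editor hands back LF-only text.

```typescript
// Hypothetical helper: turn every LF into CRLF so the regex sees Windows-style text.
// Existing CRLFs are collapsed first so we never produce "\r\r\n".
function toCrlf(text: string): string {
  return text.replace(/\r\n/g, "\n").replace(/\n/g, "\r\n");
}

// Hypothetical wrapper around the match step: optionally convert the
// test string before handing it to the regex engine.
function runTest(pattern: RegExp, testString: string, treatAsCrlf: boolean): RegExpMatchArray | null {
  const input = treatAsCrlf ? toCrlf(testString) : testString;
  return input.match(pattern);
}

// What the editor returns: LF-only text.
const editorValue = "key=value\nother=thing";
console.log(runTest(/value\r\n/, editorValue, false)); // null
console.log(runTest(/value\r\n/, editorValue, true) !== null); // true
```

Lone CRs are left untouched here; a real switch would have to decide what to do with them.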
I suppose that is doable, unless we're overlooking something. But either way, that means another option to confuse people and another feature adding complexity to the code base that only a fraction of users will use. I don't think it's worth it, and judging by this issue, nobody but you seems to want it.
I understand where you're coming from, but I'm hoping you see where I'm coming from as well.
Agreed, no problem. You're doing a great job with regex101!
Thank you! 😄
Actually, this got me thinking.
I don't want more options because most users are probably never going to know they're there. So this has to work automagically. What I can do is try to detect which operating system the user is on, and if it's Windows, ensure lines are CRLF, otherwise just LF.
What do you think about that @borekb?
If people want to test CR/LF/CRLF, they would want to be able to test all three variations rather than only one of them. Even if people have different operating systems to test on, they would have to know about this hidden feature in regex101 to take advantage of it, which I think would apply only to the very small audience that follows regex101 development.
Either provide some sort of support which allows people to test all 3 variations, or accept the browser limitation and only allow LF like what we are doing right now.
Agree with @nhahtdh. How about something like this?
The "LF" would be a subtle dropdown allowing me to select between the three line endings.
@borekb That's already too much clutter; we can't afford it. A dropdown like that in the main UI is out of the question.
These are my alternatives:
- The document will be split on CRLFs as well as lone CRs and LFs, and a single LF will be used as line separator in all output.
- Explicitly set CRLF as line separator, meaning lines will be split on exactly that and joined by the same.
We could add heuristics to parse the string and set the separator to the first one we encounter?
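Both alternatives and the proposed heuristic come down to simple string operations. A TypeScript sketch, where normalizeToLf, detectFirstSeparator and conformToFirstSeparator are hypothetical names and not part of CodeMirror or regex101:

```typescript
type LineSeparator = "\r\n" | "\r" | "\n";

// CRLF must be listed before lone CR so "\r\n" is treated as one break.
const ANY_BREAK = /\r\n|\r|\n/;

// Alternative 1: split on CRLF, lone CR or lone LF; always emit LF.
function normalizeToLf(text: string): string {
  return text.split(ANY_BREAK).join("\n");
}

// Heuristic: the first line break found wins; default to LF if there is none.
function detectFirstSeparator(text: string): LineSeparator {
  const m = ANY_BREAK.exec(text);
  return m ? (m[0] as LineSeparator) : "\n";
}

// Make the whole document conform to the first separator encountered.
function conformToFirstSeparator(text: string): string {
  return text.split(ANY_BREAK).join(detectFirstSeparator(text));
}

console.log(JSON.stringify(conformToFirstSeparator("a\r\nb\nc"))); // "a\r\nb\r\nc"
console.log(JSON.stringify(conformToFirstSeparator("a\nb\r\nc"))); // "a\nb\nc"
```

Mixed input is silently rewritten either way: a document that starts with CRLF has its lone LFs turned into CRLFs.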
I thought it's not possible to preserve CRLF in the text area because of a browser limitation, is it not? If you can preserve whatever line endings users paste, that would be best. Or maybe I don't understand what you're proposing :)
BTW, whenever you work with text, line endings are one of the most important things there are, which is why even the most minimalistic text editors usually display the line ending style somewhere in the UI straight away. I know it's unpleasant, but I deal with CRLF vs. LF issues quite often, unfortunately.
No, it's not a true limitation. It's just that everything revolves around LF, so whenever you grab a string, it's converted to LF. Typically.
I cannot preserve the line separators the user inserts, but I can try to find the first separator used and make the entire document conform to that. Maybe.
So who's doing the conversion of the text in my clipboard that contains CRLF line endings to a string that only contains LF? Is it:
- The browser?
- Some JavaScript function of regex101?
- The regex engine?
If it's the second, why convert in the first place? (Sorry, just trying to understand this issue.)
Your string has a different internal representation than just "a string"; it's a bunch of lines with tokens in them. This is initially created by splitting on CRLF, CR and LF. When I grab the value, I have to stitch it back together, and I have to use an explicit line separator.
I think it doesn't make sense to stitch them together with your separator. As I mentioned before, if someone wants to test CRLF, they would want to be able to put CR and LF freely.
I understand that, but there is nothing I can do about it. It's the internal representation of strings in CodeMirror.
Oh so when I paste a string that contains CRLF characters in it, CodeMirror gets rid of whatever line endings there were and it doesn't tell you what those were? (If it did, you could create a simple map between line no. and the line ending character that was used, something like
0 => CRLF
1 => LF
2 => CRLF
etc.
and then stitch it together using this map.)
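The line-ending map could look like this in TypeScript. splitKeepingEndings and stitch are hypothetical helpers; per this discussion, CodeMirror does not expose the original endings, so this only illustrates what a lossless round trip would require.

```typescript
interface Line {
  text: string;
  ending: string; // "\r\n", "\r", "\n", or "" for the last line
}

// Split on CRLF, CR or LF while remembering which terminator ended each line.
function splitKeepingEndings(text: string): Line[] {
  const lines: Line[] = [];
  const re = /\r\n|\r|\n/g; // every alternative is non-empty, so exec always advances
  let start = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    lines.push({ text: text.slice(start, m.index), ending: m[0] });
    start = m.index + m[0].length;
  }
  lines.push({ text: text.slice(start), ending: "" });
  return lines;
}

// Stitch back using the recorded endings, reproducing the original exactly.
function stitch(lines: Line[]): string {
  return lines.map((l) => l.text + l.ending).join("");
}

const pasted = "a\r\nb\nc\r\n";
console.log(stitch(splitKeepingEndings(pasted)) === pasted); // true
```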
But OK, this is not ideal. From the user perspective, I would probably want to specify the line endings from the UI. I'd ideally like to tell regex101 "treat this multi-line string as if it was separated by CRLF".
My use case is this: we process a custom INI-inspired format that, for historic reasons, uses CRLF. So whenever I'm testing some regex against it, I need to tell the tool to use this line ending.
I don't control that part of the code, so I cannot decide how to stitch it back together, unfortunately.
Yes, I understand your use case, but why not just use \r?\n to match your line endings? That's guaranteed to work in either case you encounter.
What if I want to test that the string uses CRLF? \r is not optional in our case.
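The difference between the lenient and the strict pattern is easy to check with JavaScript regexes. A TypeScript sketch; the INI-style sample lines are made up:

```typescript
// Count how many times a global regex matches; 0 when there is no match.
function countMatches(re: RegExp, text: string): number {
  return (text.match(re) ?? []).length;
}

const crlfText = "name=foo\r\nsize=3\r\n"; // made-up INI-style sample with CRLF
const lfText = "name=foo\nsize=3\n";       // same content with LF

const lenient = /=\w+\r?\n/g; // accepts CRLF or LF
const strict = /=\w+\r\n/g;   // requires CRLF

console.log(countMatches(lenient, crlfText), countMatches(lenient, lfText)); // 2 2
console.log(countMatches(strict, crlfText), countMatches(strict, lfText));   // 2 0
```

On text that is handed to the engine as LF-only, the strict pattern can never match, so it cannot be tested that way.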
That's why I think I should be able to specify the line endings from the UI. I understand the desire to not clutter the interface though.
CodeMirror is not great in this respect. If I specify an explicit line ending, it will use that, and only that, to split and join. That means if you have a string with both CRLF and LF and use CRLF as the line terminator, you'll "lose" the LF-terminated line (it won't be a separate line, just part of another line's string).
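Plain string splitting reproduces the effect. This mimics the behaviour with String.prototype.split; it is not CodeMirror's own code:

```typescript
const mixed = "first\r\nsecond\nthird";

// Default-style split: CRLF, lone CR and lone LF all end a line.
const byAnyBreak = mixed.split(/\r\n|\r|\n/);

// Explicit separator "\r\n": the lone LF no longer ends a line.
const byCrlfOnly = mixed.split("\r\n");

console.log(byAnyBreak.length); // 3
console.log(byCrlfOnly.length); // 2
console.log(JSON.stringify(byCrlfOnly[1])); // "second\nthird"
```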
I understood that CodeMirror automatically converts multiline strings to arrays of single-line strings. Then, you need to stitch it back together for regex evaluation. And I'm only proposing touching the second step: I, as a user, would be able to tell regex101 "for stitching, use CRLF". If that makes sense & is doable.
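In CodeMirror 5 the stitching step corresponds to doc.getValue, which according to its manual accepts an optional separator argument (default "\n"). A sketch of the proposal under that assumption: the useCrlf flag is a hypothetical user setting, and HasGetValue is a minimal stand-in type so the example runs without a browser.

```typescript
// Minimal structural stand-in for the one CodeMirror 5 method used here.
interface HasGetValue {
  getValue(separator?: string): string;
}

// Hypothetical: stitch the editor's lines with the separator the user picked.
function readTestString(editor: HasGetValue, useCrlf: boolean): string {
  return editor.getValue(useCrlf ? "\r\n" : "\n");
}

// Fake editor for the demo: keeps an array of lines, like CodeMirror does internally.
const lines = ["[section]", "key=value"];
const fakeEditor: HasGetValue = {
  getValue: (separator = "\n") => lines.join(separator),
};

console.log(JSON.stringify(readTestString(fakeEditor, true)));  // "[section]\r\nkey=value"
console.log(JSON.stringify(readTestString(fakeEditor, false))); // "[section]\nkey=value"
```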
Anyway, this discussion is already long enough, I'm sure you have better things to do :) Thanks for taking your time to discuss this.
That's quite a use case you have there. I think it's going to clash with the proposal here https://github.com/firasdib/Regex101/issues/539 to make PCRE tester align with PHP.
FWIW, CodeMirror 6 is on the horizon, and support for this might be improved in that release. I'll have to investigate.
Hopefully something workable would be implemented soon ☺️
Hello, feedback on this.
Totally agreed that nothing automagical should be in place: do not assume my data encoding or line endings if I can explicitly tell you. If different line endings are to be supported, they should be set explicitly.
Not too sure how feasible it is, but as far as UI clutter goes, the screenshot from the OP is pretty good: to the right of "Test String" there is a big unused horizontal space where this would fit nicely. If you want to keep the UI as it is by default, we would be glad to enable such a UI element via an option first.
@firasdib Could you not use the lineSeparator option when constructing the editor instance to allow the line endings to be user-configurable?
I was curious about this limitation of regex101 and just Googled it today to find this issue. I know nothing about CodeMirror other than the 5 minutes I just spent looking into it, so maybe I am missing something in regards to why this would not work.
Regardless, I think that the numerous duplicate issues and the fact that this was the second result in my Google search together show that there might be a little more interest in this feature than originally thought.
In order to keep with your wishes of not adding clutter to the main UI, I think it could be tastefully tacked onto the delimiters menu (pardon the crude mockup I did in Paint), as such:
Support for this will be added in the upcoming release where you can select line ending in the settings.