tokay
`Line` built-in
Line could be a built-in parselet accepting any line. The line ending should depend on the operating system in use (\r\n on Windows, \r on classic Mac, \n on Unix/Linux).
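The two options under discussion (follow the OS, or accept every convention) can be contrasted in a short, language-neutral sketch. This is Python rather than Tokay, and `split_lines` and `ANY_EOL` are illustrative names, not anything Tokay provides:

```python
import os
import re

# Platform-specific choice: whatever the current OS uses
# ("\r\n" on Windows, "\n" on Linux and modern macOS).
PLATFORM_EOL = os.linesep

# Platform-independent choice: accept any of the three conventions.
# "\r\n" is listed first so a Windows ending is consumed as one terminator
# rather than as "\r" followed by "\n".
ANY_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Split text on any of the three line endings, dropping one trailing empty piece."""
    parts = ANY_EOL.split(text)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


if __name__ == "__main__":
    print(split_lines("win\r\nmac\runix\n"))  # ['win', 'mac', 'unix']
```

For comparison, Python's built-in `str.splitlines()` also recognizes all three endings (plus some additional Unicode separators).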
Making this platform-specific might end up being a pain in the ass. As someone who uses git-bash/msys tools on Windows, I get bitten by this a lot: some programs are unaware of the difference, others try to support it by doing the right thing for each system, and some try to be extra generic and support all cases with annoying hacks.
What I appreciate most is very obvious documentation that states what the default is, plus an easy way to configure it differently (CLI option/environment variable), so I would vote for that if we're doing a voting thing here.
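The "documented default plus an easy override" idea could look roughly like this. It is a Python sketch under explicit assumptions: the `--newline` option, the `TOKAY_NEWLINE` environment variable and the mode names are hypothetical and do not exist in Tokay; only the precedence order (CLI option, then environment variable, then documented default) is the point.

```python
import argparse
import os

# Hypothetical names for illustration only; Tokay does not define these.
ENV_VAR = "TOKAY_NEWLINE"
CHOICES = {"any": None, "unix": "\n", "windows": "\r\n", "mac": "\r"}
DEFAULT = "any"  # the documented default: accept every line ending


def resolve_newline(argv=None, environ=None):
    """Pick the line-ending mode: CLI option beats env var beats documented default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--newline", choices=sorted(CHOICES))
    args = parser.parse_args(argv)
    mode = args.newline or environ.get(ENV_VAR) or DEFAULT
    if mode not in CHOICES:
        raise ValueError(f"unknown newline mode {mode!r}; expected one of {sorted(CHOICES)}")
    # None means "match any of the known endings"
    return mode, CHOICES[mode]
```

With no option and no variable set, `resolve_newline([], {})` falls back to `("any", None)`.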
Regardless of that, I implemented a Line parselet for my own needs, which should theoretically match both Windows and Linux line endings:
(overoptimistic GitHub language identifier 🤣)
NL : @{ '\n' ; '\r\n' }
NotNl : @{
    peek not NL .
}
Line : @{
    NotNl+ NL "".join($1)
}
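To make the intended semantics of these three parselets explicit, here is a hand-written Python transliteration. It assumes that block alternatives are tried in order, that `peek not` is a negative lookahead and that `.` consumes one character; it is a sketch of those assumptions, not Tokay's implementation. Under them, `NotNl+` requires at least one character and `NL` must follow, so an empty line or a final line without a terminator does not match.

```python
from typing import Optional, Tuple

# Each parser takes (text, pos) and returns (value, new_pos) on success, or None.
Result = Optional[Tuple[str, int]]


def nl(text: str, pos: int) -> Result:
    # NL : @{ '\n' ; '\r\n' }  -- alternatives tried in order
    for eol in ("\n", "\r\n"):
        if text.startswith(eol, pos):
            return eol, pos + len(eol)
    return None


def not_nl(text: str, pos: int) -> Result:
    # NotNl : @{ peek not NL . }  -- fail if NL would match here, else take one char
    if nl(text, pos) is not None or pos >= len(text):
        return None
    return text[pos], pos + 1


def line(text: str, pos: int) -> Result:
    # Line : @{ NotNl+ NL "".join($1) }
    chars = []
    while (r := not_nl(text, pos)) is not None:
        ch, pos = r
        chars.append(ch)
    if not chars:  # "+" needs at least one character
        return None
    r = nl(text, pos)
    if r is None:  # the line must end with NL
        return None
    return "".join(chars), r[1]
```

For example, `line("hello\r\nworld", 0)` yields `("hello", 7)`, while `line("\n", 0)` and `line("tail", 0)` both fail.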
It works, but I bumped into a lot of surprises along the way, so I'd appreciate it if you could look at some of my failed attempts and explain them, since I'm unsure whether they're bugs or my own misunderstandings.
Changing the above to a parselet that also supports classic Mac should be pretty simple, I believe: just add '\r' to the NL sequence. Either way, it'd be nice to have your opinion on this.
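One detail when adding '\r': if alternatives are tried in order and the first success wins (PEG-style ordered choice, which this sketch assumes for Tokay blocks), '\r' must come after '\r\n'. Otherwise a Windows ending is split into '\r' plus a leftover '\n'. A minimal Python illustration, with `match_eol` as a hypothetical helper:

```python
from typing import Optional, Sequence


def match_eol(text: str, pos: int, alternatives: Sequence[str]) -> Optional[str]:
    """Ordered choice: return the first alternative that matches at pos."""
    for eol in alternatives:
        if text.startswith(eol, pos):
            return eol
    return None


text = "a\r\nb"
good = ("\n", "\r\n", "\r")  # '\r\n' is tried before '\r'
bad = ("\n", "\r", "\r\n")   # '\r' shadows '\r\n'

print(repr(match_eol(text, 1, good)))  # '\r\n'
print(repr(match_eol(text, 1, bad)))   # '\r'  (the '\n' is left over)
```

The order used in the `Until<...>` example in the reply below (`'\n' ; '\r\n'; '\r'`) is the safe one.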
(@phorward tagging you cause I don't know if you're getting notifications or not 😄)
Hello @nivpgir,
Good point.
I think making it configurable would be the best option so far. This could also be achieved by setting a specific parselet to the desired behavior.
In the end, this is a case for #10, which is one of Tokay's core features planned for the next versions (I hope it becomes available with v0.6): generic parselets will allow defining new parselets whose consumables are variable at compile time. This will make parselets with a given behavior reusable, as they can be combined with different other parselets or tokens, and it will reduce the number of definitions. For example, your implementation of Line is a version of the planned generic built-in parselet Until<P>, which parses anything from the stream until P matches.
Line could then be defined like so:
# Default (could be builtin)
Line : Until<@{ '\n' ; '\r\n'; '\r' }> # default matching any possible line ending (like your NL parselet)
# Individual redefinition
Line : Until<'\n'> # match only Unix/Linux EOL, re-defines Line
WinLine : Until<'\r\n'> # match only Windows EOL, creates WinLine
# Example usage
print("Please enter your name with Windows CRLF: ", newline=false) WinLine print("Hello " + $2)
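The idea of a generic Until<P> maps naturally onto a parser combinator: a function that takes a parser P and returns a new parser. The Python sketch below mirrors the Line and WinLine definitions above. It is an illustration only; `lit`, `choice` and `until` are hypothetical names, and the choices that `until` also consumes the terminator and accepts empty content are assumptions, since the planned built-in's exact behavior isn't specified here.

```python
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[str, int]]
Parser = Callable[[str, int], Result]


def lit(s: str) -> Parser:
    """Parser matching the literal string s."""
    def parse(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def choice(*parsers: Parser) -> Parser:
    """Ordered choice: the first parser that succeeds wins."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse


def until(p: Parser) -> Parser:
    """Consume characters until p matches, then consume p as well.
    Returns the text before p, or None if p never matches."""
    def parse(text: str, pos: int) -> Result:
        start = pos
        while pos < len(text):
            r = p(text, pos)
            if r is not None:
                return text[start:pos], r[1]
            pos += 1
        return None
    return parse


# Counterparts of the Tokay definitions above.
line = until(choice(lit("\n"), lit("\r\n"), lit("\r")))
win_line = until(lit("\r\n"))
```

Note that `win_line` treats a lone '\n' as ordinary content, so `win_line("a\nb\r\n", 0)` returns `("a\nb", 5)`.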
Thanks! I've also changed my notification settings and hope I'll get informed when not tagged here as well.
Hmmm, what's the benefit of making Line generic at compile time? Most of the time you'd want NL to be decided per execution, not once at compile time.
Well, Line itself is not generic; it would be implemented using generic features. It is like your implementation example, but more optimized and built into Tokay itself. Line can always be redefined to work with any other line delimiter, similar to e.g. awk's RS record separator.
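The awk comparison can be made concrete with a small Python sketch. Here the record separator is an ordinary argument, so the same function serves any delimiter, much like redefining Line. `records` is a hypothetical helper that only loosely follows awk's RS for literal separators; special awk cases such as paragraph mode (`RS=""`) are deliberately not modelled.

```python
def records(text: str, rs: str = "\n") -> list[str]:
    """Split text into records on a literal separator rs.

    Loosely modelled on awk's RS: a trailing separator does not yield an
    empty final record. Special awk cases (e.g. RS="" paragraph mode) are
    not handled.
    """
    if not rs:
        raise ValueError("rs must be a non-empty string")
    parts = text.split(rs)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


if __name__ == "__main__":
    data = "alpha\r\nbeta\r\n"
    print(records(data, "\r\n"))  # ['alpha', 'beta']
    print(records(data))          # ['alpha\r', 'beta\r']
```

The second call shows the practical risk raised earlier in the thread: with a Unix-only separator, Windows input leaves a stray '\r' at the end of every record.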