`Line` built-in

Open phorward opened this issue 2 years ago • 4 comments

Line could be a built-in parselet accepting any line. The line ending should depend on the operating system in use (\r\n on Windows, \r on classic Mac, \n on Unix/Linux).

phorward avatar Mar 07 '22 17:03 phorward

Making this platform-specific might end up being a pain in the ass. As someone who uses git-bash/msys tools on Windows, I get bitten by this a lot: some programs are unaware of the difference, others try to support it by doing the right thing for each system, and some try to be extra generic and support all cases with annoying hacks.

The thing I appreciate the most is very obvious documentation that states what the default is, plus an easy way to configure it differently (CLI option/environment variable), so I would vote for that if we're doing a voting thing here.

Regardless of that, I implemented a Line parselet for my own needs, which should theoretically match both Windows and Linux line endings (over-optimistic GitHub language identifier 🤣):

NL : @{ '\n' ; '\r\n' }

NotNl : @{
  peek not NL .
}
Line : @{
  NotNl+ NL "".join($1)
}

It works, but I bumped into a lot of surprises on the way, so I'd appreciate it if you could look at some of my failed attempts and explain them, since I'm unsure if they're bugs or my own misunderstandings.

Changing the above to a parselet that supports classic Mac as well should be pretty simple, I believe: just add '\r' to the NL sequence. Either way, it'd be nice to have your opinion on this.
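One thing to watch when adding '\r': a rough illustration, in Python rather than Tokay, of why the order of the alternatives can matter. It assumes alternatives are tried left to right (as in regex alternation); `GOOD`, `BAD` and `split_on` are made-up names for this sketch only.

```python
import re

# Assumption: alternatives are tried left to right, first match wins.
# '\r\n' has to be tried before a bare '\r', otherwise a Windows line
# ending is consumed as two separate line endings.
GOOD = re.compile(r"\r\n|\n|\r")
BAD = re.compile(r"\r|\n|\r\n")  # '\r\n' can never match here


def split_on(pattern: re.Pattern, text: str) -> list:
    """Split text at every line ending recognised by pattern."""
    return pattern.split(text)


print(split_on(GOOD, "a\r\nb"))  # ['a', 'b']
print(split_on(BAD, "a\r\nb"))   # ['a', '', 'b'] (spurious empty line)
```

With the `BAD` ordering, a single CRLF shows up as an extra empty line, which is the kind of surprise mixed line endings tend to cause.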

(@phorward tagging you cause I don't know if you're getting notification or not 😄 )

nivpgir avatar Mar 09 '22 21:03 nivpgir

> Making this platform-specific might end up being a pain in the ass. As someone who uses git-bash/msys tools on Windows, I get bitten by this a lot: some programs are unaware of the difference, others try to support it by doing the right thing for each system, and some try to be extra generic and support all cases with annoying hacks.
>
> The thing I appreciate the most is very obvious documentation that states what the default is, plus an easy way to configure it differently (CLI option/environment variable), so I would vote for that if we're doing a voting thing here.

Hello @nivpgir,

good point on this.

I think making it configurable would be the best option so far. This might also be achieved by setting a specific parselet to the wanted behavior.

In the end, this is a case for #10, which is one of Tokay's core features planned for the next versions (I hope it becomes available with v0.6). Generic parselets will allow defining new parselets whose consumables are variable at compile time. This will make parselets with a given behavior reusable, as they can be combined with different other parselets or tokens, and it will reduce the number of definitions needed. For example, your implementation of Line is a version of the planned generic built-in parselet Until<P>, which parses anything from the stream until P matches.

Line could then be defined like so:

# Default (could be builtin)
Line : Until<@{ '\n' ; '\r\n' ; '\r' }>     # default matching any possible line ending (like your NL parselet)

# Individual redefinition
Line : Until<'\n'>   # match only Unix/Linux EOL, re-defines Line
WinLine : Until<'\r\n'>   # match only Windows EOL, creates WinLine

# Example usage
print("Please enter your name with Windows CRLF: ", newline=false)   WinLine   print("Hello " + $2)
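To make the idea behind Until<P> more concrete, here is a minimal sketch in Python rather than Tokay. It is only an analogy: `until`, `line` and `win_line` are hypothetical names, and it assumes that the delimiter is consumed and that the text before it is the result, which the planned built-in may or may not do.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (matched text, new position),
# or None if no delimiter is found.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


def until(*delimiters: str) -> Parser:
    """Build a parser that reads everything up to the first delimiter.

    The delimiters are fixed when the parser is built, loosely mirroring
    a generic parselet whose argument is bound at compile time.
    """
    def parse(text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
        for i in range(pos, len(text)):
            for delim in delimiters:  # tried in the given order
                if text.startswith(delim, i):
                    # Assumption: the delimiter itself is consumed.
                    return text[pos:i], i + len(delim)
        return None
    return parse


line = until("\r\n", "\n", "\r")  # any line ending
win_line = until("\r\n")          # Windows line ending only

print(line("a\r\nb", 0))          # ('a', 3)
print(win_line("a\nb\r\nc", 0))   # ('a\nb', 5)
```

The same `until` builder yields both variants, so only the delimiter differs between definitions.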

> (@phorward tagging you cause I don't know if you're getting notification or not 😄 )

Thanks! I've also changed my notification settings and hope I'll get informed when not tagged here as well.

phorward avatar Mar 10 '22 08:03 phorward

hmmm, what's the benefit of making Line generic at compilation time? most of the time you'd want NL to be decided per execution, and not once at compile time.

nivpgir avatar Mar 10 '22 10:03 nivpgir

> hmmm, what's the benefit of making Line generic at compilation time? most of the time you'd want NL to be decided per execution, and not once at compile time.

Well, Line itself is not generic; it would be implemented using generic features. It is like your example implementation, but more optimized and built into Tokay itself. Line can always be redefined to work with any other line delimiter, which is similar to e.g. awk's RS record separator.
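For readers who don't know awk's RS: it is a variable holding the record separator, and changing it changes how input is split into records. A very rough Python analogy follows; `records` and its `rs` parameter are invented for this sketch and are neither awk's nor Tokay's actual API.

```python
from typing import List


def records(text: str, rs: str = "\n") -> List[str]:
    """Split text into records at every occurrence of the separator rs."""
    parts = text.split(rs)
    # A trailing separator terminates the last record; it does not
    # start a new, empty one.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


print(records("a\nb\n"))                # ['a', 'b']
print(records("a\r\nb\r\n", rs="\r\n")) # ['a', 'b']
print(records("x;y;z", rs=";"))         # ['x', 'y', 'z']
```

Here the default covers the common case, and passing another `rs` plays the role of redefining Line.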

phorward avatar Mar 10 '22 11:03 phorward