Non-ASCII characters can't be part of knot and variable names?
It seems they can't, which is sad since I am writing in Russian. Since an ink script is Unicode throughout, maybe it would be a good idea to allow non-ASCII characters in knot and variable names?
@joethephish Any comments? Sorry for pushing it but...
Hmm, it's a good question - my instinct is that we shouldn't allow non-ASCII, though perhaps that's more grounded in old school programming tradition for identifiers rather than any genuine good reason!
Apologies, I'm really busy working toward a milestone on our new game here at inkle, but if you or anyone else wanted to experiment with changing the behaviour, the relevant parse function is here: https://github.com/inkle/ink/blob/master/inklecate/InkParser/InkParser_Logic.cs#L267
Note that the main aim with writing the identifier parsing function is that it needs to be non-ambiguous with other parts of the language. So (obviously) you need to make sure it doesn't accept the kind of punctuation that would be used in other parts of ink.
If you were to accept Russian characters, I'd suggest an "opt-out" approach where you say "allow all characters except space and these symbols" as opposed to the current "opt-in", which says to specifically allow a-z, 0-9, _.
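To make the two strategies concrete, here is a minimal Python sketch (not the actual C# parser in InkParser_Logic.cs). The function names and the excluded-symbol list are assumptions made for illustration; in particular, the exclusion list is only a guess at the punctuation ink uses elsewhere, and the opt-in set assumes upper-case ASCII letters are allowed too.

```python
import string

# Opt-in: only explicitly listed characters may appear in an identifier.
# Roughly mirrors "a-z, 0-9, _" (upper-case letters included as an assumption).
OPT_IN_ALLOWED = set(string.ascii_letters + string.digits + "_")

# Opt-out: everything is allowed except whitespace and these symbols.
# Hypothetical and incomplete list of punctuation used by other ink syntax.
OPT_OUT_EXCLUDED = set("*+-=<>{}()[]|~#/\\\"'.,:;!?&^%@$")


def is_identifier_opt_in(name: str) -> bool:
    """Accept a name only if every character is on the allow list."""
    return bool(name) and all(ch in OPT_IN_ALLOWED for ch in name)


def is_identifier_opt_out(name: str) -> bool:
    """Accept a name unless it contains whitespace or an excluded symbol."""
    return bool(name) and not any(
        ch.isspace() or ch in OPT_OUT_EXCLUDED for ch in name
    )
```

With these definitions a Cyrillic name is rejected by the opt-in check and accepted by the opt-out check. The opt-out check also accepts any symbol nobody thought to list, such as "№", which is the trade-off of that approach.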
Hrm, markdown support isn't a great idea, I don't think. Mainly because two of the most important markdown features, *italic* and **bold**, already conflict with ink's choice syntax.
I'm not entirely sure what you mean by the escaping - would the "master" format be markdown (escaping ink) or would the "master" format be ink (escaping markdown)?
Ink can indeed escape characters:
* \*\*Markdown-style bold\*\* text
But not sure why you'd want to do that... doesn't look like the most sensible way to write! We're planning to add support for non-markdown syntax like _italic_ at some point, though it would be purely on the runtime side.
> If you were to accept Russian characters, I'd suggest an "opt-out" approach where you say "allow all characters except space and these symbols" as opposed to the current "opt-in", which says to specifically allow a-z, 0-9, _.

Unicode contains a lot of characters, and many of them are not letters. It would be a tough task to filter out all those non-letters to "opt them out" of identifiers.
In fact, it would be a good idea to make this an optional setting, for example by putting something like this at the beginning of the script:
IDENTIFIERS а-я, А-Я
That would completely solve my problem. :) Unfortunately, I'm not much into C#, so I'm very unlikely to code it myself and offer a pull request...
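As a rough illustration of how such a directive could be read, here is a small Python sketch. The `IDENTIFIERS` syntax is only the proposal from this comment, not an existing ink feature, and `parse_identifier_ranges` and `in_ranges` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_identifier_ranges(directive: str) -> List[Tuple[int, int]]:
    """Parse a hypothetical 'IDENTIFIERS x-y, X-Y' line into code point ranges."""
    keyword, _, spec = directive.strip().partition(" ")
    if keyword != "IDENTIFIERS":
        raise ValueError("not an IDENTIFIERS directive")
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        # Each entry must look like '<first char>-<last char>'.
        if len(part) != 3 or part[1] != "-":
            raise ValueError("bad range: %r" % part)
        lo, hi = ord(part[0]), ord(part[2])
        if lo > hi:
            raise ValueError("range is reversed: %r" % part)
        ranges.append((lo, hi))
    return ranges


def in_ranges(ch: str, ranges: List[Tuple[int, int]]) -> bool:
    """True if the character's code point falls inside any parsed range."""
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


ranges = parse_identifier_ranges("IDENTIFIERS а-я, А-Я")
```

One thing a plain range makes easy to miss: "ё" (U+0451) sits outside а-я (U+0430 to U+044F), so it would have to be listed separately.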
Hello,
I've now started working on this. I don't consider the opt-out approach a good idea, as one needs to list all the non-identifier characters, which is error-prone since there is a vast number of non-identifier characters in the Unicode universe (or should I say the Unicode multiverse). It would be easy to miss something and then get into a mess.
Instead I plan on implementing support for curated character ranges. The idea is that the author can explicitly allow a certain character range matching their language, rather than all possible characters. The ranges will be based on the Unicode table for the different cultures; some suitable examples can be taken from here: http://jrgraphix.net/research/unicode_blocks.php.
In addition, @fireton's idea for manually including a given character range on-demand would seem reasonable, instead of having all the ranges precompiled or pre-allocated in memory. I am thinking of something like:
ENABLE CHRANGE "Cyrillic"
where the string would be a unique name for the desired range. The supported ranges could be listed in the documentation for reference. This would also allow support for character ranges to grow in the future, meaning we don't have to support all of them right away.
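A minimal Python sketch of that idea follows; it is not the C# implementation. The `IdentifierRules` class, the `NAMED_RANGES` registry and its key names are hypothetical, and the bounds are the raw Unicode block limits rather than curated letter-only ranges.

```python
import re

# Hypothetical registry of named ranges (raw Unicode block bounds, uncurated).
NAMED_RANGES = {
    "Extended Latin A": (0x0100, 0x017F),
    "Extended Latin B": (0x0180, 0x024F),
    "Cyrillic": (0x0400, 0x04FF),
}

# Matches the proposed directive, e.g.: ENABLE CHRANGE "Cyrillic"
ENABLE_RE = re.compile(r'^\s*ENABLE\s+CHRANGE\s+"([^"]+)"\s*$')


class IdentifierRules:
    """Tracks which extra character ranges the script has switched on."""

    def __init__(self):
        self.enabled = []

    def apply_directive(self, line: str) -> bool:
        """Enable a named range; return False if the line is not a directive."""
        match = ENABLE_RE.match(line)
        if match is None:
            return False
        name = match.group(1)
        if name not in NAMED_RANGES:
            raise KeyError("unknown character range: " + name)
        self.enabled.append(NAMED_RANGES[name])
        return True

    def is_identifier_char(self, ch: str) -> bool:
        """ASCII letters, digits and '_' always pass; other chars need a range."""
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            return True
        return any(lo <= ord(ch) <= hi for lo, hi in self.enabled)
```

Because nothing outside ASCII is accepted until a directive enables it, scripts that never use the directive keep the current behaviour, and new named ranges can be added to the registry later without touching the matching logic.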
So, let me know if you like the idea.
PS: Please, ignore the stuff for the markdown support, it is something I want to work on, but does not feel relevant at all to the OP's issue here. Let's not pollute the character ranges topic with that, I'll make another issue / PR for the markdown support if someone is interested
Thanks, @ivaylo5ev! With this addition ink will be an even better tool for all non-programmer authors!
Just to inform on the progress of this, I have been able to introduce several character ranges so far:
- Extended Latin A
- Extended Latin B
- Cyrillic
- Arabic
Soon to be defined:
- Hebrew
- Armenian
- Greek (eventually)
The latter take some work as the original unicode ranges need to be further curated in order to discard non-letter characters. For some of these I cannot be of much use directly as I do not know the respective languages and I am relying on the assistance of some friends of mine.
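One way to get a first pass at such curation is to filter a block by Unicode general category. The Python sketch below is an assumption about how that could be done, not the method used in the PR; `curate_letters` is a hypothetical helper, and the example bounds are those of the Unicode "Greek and Coptic" block.

```python
import unicodedata


def curate_letters(lo: int, hi: int) -> list:
    """Return the characters in [lo, hi] whose general category is a letter (L*)."""
    return [
        chr(cp)
        for cp in range(lo, hi + 1)
        if unicodedata.category(chr(cp)).startswith("L")
    ]


# Greek and Coptic block: U+0370 to U+03FF.
greek_letters = curate_letters(0x0370, 0x03FF)
```

This drops unassigned code points and punctuation such as the Greek question mark (U+037E). A letters-only filter also drops combining marks (category Mn), which scripts such as Hebrew and Arabic use for vowel points, so a native speaker's review is still needed before fixing the final ranges.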
In addition, my changes are currently based on another PR of mine which so far seems to be a long lived one and I will eventually port them to the most up-to-date master in order to deliver them faster.
Also, I will need a few more NUnit tests to verify the feature. If all goes well, I will be done in a couple of weeks, hopefully before the Christmas and New Year holidays.
Cheers
@ivaylo5ev, any progress on this?
@fireton I've been dealing with some personal matters these past few months. I will try to complete this in a couple of weeks; mostly it needs some unit tests and documentation.
I am resuming work on this now. I am experiencing some issues with divert variables and divert names at the moment, which prevent some tests from passing. I need some time to check whether there is a deeper issue with ink itself or whether it is entirely caused by the new feature.
I have now managed to prepare a PR for this. I apologize for the long delay on this anticipated feature. I did not expect it either, but I had personal issues that prevented me from properly focusing on it and completing it sooner. I hope the PR is well received and merged into the main codebase.
I tried inklecate version 0.9.0 and non-ASCII Latin characters are not working in knot names, for example:
-> começo
=== começo ===
Era uma vez...
-> END
I started working with ink just today, so sorry if I missed something.
-- edit -- I built the latest version from this repo and it worked. I'm trying to use inky and haven't found out how to test it with this build of inklecate.
@joethephish