nimskull
nimskull copied to clipboard
transition lexer to the DOD implementation
Change lexer to use data-oriented design for tokens instead of the current solution. Remove nim"pretty" hack-ins from all over the compiler and the nim"pretty" itself - it should be implemented as a standalone tool that can actually understand code layout requirements instead of an ad-hoc kludge that was immured deeply in the lexer and parser source code.
- General lexer changes
- [X] Normalize token kinds for the literal bases - instead of
introducing a separate data channel in form of the
NumericalBaseenum more token kinds were added. - [ ] Rewrite base lexer implementation to operate on the whole string
at once instead of working on the fixed-size buffer and copy-pasting
character pieces into the
.literalfield in theToken(I wonder how many thousands of heap relocations this solution were causing in each simple run)
First run uncovered tons of issues with the implementation (who could have guessed, amirite?) so this is going to be a pretty ugly PR all around, I can tell this right away.
I think it'd make sense to split the nimpretty related removals into a separate PR. This would reduce the surface area of this one and thus make it easier to review.
I think it'd make sense to split the
nimprettyrelated removals into a separate PR. This would reduce the surface area of this one and thus make it easier to review.
It is not possible because nim"pretty" is implemented as one large hack around lexer+parser, I can't rework Token and keep the nim"pretty" working at the same time only to remove it later due to all the hacks it had built-in. And this is a clean removal, so it hardly adds anything to the review complexity.
It is not possible because nim"pretty" is implemented as one large hack around lexer+parser, I can't rework
Tokenand keep the nim"pretty" working at the same time only to remove it later due to all the hacks it had built-in.
I should have made myself clearer there, sorry. What I meant to propose was to:
- Split off the
nimprettyremoval into a separate PR and merge it first - Rebase this PR onto
develand continue with it
A separation by topic is possible here, which is where my suggestion came from. I agree that reworking Token while having to keep nimpretty functional just to remove it after doesn't make sense.
And this is a clean removal, so it hardly adds anything to the review complexity.
One still has to figure out which part (nimpretty removal or Token rework) each change belongs to (although, as you said, it's very much manageable with the changes here), and also has to do so again when looking at the resulting commit in the future. With two separate commits, it'd be easier.
Anyways, it's not something I'd consider to be a blocker.
Preemptive nimpretty removal makes sense, I will PR the change.
Closing this PR for the time being -- conflicts started to pile up and IMO this can be re-opened when needed, right now it is just a bit confusing to have it around. Maybe it will make more sense to implement this from scratch.