nimskull icon indicating copy to clipboard operation
nimskull copied to clipboard

transition lexer to the DOD implementation

Open haxscramper opened this issue 3 years ago • 5 comments

Change lexer to use data-oriented design for tokens instead of the current solution. Remove nim"pretty" hack-ins from all over the compiler and the nim"pretty" itself - it should be implemented as a standalone tool that can actually understand code layout requirements instead of an ad-hoc kludge that was immured deeply in the lexer and parser source code.

  • General lexer changes
  • [X] Normalize token kinds for the literal bases - instead of introducing a separate data channel in form of the NumericalBase enum more token kinds were added.
  • [ ] Rewrite base lexer implementation to operate on the whole string at once instead of working on the fixed-size buffer and copy-pasting character pieces into the .literal field in the Token (I wonder how many thousands of heap relocations this solution were causing in each simple run)

haxscramper avatar Sep 05 '22 11:09 haxscramper

First run uncovered tons of issues with the implementation (who could have guessed, amirite?) so this is going to be a pretty ugly PR all around, I can tell this right away.

haxscramper avatar Sep 05 '22 11:09 haxscramper

I think it'd make sense to split the nimpretty related removals into a separate PR. This would reduce the surface area of this one and thus make it easier to review.

zerbina avatar Sep 05 '22 18:09 zerbina

I think it'd make sense to split the nimpretty related removals into a separate PR. This would reduce the surface area of this one and thus make it easier to review.

It is not possible because nim"pretty" is implemented as one large hack around lexer+parser, I can't rework Token and keep the nim"pretty" working at the same time only to remove it later due to all the hacks it had built-in. And this is a clean removal, so it hardly adds anything to the review complexity.

haxscramper avatar Sep 07 '22 08:09 haxscramper

It is not possible because nim"pretty" is implemented as one large hack around lexer+parser, I can't rework Token and keep the nim"pretty" working at the same time only to remove it later due to all the hacks it had built-in.

I should have made myself clearer there, sorry. What I meant to propose was to:

  1. Split off the nimpretty removal into a separate PR and merge it first
  2. Rebase this PR onto devel and continue with it

A separation by topic is possible here, which is where my suggestion came from. I agree that reworking Token while having to keep nimpretty functional just to remove it after doesn't make sense.

And this is a clean removal, so it hardly adds anything to the review complexity.

One still has to figure out which part (nimpretty removal or Token rework) each change belongs to (although, as you said, it's very much manageable with the changes here), and also has to do so again when looking at the resulting commit in the future. With two separate commits, it'd be easier.

Anyways, it's not something I'd consider to be a blocker.

zerbina avatar Sep 07 '22 15:09 zerbina

Preemptive nimpretty removal makes sense, I will PR the change.

haxscramper avatar Sep 07 '22 16:09 haxscramper

Closing this PR for the time being -- conflicts started to pile up and IMO this can be re-opened when needed, right now it is just a bit confusing to have it around. Maybe it will make more sense to implement this from scratch.

haxscramper avatar Jun 21 '23 18:06 haxscramper