
Migrate PetitPreprocessor

Open jecisc opened this issue 4 years ago • 7 comments

In PetitParser 1, a tool named "PetitPreprocessor" was added. Its goal is to preprocess the input and remove things such as comments, which makes the grammar easier to write while preserving the original positions in the input.

I think it would be nice to add it to PetitParser2.

I checked whether it would be easy to migrate, but it is not straightforward. PetitPreprocessor relies on the fact that PP1 streams are positionable, while PP2 streams are not. I don't have time right now to check how to update it to work with PP2.

https://github.com/moosetechnology/PetitParser/tree/development/src/PetitPreprocessor
https://github.com/moosetechnology/PetitParser/tree/development/src/PetitPreprocessor-Tests

jecisc avatar Mar 17 '20 11:03 jecisc

If someone with more knowledge of PetitParser2 is willing to help, it would be much appreciated :)

jecisc avatar Mar 17 '20 11:03 jecisc

I can look into it. That said, PP2Stream is a positionable stream. What issue did you run into when migrating?

kursjan avatar Mar 18 '20 08:03 kursjan

I mean that the stream itself does not know its current position and cannot update its current position. Or maybe I missed something.

jecisc avatar Mar 18 '20 09:03 jecisc

https://github.com/kursjan/petitparser2/blob/master/PetitParser2/PP2Stream.class.st

{ #category : #'context interface' }
PP2Stream >> atPosition: position [
	^ collection at: position
]

You probably just need to expose the position instvar.

kursjan avatar Mar 18 '20 09:03 kursjan

Oh, got it. Position is not an instvar of PP2Stream but of PP2Context. I will have to check how the preprocessor works and what it does...

kursjan avatar Mar 18 '20 17:03 kursjan

I’ll try to explain what I know of it later this evening if I think about it ;)

jecisc avatar Mar 18 '20 17:03 jecisc

From what I know: PetitPreprocessor lets you preprocess the input to remove things you do not want to parse. This makes the parser easier to write.

Let's say, for example, that I want to detect code duplication. I don't care about the comments, and I don't want to handle them in my parser.

For example in a parser I have:

start
	^ (controlStructure / comment / water) plus preProcessor: (comment ==> [ :p | '' ])

There are two kinds of preprocessors currently.

A parser stream one acting like this:

testBiggerReplacementThanMatching
	preProcessingParser := 'Troll' asParser preProcessor: 'u' asParser ==> [ :p | 'll' ].
	self assert: (('Un' asParser , preProcessingParser , 'DeTroy' asParser) end matches: 'UnTrouDeTroy')

And a regex one acting like this:

testDecomposedEntryConsumed
	preProcessingParser := 'Libellule' asParser preProcess: 'T' asRegex into: ''.
	self assert: (preProcessingParser , 'yoyo' asParser matches: 'LibTelTluTleyoyo')

As for how it works, from what I have seen, it introduces a class PPRelativePositionStream. This stream wraps a PPStream and knows the transformations, that is, what the preprocessor changed in the input. With them, it can tell where we are in the original PPStream for any position in the transformed input.
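To make the position mapping concrete, here is a minimal sketch of the idea, written in Python rather than Pharo so it can be run independently. It is not the PetitPreprocessor implementation: the names `preprocess` and `to_original` are hypothetical, positions are 0-based (Pharo collections are 1-based), and it assumes the pattern never matches the empty string. It applies a regex replacement, records where each copied or replaced run starts in both texts, and uses that record to map a position in the preprocessed text back to the original input.

```python
import re
from bisect import bisect_right


def preprocess(text, pattern, replacement):
    """Replace every match of `pattern` and record the transformations.

    Returns (processed_text, segments). Each segment is a tuple
    (processed_start, original_start, copied): `copied` is True for text
    taken unchanged from the input and False for replacement text.
    Assumption: `pattern` never matches the empty string.
    """
    pieces = []
    segments = []
    processed_len = 0
    last = 0
    for match in re.finditer(pattern, text):
        # Unchanged run before the match.
        segments.append((processed_len, last, True))
        pieces.append(text[last:match.start()])
        processed_len += match.start() - last
        # Replacement text; it all maps back to the start of the match.
        segments.append((processed_len, match.start(), False))
        pieces.append(replacement)
        processed_len += len(replacement)
        last = match.end()
    # Unchanged tail after the last match.
    segments.append((processed_len, last, True))
    pieces.append(text[last:])
    return "".join(pieces), segments


def to_original(segments, processed_pos):
    """Map a 0-based position in the processed text to the original text."""
    starts = [processed_start for processed_start, _, _ in segments]
    # Last segment starting at or before processed_pos.
    index = bisect_right(starts, processed_pos) - 1
    processed_start, original_start, copied = segments[index]
    if copied:
        return original_start + (processed_pos - processed_start)
    return original_start


# Removing 'T', as in the regex example above.
processed, segments = preprocess("LibTelTluTleyoyo", "T", "")
print(processed)                   # Libelluleyoyo
print(to_original(segments, 9))    # 12: the first 'y' in the original input
```

The same record handles a replacement that is longer than the match: every position inside the replacement text maps back to the start of the matched text, so a parser working on the preprocessed input can still report positions in the original.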

It also introduces a PPInfo class that returns information about the parse, such as the start and stop positions. This is useful in Moose, for example, to create the source anchors representing the position of elements in files.

jecisc avatar Mar 18 '20 20:03 jecisc