petitparser2
Parsing binary data
I basically have a parser ready to parse text input. However, another representation of the source is binary, although with the exact same AST structure. It seems that adding these two methods allows me to treat a ByteArray as a byte sequence and Integers as bytes:
ByteArray>>#asPParser
^ PP2LiteralSequenceNode on: self
Integer>>#asPParser
^ PP2LiteralObjectNode on: self
Thinking about it, doing something like
SequenceableCollection>>#asPParser
^ PP2LiteralSequenceNode on: self
would even allow parsing "numeric collections" in general ...
Is this the way to go?
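For illustration, with the two extension methods above loaded, a playground session could look like the sketch below. It assumes that parse: accepts a ByteArray as input; the four bytes are the WebAssembly module magic number and serve only as sample data.

```smalltalk
"Sketch only. Assumes ByteArray>>#asPParser and Integer>>#asPParser
 (as defined above) are loaded and that #parse: accepts a ByteArray."
| magic valtypeByte |

"A ByteArray literal becomes a sequence parser."
magic := #[16r00 16r61 16r73 16r6D] asPParser.
magic parse: #[16r00 16r61 16r73 16r6D].

"An Integer literal becomes a single-byte parser; / is ordered choice."
valtypeByte := 16r7F asPParser / 16r7E asPParser.
valtypeByte parse: #[16r7E].
```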
Hi Udo,
I'm not sure what your goal is. The mentioned asPParser methods allow you to do:
For ByteArray: 'foobar' asPParser parse: 'foobar'
I am not sure what exactly Integer>>asPParser does. Can it be used as follows? 'a' asInteger asPParser parse: 'a'
What kind of use case would you like to add?
Hi Kursjan,
the generic idea is to be able to parse binary data (given as a ByteArray) where each element is an Integer (byte). I can't disclose the protocol I work on (NDA), but I think WebAssembly is a good example.
E.g. the text format (p. 132) defines
For example, the textual grammar for value types is given as follows:
valtype ::= ‘i32’ ⇒ i32 | ‘i64’ ⇒ i64 | ‘f32’ ⇒ f32 | ‘f64’ ⇒ f64
E.g. the binary format (p. 114) defines
For example, the binary grammar for value types is given as follows:
valtype ::= 0x7F ⇒ i32 | 0x7E ⇒ i64 | 0x7D ⇒ f32 | 0x7C ⇒ f64
However, once the valtype token has been parsed, all the higher-level combination rules work exactly the same.
So the basic idea would be for PP to be able to parse binary literals by adding ByteArray>>#asPParser and Integer>>#asPParser. This would allow defining a WASMTextParser as a subclass of PP2CompositeNode with
valtype
^ ('i32' asPParser / 'i64' asPParser / 'f32' asPParser / 'f64' asPParser) ==> [:type | WASMValtypeNode type: type]
WASMTextParser would then define all the production rules on top of this valtype definition.
And WASMBinaryParser (as a subclass of WASMTextParser) would simply override valtype as
valtype
^ (16r7F asPParser / 16r7E asPParser / 16r7D asPParser / 16r7C asPParser) ==> [:type | WASMValtypeNode type: type]
However, all the higher-level production rules in the superclass would still work.
So the only difference here would be how literals are parsed: as strings on the one hand (as usual), but also as binary (what I proposed).
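One detail to keep in mind with this design: the text rule hands its block a String such as 'i32', while the binary rule hands it an Integer such as 16r7F. If both parsers should produce identical AST nodes, the binary override could map the byte back to the type name. A minimal sketch, assuming PP2LiteralObjectNode returns the matched Integer and that WASMValtypeNode (the node class from the example above) expects the textual name:

```smalltalk
"Hypothetical variant of WASMBinaryParser>>valtype.
 Assumes Integer>>#asPParser is loaded; the byte-to-name table
 follows the binary grammar quoted above."
valtype
	| names |
	names := Dictionary new.
	names
		at: 16r7F put: 'i32';
		at: 16r7E put: 'i64';
		at: 16r7D put: 'f32';
		at: 16r7C put: 'f64'.
	"Translate the parsed byte into the same name the text parser yields."
	^ (16r7F asPParser / 16r7E asPParser / 16r7D asPParser / 16r7C asPParser)
		==> [ :byte | WASMValtypeNode type: (names at: byte) ]
```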
Does that help?
Hi Udo,
did I get it right that WASMTextParser should already work?
valtype
^ ('i32' asPParser / 'i64' asPParser / 'f32' asPParser / 'f64' asPParser) ==> [:type | WASMValtypeNode type: type]
String>>asPParser is already defined and would create a LiteralSequence parser.
Your proposal of extending ByteArray and Integer with asPParser sounds pretty much OK and aligned with the current PetitParser design. What would the extension look like? Something along these lines?
Integer>>asPParser
^ PP2LiteralObjectNode on: (Character from: self)