Parsing number edge cases (kicad_pcb flavoured s-exp)
I'm running into two cases regarding number parsing where I'm unsure how to proceed:
First is (host pcbnew 5.1.6). This results in 5.1 as a float and .6 as a string.
The second is (tedit 5F199470) and (tstamp 5EFF45FA), which are Unix timestamps but without prefixes that would indicate the radix. All the other fields parse just fine in the file I tested, so it's just those three edge cases.
I'm still learning Rust and would like to implement this myself, but I'm unsure what the best approach would be to address this, because both cases depend on the preceding tokens.
I'm running into two cases regarding number parsing where i'm unsure how to proceed:
First is (host pcbnew 5.1.6). This results in 5.1 as a float and .6 as a string.
What number should this parse as? A link to documentation describing kicad_pcb syntax would come in handy, if support for them should be added to lexpr.
The second is (tedit 5F199470) and (tstamp 5EFF45FA) which are unix timestamps but without prefixes which would indicate the radix. All the other fields parse just fine in the file i tested, so it's just those three edge-cases.
So does kicad use radix 16 (hexadecimal) by default?
I'm still learning rust and would like to implement this myself, but i'm unsure what the best approach would be to address this because both cases depend on the preceding tokens.
It also depends on what other deviations from "regular" S-expressions the kicad syntax contains; I'd like to avoid incomplete support.
The 5.1.6 should parse as a string, as it's not further defined what the version number format should look like. The only documentation there is as of now is unfortunately a bit outdated: https://kicad.org/help/legacy_file_format_documentation.pdf#%5B%7B%22num%22%3A134%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C59.9%2C493.3%2C0%5D
The tedit, tstamp and visible_elements fields are the only cases where KiCad uses radix 16. The only other hex number I encountered was (layerselection 0x010fc_ffffffff). All other numbers in the file I tried are integers or floats, which parse fine.
I'll try to build a minimal test case which should cover all the possible elements in a kicad_pcb file.
I have some very basic code here https://github.com/tachiniererin/lexpr-rs/commit/778b67b2d5d5a0c1078ae7edda5bf773c8da4abb where I tried to work around it. It's tested with this file: https://raw.githubusercontent.com/tachiniererin/kicad-rs/main/ferret.kicad_pcb
edit: better link to the kicad_pcb file format documentation
I checked some more files and the linked documentation. The layerselection is the one thing where the old format (unknown, probably a radix 10 integer) and the new format (hex, prefixed with 0x, with an underscore separating the lower 4 bytes) diverge. Beyond that, I couldn't find any syntax quirks other than the ones already listed.
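For illustration, the hex cases listed so far could be handled with plain standard-library calls. This is only a sketch: the function names parse_hex_timestamp and parse_layerselection are made up for this example, and the u32/u64 widths are assumptions based on the sample values above, not on any KiCad documentation.

```rust
/// Parse a bare hexadecimal value such as the one in `(tedit 5F199470)`.
/// No `0x` prefix is expected; the radix is implied by the field name.
/// Assumption: the value fits in 32 bits.
fn parse_hex_timestamp(token: &str) -> Option<u32> {
    u32::from_str_radix(token, 16).ok()
}

/// Parse the newer `layerselection` form, e.g. `0x010fc_ffffffff`:
/// a `0x` prefix, with an underscore separating the lower 4 bytes.
/// Assumption: the value fits in 64 bits.
fn parse_layerselection(token: &str) -> Option<u64> {
    // Tokens without the prefix (the old format) are rejected here.
    let digits = token.strip_prefix("0x")?;
    // Drop the underscore separator before handing the digits to the parser.
    let cleaned: String = digits.chars().filter(|c| *c != '_').collect();
    u64::from_str_radix(&cleaned, 16).ok()
}
```

Both helpers still need the caller to know which field the token belongs to, so they do not remove the dependence on the preceding tokens.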
Hello @rotty and @tachiniererin, what's the status of the kicad-related code? Is it merged yet? Does it need further testing?
Sorry for leaving this unanswered for so long. I've now come to the conclusion/decision that supporting kicad's "flavour" of S-expressions is out of scope for lexpr, at least in the way the code linked by @tachiniererin does it. Having context-dependent syntax for number literals is really awful IMO, and I decline to include support for that.
What I could imagine supporting would be some kind of "allow strange numbers" mode that would parse all of kicad's number syntaxes as symbols, by relaxing the symbol syntax to include all of kicad number notations. Note however, that this would then parse all numbers as symbols, and it would be up to the user of lexpr to context-sensitively parse the symbols into numbers themselves. I'm not motivated to implement that myself, as I'm unsure about the usefulness of that feature, due to the burden it puts on the user, and the potential inefficiency inflicted.
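To make the burden on the user concrete, here is a rough sketch of what that context-sensitive step might look like once every atom arrives as a symbol. It deliberately uses no lexpr API: FieldValue and interpret are hypothetical names, and the (head, raw text) pairs stand in for whatever a relaxed parser would hand back.

```rust
/// Hypothetical typed result of interpreting one raw atom.
#[derive(Debug, PartialEq)]
enum FieldValue {
    Text(String),   // kept verbatim, e.g. a version like 5.1.6
    Timestamp(u32), // bare hex, radix implied by the field name
    Number(f64),    // ordinary decimal integer or float
    Symbol(String), // anything that is not a number
}

/// Interpret `raw` according to the head symbol of the enclosing list.
fn interpret(head: &str, raw: &str) -> FieldValue {
    match head {
        // Arguments of `host` are never numbers; keep them as text.
        "host" => FieldValue::Text(raw.to_string()),
        // These fields carry hexadecimal values without a prefix.
        "tedit" | "tstamp" => u32::from_str_radix(raw, 16)
            .map(FieldValue::Timestamp)
            .unwrap_or_else(|_| FieldValue::Symbol(raw.to_string())),
        _ => {
            // Only try a decimal parse on number-looking text, so that
            // words such as `inf` or `nan` are not turned into floats.
            let numeric = raw
                .bytes()
                .all(|b| b.is_ascii_digit() || matches!(b, b'.' | b'-' | b'+' | b'e' | b'E'));
            match raw.parse::<f64>() {
                Ok(n) if numeric => FieldValue::Number(n),
                _ => FieldValue::Symbol(raw.to_string()),
            }
        }
    }
}
```

Every atom is inspected twice in this scheme (once by the parser, once by the caller), which is the inefficiency mentioned above.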
The format has changed quite a bit in the meantime and the weird numbers aren't an issue anymore fortunately. Unfortunately, there's the same issue now with UUIDs which appear plainly in the expressions like this (tstamp 74471dee-5f8c-4f2f-a8b3-b698196d2e19) or the converted format from the old syntax (tstamp 00000000-0000-0000-0000-00005eff3d45). The latter technically isn't a valid UUID but that's another issue.
While the easy way out would be to parse those as 128-bit integers, it would need special provisions again. The least invasive change I can see would be to implement a special path in parse_num_literal and the symbol parsing path.
But I can imagine that's not an option you'd want to support; it's an ugly hack.
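As a rough sketch of the 128-bit reading, done on the user side rather than inside lexpr: uuid_to_u128 is a hypothetical helper name, and it only checks the 8-4-4-4-12 hex shape, so the converted legacy form passes as well even though it isn't a valid UUID.

```rust
/// Convert a hyphenated UUID-shaped token into a 128-bit integer.
/// Returns None if the token does not have the 8-4-4-4-12 hex layout.
fn uuid_to_u128(token: &str) -> Option<u128> {
    let groups: Vec<&str> = token.split('-').collect();
    let lens = [8usize, 4, 4, 4, 12];

    // Check the number of groups and the length of each one.
    if groups.len() != lens.len() {
        return None;
    }
    if groups.iter().zip(lens.iter()).any(|(g, n)| g.len() != *n) {
        return None;
    }
    // Reject anything but hex digits up front; from_str_radix alone
    // would also accept a leading '+'.
    if !groups.iter().all(|g| g.bytes().all(|b| b.is_ascii_hexdigit())) {
        return None;
    }

    // Join the groups without hyphens and parse as base 16.
    u128::from_str_radix(&groups.concat(), 16).ok()
}
```

With this reading, the converted form (tstamp 00000000-0000-0000-0000-00005eff3d45) comes out as 0x5eff3d45, while an old-style bare token such as 5EFF45FA is rejected and would still need its own path.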