nom
Example parsers
We currently have a few example parsers. In order to test the project and make it useful, other formats can be implemented. Here is a list, if anyone wants to try it:
- text file formats:
  - [x] INI
  - [x] FASTQ
  - [x] libconfig-like configuration file format
  - [x] torrc configuration file
  - [x] ISO 8601 dates
  - [x] Web archive
  - [x] TOML
  - [x] bencode
  - [x] CSV
  - [ ] YAML
  - [ ] CommonMark
- audio, video and image formats:
- document formats:
  - [x] torrent files
  - [x] TAR
  - [ ] MS-CFB (compound format, used in doc, xls, ppt, cab, msi files)
  - [ ] GZ
  - [ ] ZIP
  - [ ] RAR
  - [ ] binary PLIST
- database formats:
  - [x] Redis database files
  - [x] Ceph crush maps
- network protocol formats:
- executable formats:
  - [ ] Portable executables (PE)
  - [ ] ELF
  - [ ] GameBoy ROM
- crypto related:
  - [x] ASN.1
  - [ ] X.509 certificates
  - [ ] DER public and private keys
  - [ ] SSL/TLS packets
  - [ ] OpenPGP
- Programming Languages
  - [x] Rust
  - [ ] Lua
  - [ ] Python
  - [ ] C
- interface definition formats:
I'm writing a Thrift library for Rust that'll use nom for both the IDL and the network protocol, so that can be another example (although in a different repo).
Nice idea, that will be useful! Please notify me when it is done, I will add a link in this list.
This looks interesting. Is anyone actively working on any of these parsers? I'd like to work on a few of these.
I have some code for a GIF one at https://github.com/Geal/gif.rs but it is hard to test, since the graphical tools in Piston change a lot.
You can pick any of them. Network packets may be the easiest, since they don't require a decompression phase.
I am using the gif example to see what kind of API can be built over nom. Most of the parsing examples are done as one pass over the data, but often there is some logic on the side, and it is not easy to encode correctly.
I've started a FASTQ (http://en.wikipedia.org/wiki/FASTQ_format) parser: https://github.com/elij/fastq.rs
@elij this is a great idea! Was it easy to do?
yup it's a great framework -- though I struggled a bit with eof so I borrowed some code from rust-config (https://github.com/elij/fastq.rs/blob/master/src/parser.rs#L69) -- is there a better solution?
yes, eof should be a parser provided by nom, I am just waiting for @filipegoncalves to send a PR :wink:
Hah, sorry for my silence. I've been busy lately. I just sent a PR (https://github.com/Geal/nom/pull/31).
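For anyone following the eof discussion, here is a minimal sketch of what an end-of-input check does, in plain Rust. The `eof` name and its `Result<(), usize>` signature are assumptions made for illustration; this is not nom's API and not the code from the PR above.

```rust
/// Hypothetical end-of-input check (not nom's actual signature).
/// Succeeds only when no input remains; otherwise reports how many
/// bytes were left unparsed.
fn eof(input: &[u8]) -> Result<(), usize> {
    if input.is_empty() {
        Ok(())
    } else {
        Err(input.len())
    }
}
```

A record parser would typically call a check like this on whatever remains after the last record, so trailing garbage is reported instead of silently ignored.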
I will be working on one of these example parsers as soon as I get some spare time. There are some great ideas in here!
I might give tar a try
Does this check off PCAP?
https://github.com/richo/pcapng-rs
pcap-ng and pcap are two different formats, right? It seems the consensus now is to move everything to pcap-ng, though.
I will try a FLAC parser, need to add quite a few things for it though.
ISO8601 is done in https://github.com/badboy/iso8601 (I hope it's mostly correct.)
ok, it should be up to date. More to come :smile:
WARC file format released. https://crates.io/crates/warc_parser
@sbeckeriv great, thanks!
It might be informative to try parsing the rust grammar with nom, if nobody has yet. In any case, I'd like to see a few programming languages on that list, since that's my use case.
@porglezomp programming languages examples would definitely be useful, but the Rust grammar might be a bit too much for the first attempt. Which other languages would you like to handle?
Yeah, I'm aware of the scale problem of Rust. I don't want to write that one, but I think it's a good holy grail for any parser library written in Rust. I'd like to try parsing the Lua grammar first, I think.
I recommend adding to the list:
- Programming Languages
  - Rust
  - Lua (I'll do this)
  - Python (or some other whitespace-significant language)
  - C
ok, I added them to the list :)
You have INI marked as done; do you have a link to it? (I'd love to use this for some tooling I'm hoping to build in 2016; need a good non-trivial example for it, though.)
@chriskrycho: https://github.com/Geal/nom/blob/master/tests/ini.rs
Thanks very much, @badboy!
I'll try to make the TOML parser very soon.
Actually, I think I should rewrite that INI parser, now that more convenient combinators are available. Also, I should really work on that combinator for space-separated stuff.
@fbernier great! Please keep me posted!
Maybe add a simple example for trailing commas in lists? Python has those, but is quite complex. Can't think of a simple example though.
That IRC example is no longer using nom. The parser was moved into its own repository: https://github.com/Detegr/RBot-parser
@l0calh05t to parse something like [a,b,c,] or [a,b,c]?
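To make the trailing-comma case concrete, here is a rough sketch that accepts both forms. It is hand-written in plain Rust rather than built from nom combinators, and `parse_list` is a hypothetical helper name, so treat it as a statement of the expected behaviour rather than as the example that would go in the repo.

```rust
/// Hypothetical helper, hand-written (not a nom combinator).
/// Parses a bracketed, comma-separated list of identifiers and accepts
/// one optional trailing comma, so "[a,b,c]" and "[a,b,c,]" are equivalent.
fn parse_list(input: &str) -> Option<Vec<&str>> {
    // Require the surrounding brackets.
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    let inner = inner.trim();
    if inner.is_empty() {
        return Some(Vec::new());
    }
    // Drop at most one trailing comma before splitting.
    let inner = inner.strip_suffix(',').unwrap_or(inner);
    let mut items = Vec::new();
    for item in inner.split(',') {
        let item = item.trim();
        // Reject empty items such as "[a,,b]" or "[,]", and anything
        // that is not a plain identifier.
        if item.is_empty() || !item.chars().all(|c| c.is_alphanumeric() || c == '_') {
            return None;
        }
        items.push(item);
    }
    Some(items)
}
```

The design choice here is to allow exactly one trailing comma: a doubled comma anywhere, including at the end, is treated as an error.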
@johshoff fixed, thanks