alice-rs
alice-rs copied to clipboard
WIP: Several changes
Hi @cbourjau, I found alice-rs when I was looking to write a little command-line TBrowser-like tool in Rust. Sadly it wasn't able to open my root files, and long story short, here's a big chunk of changes:
- Make sure the projects compile on newer rust versions (in particular, this means not using rustfmt as a library, as it doesn't seem to compile on modern nightlies. We now assume that rustfmt is provided as a tool by the system)
- Version bumps of several libraries, and get rid of several unnecessarily specific patch-level version pins.
More importantly, in root-io, where I did most my work:
- Bump nom to v7
- That means getting rid of the macros and using the function-based parsers instead
- Include nom-supreme for its postfix combinators
- Semantic error types
- Pretty printing of errors! Look at it:
While trying to parse tbranch object:
00000180 00 01 0b 40 00 02 35 00 0a 40 00 01 e6 00 0c 40 [email protected]..@..�..@
^~~~~~~~~~~~~~~~~~~~~~~~~~
00000190 00 00 1a 00 01 00 01 00 00 00 00 03 50 00 00 06 ............P...
000001a0 72 75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 40 00 00 runNum.runNum@..
000001b0 06 00 02 00 00 03 e9 00 00 00 01 00 00 a0 00 00 ......�......�..
000001c0 00 00 00 00 00 00 01 00 00 00 00 00 00 00 0a 00 ................
000001d0 00 00 00 00 00 00 0a 00 00 00 62 00 00 00 00 00 ..........b.....
000001e0 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000001f0 00 00 73 00 00 00 00 00 00 00 63 40 00 00 15 00 ..s.......c@....
00000200 03 00 01 00 00 00 00 03 00 00 00 00 00 00 00 00 ................
00000210 00 00 00 00 40 00 00 6e 00 03 00 01 00 00 00 00 [email protected]........
00000220 03 00 00 00 00 00 00 00 01 00 00 00 00 40 00 00 .............@..
00000230 55 ff ff ff ff 54 4c 65 61 66 45 6c 65 6d 65 6e U����TLeafElemen
00000240 74 00 40 00 00 40 00 01 40 00 00 32 00 02 40 00 t.@..@[email protected]..@.
00000250 00 1a 00 01 00 01 00 00 00 00 03 00 00 00 06 72 ...............r
00000260 75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 00 00 00 01 unNum.runNum....
00000270 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000280 00 01 00 00 00 0d 40 00 00 1d 00 03 00 01 00 00 ......@.........
[0x135 bytes omitted]
Expected EOF, but found excess data
00000370 00 00 00 14 62 61 63 6f 6e 68 65 70 3a 3a 54 45 ....baconhep::TE
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00000380 76 65 6e 74 49 6e 66 6f 14 62 61 63 6f 6e 68 65 ventInfo.baconhe
00000390 70 3a 3a 54 45 76 65 6e 74 49 6e 66 6f 00 cd 50 p::TEventInfo.�P
000003a0 8e 09 00 07 00 00 00 01 00 00 00 00 00 00 00 0d �...............
000003b0 00 00 00 00 00 00 00 00 00 00 00 00 40 00 02 30 [email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is some errors in the test-suite I haven't managed to get rid of yet though, so this PR remains as a WIP for now. Could you point me at the actual documentation of the root file format? I haven't been able to find it online.
Cool! I'll take a closer look in the next few days. Concerning documentation I had the same problem back in the days. I recommend:
- This post of me asking the same question: https://root-forum.cern.ch/t/documentation-for-roots-file-format/26260/2
- The groot project (links from the above thread are dead; the projects go merged into one repo): https://pkg.go.dev/go-hep.org/x/hep/groot?utm_source=godoc
@strangeglyph I finally found the time to take a closer look at this PR. Thanks again! Clearly, there is a lot going on in this PR and I think it would be easier if it were broken up in smaller chunks that are known to work. Would it be possible to break it up into the following:
- [x] Replace
rustfmt
with prettyplease - [x] Remove version pins
- [ ] Update nom to v7 and remove macros (might be a smoother transition to convert macros to functions on the current version an then update)
Once that is done and known to work we can look at the next steps. If we touch the error handling of this project it, I think it would be nice to replace the failure
crate completely with the current best practices in error handling (many of the failure
features have made it to the standard library in the mean time). I have to admit that I am not completely up to date in this regard, but I believe thiserror
fits that bill?
The postfix operators of nom-supreme
seem nice, but I am hesitant to pull in a dependency with fairly small adoption in the wider ecosystem just for that. Is nom-supreme
also needed for the pretty error messages?
Is it reasonable to break up this PR along those lines?
Edit: I converted this to a bit of a tracking issue. Your comments about failure/thiserror make perfect sense!
Breaking up the PR absolutely makes sense, I'll get around to doing that soon. prettyplease
looks like a good find, I'll see if it can be easily integrated.
I have tried to replace failure
everywhere I could find it in root-io. I think it's reasonable to replace it in the other crates as well. thiserror
is what I've used in root-io so far and I believe it's the current standard for library-level errors. Speaking of, might be worth splitting this whole repo into several, one per crate at some point?
nom-supreme
is only needed for the postfix combinators. I was looking into its error types as well, but they aren't particularly suited to what we want (in particular, some error types are only implemented over string inputs, not bytes). However, I can only stress how much the postfix notation improves readability. In my opinion, it would be worth keeping it in.
Are you still interested in pushing this PR forward? I think it would be very nice to see these changes land and I am also warming up to nom-supreme
:D
Ah, yes, sorry. I've been a bit busy with life the last couple of weeks but I should hopefully find more time for this project again soon.
Got around to investigating these bugs a bit more. Looks like we don't actually parse everything from a tbranch blob? I just verified this happened before the nom upgrades. Is this intended behavior? I had wrapped the call in tbranch_hdr
with all_consuming
, but if this isn't actually intended to be read in its entirety, I'll remove it.
I was actually just looking into this, too! I think it would be nice to use eof
or all_consuming
as much as possible, but as it happens, there are sometimes unexpected paddings or mystery bytes in the ROOT format which make this difficult...
I think I wrapped all_consuming
everywhere where we pass around Raw
buffers and the like. I'll remove it from tbranch_hdr
for now and make a note on tbranch
(and open an issue?) about the case of the mysterious bytes.
So, this version passes the test suite and as far as I can tell matches the behavior of the current master branch. No clue why the AppVeyor build is failing. The entire output just seems to be a big diff, but I can't see an error message.
Sorry for the late reply. sounds like a rustfmt issue. I'll take a look!
Alright! things were indeed just a cargo fmt
to get everything formatted. That said, I am afraid that this PR is still too large and has a few things where I am not quite sure about their value given their complexity. Would you be interested in spinning out the nom update thiserror
error handling (without nom-locate
and ouroboros
) into a separate PR?
Glad to hear that it was just cargo fmt
- I'll remember that in the future. Let me make sure I understand you correctly, you'd like two PRs:
- PR 1 contains the
nom
v7 updates and replacesfailure
withthiserror
- PR 2 has the improved error reporting with
nom-locate
?
Yes, pretty much. I have to say that I am not quite sold on the merits of nom-local
and ouroboros
, though (given their added complexity and dependencies). This is one of the reasons why I think it would be nice to see them in isolation. Just by the way: are you aware of dbg_dmp
in nom? It is what I usually use for debugging. Certainly, it is nicer to automatize printing such an output when something goes wrong, though.