rust-libxml icon indicating copy to clipboard operation
rust-libxml copied to clipboard

Add c14n for node and document

Open saks opened this issue 1 year ago • 4 comments

There are not too many choices when it comes to "Canonical XML" implementation for rust. This PR is a compilation of ideas, approaches and test files from the following projects:

  • https://gitlab.com/reinis-mazeiks/xml_c14n
  • https://github.com/sparklemotion/nokogiri

Work items

  • [x] add canonicalize method for Node and Document
  • [x] add Node::ancestors method (I personally find it very useful when using "nokogiri", plus it help in tests a lot). Although I don't feel too strongly about, can remove if it doesn't fit.
  • [x] add Node::at_xpath method. Yes, it is similar to findnodes (and can ultimately be implemented using Context), but with design from "nokogiri", I find it a lot more flexible when navigating a tree. Same as for the previous item, can update PR to remove it if necessary.

Any feedback is welcome! I don't feel particularly attached to any approach here. Please let me know how to make this PR better.

saks avatar Sep 06 '24 05:09 saks

@saks interesting, there is a very high-level question I have first.

Why do you think this wrapper crate is the right place to introduce this method? rust-libxml mostly tries to follow the libxml2 coverage, and probably should not grow much further, as a general-purpose XML utility library.

As you mention xml_c14n exists as a separate Rust crate. Maybe your work could also be packaged as a separate crate?

dginev avatar Sep 06 '24 10:09 dginev

c14n is a part of the standard libxml2 featureset, so it would be nice to have this functionality. This MR does a bit more, but from my reading of the readme it should be covered:

Only covers a subset of libxml2 at the moment, contributions are welcome. We try to increase support with each release.

I'd like to have this for some validation in tests where I compare the output of some rust code against a canonicalized result of an xpath query. It's doable with this crate and xml_c14n, but having it as part of rust-libxml would remove the need to serialize the libxml document and then parse it as a libxml document directly afterwards.

tstenner avatar Dec 22 '24 10:12 tstenner

Having independent intrerest is certainly helpful, as is pointing out that c14n can be fully thought of as exposing libxml2 functionality. So I am feeling a lot more inclined to merge the PR - I will make some time to review it in more depth soon.

Would you like to also add a review for this PR @tstenner ? Given you are an active user of the feature, you may have good visibility of any edges left to iron out.

dginev avatar Dec 23 '24 10:12 dginev

@dginev Thanks for reconsidering it. Overall, I only had one obvious improvement to suggest and an opinion regarding the callback parameter (but then again, I don't use it, so it doesn't matter to me that much). The output buffer handling could be shorter, but that is something you know more about than me.

tstenner avatar Dec 23 '24 11:12 tstenner

I've been using this branch on an off for the last half year and haven't encountered any problems with it.

I know it's a bit bigger MR and not the feature you're maintaining the wrappers for, but if you could find the time to review and merge it, I could go back to the upstream releases and make it easier for my project collaborators.

tstenner avatar Jul 03 '25 07:07 tstenner

@tstenner The time is approaching for another release, so I would like to integrate the new features here.

As you have reviewed this in detail, can I summarize that you would be comfortable seeing this merged as-is? I will do a cursory review, but I wouldn't have time to thoroughly reconsider anything. If @saks has time to address the feedback so far, that would be great, otherwise we should probably refine in a follow-up PR?

dginev avatar Jul 08 '25 21:07 dginev

I have merged here to signal strongly that we should include the feature. Further feedback is most welcome, especially in the form of follow-up PRs. I will probably submit the release on Sunday - and switch away from the xmlChar return type.

dginev avatar Jul 08 '25 21:07 dginev