rails icon indicating copy to clipboard operation
rails copied to clipboard

Add an adapter to decouple Nokogiri

Open jamis opened this issue 1 year ago • 5 comments

Document nodes are opaque outside of the adapter. This lets us drop in different document models as necessary.

Summary

Other Information

jamis avatar Jun 08 '22 15:06 jamis

Hey @jamis, I wanted to ask if the reason for these changes is solely to avoid Nokogiri? If so, is there some actionable feedback you can provide to me (as the maintainer)? It's still actively maintained and if there's an issue I'd love to understand how to improve it.

flavorjones avatar Jun 17 '22 17:06 flavorjones

Hey @flavorjones, this is mostly to facilitate some optimizations we implemented at 37signals to allow faster parsing and processing of emails in HEY. We have a library that we developed internally (which we may or may not open source eventually) that manages to do certain operations quite a bit faster. One of the biggest limiting factors of Nokogiri (not in general, just for our own usage) is that we needed to duplicate a lot of nodes when rewriting document trees with Nokogiri. Our internal tool lets a node belong to multiple trees simultaneously (which it manages by requiring nodes to be immutable). Also, when we wrote our tool, Nokogiri didn't have HTML5 support, so we built it against libgumbo directly. I know that's changed in Nokogiri now, and we've experimented with using Nokogiri as the parser for our tool.

I don't know that the "nodes are immutable" aspect is something you could (or should) add to Nokogiri, as it seems like it would break a lot of existing assumptions, but if you'd like to talk about it more, I'd be happy to discuss it.

jamis avatar Jun 17 '22 18:06 jamis

@jamis Thanks for that explanation. Without seeing your code, I can't be absolutely sure, but libxml2's internal DOM structure and memory management conventions make it challenging to implement something like nodes that belong to multiple documents, or nodes that have multiple parents. So, maybe it's just not the best tool for this job, and that's OK.

But let me know if there are situations for which we might be able to optimize, I'm always up for shooting the shit and experimenting.

flavorjones avatar Jun 17 '22 19:06 flavorjones

Some of the Node API indirection in here feels rather unfortunate -- especially Document.node_attribute. I understand that mutations are insurmountable, but for readonly traversal, can we instead expect some baseline API on node objects?

matthewd avatar Jul 06 '22 06:07 matthewd

The ActionText::Document API is only intended to be used internally by ActionText (or, perhaps, by libraries that extend ActionText). For clients, they will know what backend they are using (even if it is just the default Nokogiri backend) and can treat nodes with whatever level of transparency they desire.

Within ActionText, though, I feel like it is safest to treat node objects as fully opaque, without any assumptions about what API they use. Still, I agree that the Document.node_attribute method (as one example) is inelegant, and I'd definitely be open to suggestions for how to do this more elegantly, without sacrificing that opacity.

jamis avatar Jul 07 '22 20:07 jamis

Looking ready for merge. @Jamis could you rebase and add a mention to the changeling?

jeremy avatar Sep 06 '22 18:09 jeremy

What if the Document::Nokogiri and others returned an opaque Document::Node object with the necessary API for internal Action Text usage (and possibly library use)? The adapter would be responsible from translating their node to the ActionText node. But at least the existence of Document would not leak to so many files.

rafaelfranca avatar Sep 13 '22 01:09 rafaelfranca

@rafaelfranca -- thanks for the thoughts about this. I've also considered having an opaque wrapper around the nodes, to further abstract that interface, but my initial experiments there found it to devolve into an aggressive game of whack-a-mole; the node objects are exposed in a variety of different ways, including as lists of nodes. In the end, I opted to just treat the returned nodes as opaque, without defining what they really need to be.

At any rate, though, the added complexity of this adapter approach has led us to think about some alternatives at 37signals, and since this decoupling of Nokogiri is unlikely to help many other people, we're opting to simply close this PR for now.

jamis avatar Sep 13 '22 16:09 jamis