lol-html Rewriter reuse example?

I'm trying to understand how and if users are intended to express a set of element handlers to be reused on multiple inputs, across multiple threads. In particular, I'm interested in applying a given transformation to a bunch of html byte streams, but the lifetimes and borrows in the public interface make it difficult to see how to achieve that. Is there example code out there, or a part of the documentation that's relevant to this usage pattern?

Thanks!

Mar 16 '21 23:03 jbr

Hmm, this seems to work fine as-is - I ran examples/mixed_content_rewriter on this webpage twice with the same rewriter and it produced two copies of the output. Is there something in particular giving you trouble?

Dec 17 '21 18:12 jyn514

The issue is when a user might want to define the rewriting rules on one thread and use them on another (or several others) — because even the rule definitions are !Send, it's unclear how to use this crate in a multithreaded context, or even a 'static context in which the rules need to be owned somewhere other than where the end user defines them.

An example. The workaround I ended up settling on was to have the user provide a Fn() -> Settings<'static, 'static> which is run on each thread. It's an awkward interface, but I imagine the end result of this issue is that lol-html is not the appropriate library to use for an async multithreaded html rewriter

It's also worth noting that at the time of filing this, #69 had not yet been released

Dec 18 '21 02:12 jbr

It's true that HtmlRewriter is !Send and !Sync, yes. I'm looking into whether it's possible to remove the Rc<RefCell> around the output sink.

Dec 21 '21 20:12 jyn514

Oh hmm, I missed that Settings is also not Send or Sync, that seems especially unfortunate. I think it would be a breaking change to fix that unfortunately ... the issue is that ElementContentHandlers takes a user-provided function and we don't require that to be Send+Sync.

Dec 21 '21 20:12 jyn514

@jbr hmm, I'm confused though - how would it be useful to share the rewriter between threads? How would you synchronize writing the input so you know the output makes sense?

Dec 21 '21 20:12 jyn514

There are a few distinct concerns here, and I'm not entirely sure that any of them are specific suggestions or feature requests

for any multithreaded application

A user defines 'static Settings on one thread and hands it to a library, which then is able to send the Settings or clones of the Settings to other threads or an Arc of the settings to other threads, which can then use those same settings in their own rewriters. The simplest version of this would create a new rewriter for each http request and drop it at the end of rewriting. Depending on how cheap creating rewriters is, it might make sense to use some sort of rewriter pool, either thread-locally or shared. This seems relevant to sync rust as well as async.

async use specifically

Ideally the rewriters would also be Send but I imagine that introduces unacceptable performance costs to the other users of this crate. The motivation behind the Send bound is that use with most of the async ecosystem requires being able to move a Future that contains the rewriter between work-sharing executor threads. Most async runtimes (tokio, smol, async-std) also offer local (non-work-sharing) executors, which is how lol-async works at all, but the Send bound is prevalent in async contexts, and spawn_local is unstable on async-std.

Dec 22 '21 21:12 jbr

Ah, so for rewriters you actually only need them to be Send, not Sync?

Dec 23 '21 23:12 jyn514