Rewriter reuse example?
I'm trying to understand how and if users are intended to express a set of element handlers to be reused on multiple inputs, across multiple threads. In particular, I'm interested in applying a given transformation to a bunch of html byte streams, but the lifetimes and borrows in the public interface make it difficult to see how to achieve that. Is there example code out there, or a part of the documentation that's relevant to this usage pattern?
Thanks!
Hmm, this seems to work fine as-is - I ran examples/mixed_content_rewriter on this webpage twice with the same rewriter and it produced two copies of the output. Is there something in particular giving you trouble?
The issue is when a user might want to define the rewriting rules on one thread and use them on another (or several others) — because even the rule definitions are !Send, it's unclear how to use this crate in a multithreaded context, or even a 'static context in which the rules need to be owned somewhere other than where the end user defines them.
An example. The workaround I ended up settling on was to have the user provide a Fn() -> Settings<'static, 'static> which is run on each thread. It's an awkward interface, but I imagine the end result of this issue is that lol-html is not the appropriate library to use for an async multithreaded html rewriter
It's also worth noting that at the time of filing this, #69 had not yet been released
It's true that HtmlRewriter is !Send and !Sync, yes. I'm looking into whether it's possible to remove the Rc<RefCell> around the output sink.
Oh hmm, I missed that Settings is also not Send or Sync, that seems especially unfortunate. I think it would be a breaking change to fix that unfortunately ... the issue is that ElementContentHandlers takes a user-provided function and we don't require that to be Send+Sync.
@jbr hmm, I'm confused though - how would it be useful to share the rewriter between threads? How would you synchronize writing the input so you know the output makes sense?
There are a few distinct concerns here, and I'm not entirely sure that any of them are specific suggestions or feature requests
for any multithreaded application
A user defines 'static Settings on one thread and hands it to a library, which then is able to send the Settings or clones of the Settings to other threads or an Arc of the settings to other threads, which can then use those same settings in their own rewriters. The simplest version of this would create a new rewriter for each http request and drop it at the end of rewriting. Depending on how cheap creating rewriters is, it might make sense to use some sort of rewriter pool, either thread-locally or shared. This seems relevant to sync rust as well as async.
async use specifically
Ideally the rewriters would also be Send but I imagine that introduces unacceptable performance costs to the other users of this crate. The motivation behind the Send bound is that use with most of the async ecosystem requires being able to move a Future that contains the rewriter between work-sharing executor threads. Most async runtimes (tokio, smol, async-std) also offer local (non-work-sharing) executors, which is how lol-async works at all, but the Send bound is prevalent in async contexts, and spawn_local is unstable on async-std.
Ah, so for rewriters you actually only need them to be Send, not Sync?