Franklin.jl

Lengthy first pass from scratch if the code blocks are computationally intensive

Open rikhuijzer opened this issue 3 years ago • 8 comments

Hi! :smile: While using Franklin with a computationally heavy site, I noticed that the server isn't available until all the computations are done (20 minutes on a fresh git clone, in my case). I think executing the computations asynchronously could improve the user experience a lot for any site with computations. The aim would be to have LiveServer.jl running immediately so that the site is always visible to the local user. Even better would be to process each page asynchronously, since pages seem to be independent as far as I can tell, but this would probably complicate things a lot.

This might be related to earlier discussions or work, such as #124 or #138.

rikhuijzer avatar Jan 11 '21 09:01 rikhuijzer

Hmm, so I see what you're saying but note that if you don't clear __site then only the pages that have changed will be re-evaluated (even on first pass). So yes there will be one big heavy pass to do at one point (e.g. if you're cloning from fresh) but after that it will be each page at a time.

Now what we would want to ideally do is to have pages display instantaneously and show a grey box or whatever for computations that have not completed yet. This is not trivial to do to say the least, here are a few things to bear in mind:

  • the whole thing needs to be threaded (async is not the right approach here afaik)
  • there needs to be a queue of "things to do", e.g. if you start the server, then modify a page while all computations have not finished

both are pretty hard and I don't think it's something I'd want to do in the near future
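To make the two points above concrete, here is a minimal sketch of a work queue drained by a background task. It is only an illustration: `PENDING`, `RESULTS`, `evaluate_page` and `start_worker` are hypothetical names, not part of Franklin.jl or LiveServer.jl. The reason threads matter is that cooperative tasks only switch at yield points (I/O, `sleep`, ...), so a CPU-bound code block running on the same thread as the server would still stall it.

```julia
# Hypothetical sketch: none of these names exist in Franklin.jl.

# Queue of page names waiting to be evaluated ("things to do").
const PENDING = Channel{String}(32)
# Finished outputs, keyed by page name.
const RESULTS = Dict{String,String}()

# Stand-in for an expensive page evaluation.
evaluate_page(page::String) = (sleep(0.01); "output of $page")

function start_worker(queue::Channel{String}, results::Dict{String,String})
    # Threads.@spawn may run on any available thread; when Julia is started
    # with a single thread it degrades to an ordinary cooperative task.
    return Threads.@spawn for page in queue
        results[page] = evaluate_page(page)
    end
end

worker = start_worker(PENDING, RESULTS)

# Pages can be queued at any time, e.g. when a file is modified.
foreach(page -> put!(PENDING, page), ["index.md", "heavy.md"])

close(PENDING)  # no more work: the loop ends once the queue is drained
wait(worker)
```

A real implementation would also have to deal with a page being modified while its previous evaluation is still queued or running, which is where most of the difficulty lies.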

Possible alternative

Maybe a more feasible first step would be to allow the first pass to discard computations (like an eval_nothing mode) and use placeholders for things that have not been evaluated, then evaluate page by page as they get modified, as per usual. The advantage of this is that you could have additional modes like:

  1. eval nothing on first pass, eval after that
  2. eval nothing during the whole session
  3. eval everything (default)

the second could be useful if you just want to modify text for instance.
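As a rough sketch, the three modes could be modelled as an enum plus a predicate that decides whether a code block should run. `EvalMode` and `should_eval` are hypothetical names used for illustration only; Franklin.jl has no such API.

```julia
# Hypothetical: neither EvalMode nor should_eval exists in Franklin.jl.
@enum EvalMode begin
    EVAL_AFTER_FIRST_PASS  # 1. eval nothing on first pass, eval after that
    EVAL_NEVER             # 2. eval nothing during the whole session
    EVAL_ALWAYS            # 3. eval everything (default)
end

# Decide whether code blocks should be evaluated in the current pass.
function should_eval(mode::EvalMode, first_pass::Bool)
    mode == EVAL_ALWAYS && return true
    mode == EVAL_NEVER && return false
    return !first_pass  # EVAL_AFTER_FIRST_PASS: skip only the first pass
end
```

Blocks for which `should_eval` returns `false` would get a placeholder instead of their output.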

wdyt?

tlienart avatar Jan 11 '21 10:01 tlienart

Thanks for your clarifications.

> async is not the right approach here afaik

I think that many Julia instances still run with only one thread. That's why I suggested asynchronous "tasks", or coroutines. Those can be quite fast: I've seen a blog post somewhere in which a server managed something like 100k HTTP connections per thread.

> This is not trivial to do to say the least, here are a few things to bear in mind:

I feared so :\.

> both are pretty hard and I don't think it's something I'd want to do in the near future

That's fair :+1: Thanks for being clear about it.

> note that if you don't clear __site then only the pages that have changed will be re-evaluated (even on first pass).

I agree and indeed I try to avoid clearing __site. However, sometimes Franklin gets into an inconsistent state and I need to clean it. I had that with Turing, where the chains object was written in one Literate block and read in another. Sometimes, the chains object seemed to exist but contained nothing. Let me know if I should try to find a minimal working example of this. Even reeval = true couldn't solve this, by the way.

> placeholder for things that have not been evaluated; wdyt?

Wouldn't this be quite involved because you have \output, \tableinput etc. Maybe just add a big banner at the top like "This page has not been fully evaluated, some output may be missing."

I'm going to try to solve this by running the expensive calculations from the REPL instead. Basically, I will call MySite.main() for a full pass, or call functions inside MySite manually to update the site. Franklin can then pick up the output.

You can close this issue if you want :+1:

rikhuijzer avatar Jan 11 '21 10:01 rikhuijzer

> However, sometimes Franklin gets into an inconsistent state and I need to clean it. I had that with Turing, where the chains object was written in one Literate block and read in another. Sometimes, the chains object seemed to exist but contained nothing. Let me know if I should try to find a minimal working example of this. Even reeval = true couldn't solve this, by the way.

Could you give me more info on this? You should never have to clear and reeval should be all you need to do if you want to say "re-evaluate this page completely please"; if that's not the case then that's a bug and it should be fixed.

> Wouldn't this be quite involved because you have \output, \tableinput etc. Maybe just add a big banner at the top like "This page has not been fully evaluated, some output may be missing."

Well it's less involved than async or threads ;-) yes the banner has also been suggested in the context of the engine being more fault-tolerant (eg #715, #405)

I'll leave this open for now to think about the "eval_nothing" mode, it might work.

PS: what do you mean with REPL mode? this would be identical in your case right? I mean it would also take 20 minutes to run your full workload?

tlienart avatar Jan 11 '21 10:01 tlienart

> Could you give me more info on this? You should never have to clear and reeval should be all you need to do if you want to say "re-evaluate this page completely please"; if that's not the case then that's a bug and it should be fixed.

I'll soon work on the TuringModels.jl project again and try to find the exact cause. Could it be related to the use of Literate?

> I'll leave this open for now to think about the "eval_nothing" mode, it might work.

I thought that serve was unresponsive when evaluating code, but that just works :rocket: Then, I think that "eval_nothing" might work too :+1:

> PS: what do you mean with REPL mode? this would be identical in your case right? I mean it would also take 20 minutes to run your full workload?

I meant avoiding expensive code blocks in markdown pages and using only \tableinput and \fig. Then, I'll put the tables and figures in the right places by calling functions manually from the REPL. You're right: it would allow me to browse the site during the first lengthy computation and make the evaluation a bit more dynamic with Revise.jl, but in essence it's almost the same.
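A minimal sketch of that workflow, assuming a site module that writes its results to files which pages then reference via \tableinput or \fig. The module contents, the file name `summary.csv` and the output directory are all made up for illustration; only `MySite.main()` is named in the thread.

```julia
# Hypothetical stand-in for the `MySite` module mentioned above.
module MySite

using DelimitedFiles  # standard library, used here to write CSV

# Stand-in for an expensive computation that produces a small table.
summary_table() = ["model" "estimate"; "m1" 0.42; "m2" 0.58]

# Write the table as CSV into `dir` (the directory layout is an assumption).
function write_table(dir::AbstractString)
    mkpath(dir)
    path = joinpath(dir, "summary.csv")
    writedlm(path, summary_table(), ',')
    return path
end

# Full pass: regenerate every output file.
main(dir::AbstractString) = write_table(dir)

end # module

# From the REPL: run the full pass once, or call write_table alone later.
path = MySite.main(mktempdir())
```

With this split, the markdown pages contain no heavy code blocks, so the server can come up immediately while the files are being generated.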

rikhuijzer avatar Jan 11 '21 11:01 rikhuijzer

> Could it be related to the use of Literate?

no.

Basically if you mess up a cell, do reeval=true and it doesn't look like it's properly re-evaluating the whole page you're working on, then it's a bug and I need to dig into it (if you just share the script that seemed to have caused the issue, that should be enough)

tlienart avatar Jan 11 '21 11:01 tlienart

> Basically if you mess up a cell, do reeval=true and it doesn't look like it's properly re-evaluating the whole page you're working on, then it's a bug and I need to dig into it (if you just share the script that seemed to have caused the issue, that should be enough)

Sorry, but I can't remember which script was one of the problematic scripts exactly. I'll send it once I know.

rikhuijzer avatar Jan 11 '21 13:01 rikhuijzer

Cool, thanks. I'll also restart work on DataScienceTutorials.jl, which has a setup similar to what you described (lots of Literate files, heavy computational load), so that should also give me the occasion to work on tools that improve on the status quo.

tlienart avatar Jan 11 '21 14:01 tlienart

> Basically if you mess up a cell, do reeval=true and it doesn't look like it's properly re-evaluating the whole page you're working on, then it's a bug and I need to dig into it (if you just share the script that seemed to have caused the issue, that should be enough)

> Sorry, but I can't remember which script was one of the problematic scripts exactly. I'll send it once I know.

Thanks to David Widmann, I found the reason reeval wasn't working (https://github.com/TuringLang/Turing.jl/issues/1529). It was caused by trying to overwrite a constant: once chains has been defined as a function, the name is a constant and can no longer be assigned a value.

julia> chains(x) = x
chains (generic function with 1 method)

julia> chains = sample(model(a, b), NUTS(0.65), 1000)
ERROR: invalid redefinition of constant chains

rikhuijzer avatar Jan 29 '21 17:01 rikhuijzer