learn-ocaml
learn-ocaml startup algorithm does not seem to scale
Hello,
(I recently noticed this issue, but I did not investigate further)
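Currently (after a whole semester of use), when the learn-ocaml Docker container is restarted or recreated, startup takes a very long time (around 2 min 30 s), and the server is offline during that time, which is quite annoying.
See the following log (note the gap of about 2 min 33 s between the "starting on port 8080" line and the "Found the following teacher tokens" line):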
2020-03-29T12:15:19.994597000Z Learnocaml v.0.12 running.
2020-03-29T12:15:19.994991000Z Updating app at ./www
2020-03-29T12:15:20.250270000Z test (no changes)
2020-03-29T12:15:20.250972000Z exo-dm1 (no changes)
2020-03-29T12:15:20.251488000Z exo-dm2 (no changes)
2020-03-29T12:15:20.251986000Z exo-theme10 (no changes)
2020-03-29T12:15:20.252444000Z exo-theme5 (no changes)
2020-03-29T12:15:20.252882000Z exo-theme6 (no changes)
2020-03-29T12:15:20.253311000Z exo-theme7 (no changes)
2020-03-29T12:15:20.253749000Z exo-theme8 (no changes)
2020-03-29T12:15:20.254189000Z exo-theme9 (no changes)
2020-03-29T12:15:20.254620000Z exo-tp1 (no changes)
2020-03-29T12:15:20.255119000Z exo-tp2 (no changes)
2020-03-29T12:15:20.255571000Z exo-tp2bis (no changes)
2020-03-29T12:15:20.267962000Z Learnocaml server v.0.12 starting on port 8080
2020-03-29T12:17:53.535157000Z Found the following teacher tokens:
2020-03-29T12:17:53.535884000Z - REDACTED
[…]
I'm not sure but I guess the bottleneck is the following line:
https://github.com/ocaml-sf/learn-ocaml/blob/145fabcb4800136ba0b7ffbf42163a5153a098ca/src/server/learnocaml_server.ml#L512
@yurug did you experience a similar issue? Anyway, maybe we could consider slightly changing the architecture of learn-ocaml's persistent data (e.g. by caching/storing the teacher tokens in another place?) so that this step becomes immediate…
@Aleridia volunteers to take a look at this issue (which will be a first step towards implementing the other User Stories Maxime is working on).
At first sight, the implementation will require some form of database (to keep an up-to-date index independently of the /sync subfolders themselves). This database could be stored in a standard database format (handling concurrency?) or maybe just a mere versioned json format… opinions on these choices?
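To make the "mere versioned file" option concrete, here is a minimal sketch using only the OCaml standard library. It is not the current learn-ocaml code: the header string, the one-token-per-line layout, and the function names `save`, `load` and `get` are all assumptions made for illustration, and a plain text file stands in for JSON to avoid extra dependencies. The index is written to a temporary file and then renamed, so a reader never sees a half-written index; this does not protect against two concurrent writers.

```ocaml
(* Hypothetical format: a version header, then one token per line.
   Tokens are assumed to be single-line strings. *)
let header = "token-index v1"

(* Write the index to a temporary file, then rename it over [path]. *)
let save path tokens =
  let tmp = path ^ ".tmp" in
  let oc = open_out tmp in
  (try
     output_string oc (header ^ "\n");
     List.iter (fun t -> output_string oc (t ^ "\n")) tokens;
     close_out oc
   with e -> close_out_noerr oc; raise e);
  Sys.rename tmp path

(* Return [Some tokens] if [path] exists and carries the expected header,
   [None] otherwise (missing file or unknown version). *)
let load path =
  if not (Sys.file_exists path) then None
  else begin
    let ic = open_in path in
    let rec read_all acc =
      match input_line ic with
      | line -> read_all (line :: acc)
      | exception End_of_file -> List.rev acc
    in
    let lines =
      Fun.protect ~finally:(fun () -> close_in_noerr ic)
        (fun () -> read_all [])
    in
    match lines with
    | first :: tokens when first = header -> Some tokens
    | _ -> None
  end

(* Use the stored index when available; otherwise call [rebuild]
   (e.g. the slow directory scan) once and store its result. *)
let get ~rebuild path =
  match load path with
  | Some tokens -> tokens
  | None ->
    let tokens = rebuild () in
    save path tokens;
    tokens
```

With such a scheme, the slow scan would only run when the index file is missing or has an unknown version. The index would also have to be updated whenever a token is created, otherwise it goes stale.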
Cc @yurug @AltGr
As a matter of fact, I do not experience this issue. How many students do you have?
I have 241 students on mine.
However, the machine has a fast SSD, so I guess that could explain the difference.
The definition of Tokens.Index.get () indeed triggers a lot of system calls. However, 2 minutes seem an eternity for a simple file hierarchy traversal. How much time does it take to run find -type d from the root of your sync directory?
Same question with find . -type d -maxdepth 4.
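For reference, the depth-limited traversal can also be reproduced from OCaml itself, which is closer to what the server does than the shell command. This is only a rough sketch with the standard library, not the actual Tokens.Index.get () code; the name `count_dirs` is made up for the example. Unlike find -type d, `Sys.is_directory` follows symbolic links.

```ocaml
(* Count the directories under [path], descending at most [max_depth]
   levels, roughly like `find path -type d -maxdepth max_depth | wc -l`.
   [path] itself counts as depth 0. *)
let rec count_dirs ~max_depth path =
  if not (Sys.file_exists path && Sys.is_directory path) then 0
  else if max_depth <= 0 then 1
  else
    Array.fold_left
      (fun acc name ->
         acc
         + count_dirs ~max_depth:(max_depth - 1) (Filename.concat path name))
      1
      (Sys.readdir path)
```

Calling `count_dirs ~max_depth:4 "."` from the root of the sync directory, with the program run under the shell's time command, should give a comparable measurement.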
Hi @yurug, thanks for your comments.
As a matter of fact, I do not experience this issue. How many students do you have?
~160 students
The definition of Tokens.Index.get () indeed triggers a lot of system calls. However, 2 minutes seem an eternity for a simple file hierarchy traversal. How much time does it take to run find -type d from the root of your sync directory? Same question with find . -type d -maxdepth 4.
Unfortunately, I cannot check this for the moment, as the host that serves our learn-ocaml instance has temporarily shut down the server (due to some "casual administrative issue" with the Univ.), but I hope to be able to access the files again next week. I'll let you know at that time.
Also (but this is only my intuition, not an experiment), I guess the issue is related to the fact that each student folder contains git repositories that grow significantly (a priori, several new files are created per commit), and this may be amplified by https://github.com/ocaml-sf/learn-ocaml/pull/338 as learn-ocaml now creates an empty commit every time a student clicks the Sync button.
Anyway, are you in favor of the solution outlined in https://github.com/ocaml/opam/issues/4203#issuecomment-631510157 ? namely:
- keep the /sync structure as is;
- add an index database (using e.g. irmin) that gathers all tokens (plus maybe other info when we need to add authentication data (preferably not stored inside the individual student git repo…));
- and bump the ocaml-version constraint to 4.07.1 if it happens to be required to install irmin 2.x?
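To help discuss the second point without committing to a backend yet, here is a rough sketch of what the index interface could look like. Everything in it is hypothetical: the signature `TOKEN_INDEX`, its function names, and the in-memory `Mem_index` module are not part of learn-ocaml, and no Irmin API is used. The idea is only that the server would depend on the signature, so an Irmin-backed (or file-backed) module could later replace the in-memory one.

```ocaml
(* Hypothetical interface for a token index kept outside the /sync tree. *)
module type TOKEN_INDEX = sig
  type t
  val create : unit -> t
  (* Register a token, recording whether it is a teacher token. *)
  val add : t -> token:string -> teacher:bool -> unit
  val mem : t -> string -> bool
  (* All teacher tokens, sorted, e.g. for the listing printed at startup. *)
  val teachers : t -> string list
end

(* In-memory stand-in; a persistent backend would implement the same
   signature. *)
module Mem_index : TOKEN_INDEX = struct
  type t = (string, bool) Hashtbl.t
  let create () = Hashtbl.create 256
  let add t ~token ~teacher = Hashtbl.replace t token teacher
  let mem t token = Hashtbl.mem t token
  let teachers t =
    Hashtbl.fold
      (fun token teacher acc -> if teacher then token :: acc else acc)
      t []
    |> List.sort compare
end
```

With an index like this, listing the teacher tokens at startup becomes a lookup instead of a traversal of every student folder.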
(we can also plan some phone call next week if you want to discuss this in more depth)