dhall-haskell
Type-checking dhall-kubernetes-charts with an empty cache takes 25 minutes
Check out https://github.com/Gabriel439/dhall-kubernetes-charts/commit/33eb92e4d55224bb103c19b1fb4e56e16060b3d1
$ rm -rf ~/.cache/dhall*
$ time dhall type --file stable/jenkins/index.dhall --quiet
75.84user 6.99system 24:59.22elapsed 5%CPU (0avgtext+0avgdata 235372maxresident)k
30712inputs+48368outputs (131major+82643minor)pagefaults 0swaps
Note the huge difference between user time and elapsed time: more than 23 minutes! I'll try to figure out what the process is waiting for there.
Here are some syscall statistics:
$ strace -c -w dhall type --file stable/jenkins/index.dhall --quiet
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.44 1277.662792        3936    324641     93352 select
  0.40    5.304978           5   1008713           read
  0.37    4.909465           5   1015626           fstat
  0.37    4.797701         895      5358       498 poll
  0.32    4.180177           8    546307     21477 stat
  0.27    3.550268           7    512881           open
  0.21    2.703619           5    517960           close
  0.18    2.401575           5    514497    511256 ioctl
  0.14    1.896009           6    343727           rt_sigprocmask
  0.11    1.464082          14    106277     94401 rt_sigreturn
  0.07    0.933065           5    171865           clock_gettime
  0.04    0.529798          81      6532           getdents
  0.02    0.303407          10     30999      4950 recvfrom
  0.01    0.137467          12     11340           sendto
  0.01    0.121909           9     13734        15 access
  0.01    0.072200          33      2218           timer_settime
  0.00    0.043991          22      2033           write
  0.00    0.043851           9      5079           socket
  0.00    0.043813          13      3465      1638 connect
  0.00    0.040473          25      1620           getsockopt
  0.00    0.038667          24      1620           sendmmsg
  0.00    0.038470          19      2033           ftruncate
  0.00    0.033649           7      4860           recvmsg
  0.00    0.013780          20       685           madvise
  0.00    0.010869           7      1620           bind
  0.00    0.009235           5      1824           getsockname
  0.00    0.005717          18       321           mmap
  0.00    0.000584           6       100           setsockopt
  0.00    0.000210           8        28           mprotect
  0.00    0.000106         106         1           execve
  0.00    0.000087           4        20           lseek
  0.00    0.000085           5        18           getcwd
  0.00    0.000077          15         5           brk
  0.00    0.000061           4        14           rt_sigaction
  0.00    0.000059          15         4           munmap
  0.00    0.000059          30         2           mkdir
  0.00    0.000013           7         2           chmod
  0.00    0.000011           6         2           futex
  0.00    0.000009           9         1           timer_delete
  0.00    0.000008           4         2           getrlimit
  0.00    0.000006           6         1           sysinfo
  0.00    0.000006           6         1           timer_create
  0.00    0.000005           5         1           set_tid_address
  0.00    0.000004           4         1           arch_prctl
  0.00    0.000004           4         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 1311.292421               5158039    727587 total
That the vast majority of the time is spent in select seems to indicate that the program is waiting for (network) input.
I'd like to figure out how the time spent waiting for requests to complete is distributed, and which requests take the longest time. Are there good tools to help me with that?
@sjakobi: You can set the https_proxy environment variable to forward all Dhall traffic through a locally running forward proxy that logs network requests
Cheers, I can try that! Any suggestions for a forward proxy for Linux?
@sjakobi: The two most common ones that I know of are squid and varnish, but it's also not that hard to write a custom one in Haskell (e.g. using the http-proxy package or something similar)
I ended up installing mitmproxy, and using its browser GUI mitmweb.
I then ran dhall with
$ env https_proxy=127.0.0.1:8080 SYSTEM_CERTIFICATE_PATH=/home/simon/.mitmproxy dhall type --file stable/jenkins/index.dhall --quiet
I noticed:

- Requests to https://prelude.dhall-lang.org all take 800 ms to 1 s, which I find pretty slow – but since there are only ~120 files, we're done after less than 2 minutes.
- Requests for https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/* mostly take 250–400 ms, but every 5–20 requests there's one that takes 1 s and sometimes 3 s. This makes me wonder whether there's some kind of throttling at play.
- In a last run with the mitmdump CLI tool, the process ended like this:
$ env https_proxy=127.0.0.1:8080 SYSTEM_CERTIFICATE_PATH=/home/simon/.mitmproxy dhall type --file stable/jenkins/index.dhall --quiet
dhall:
Error: The remote host took too long to respond
URL: https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NamespaceStatus.dhall
Tail of the mitmdump output:
127.0.0.1:34334: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.LocalVolumeSource.dhall
<< 200 OK 55b
127.0.0.1:34336: clientconnect
127.0.0.1:34168: clientdisconnect
127.0.0.1:34336: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NFSVolumeSource.dhall
<< 200 OK 68b
127.0.0.1:34170: clientdisconnect
127.0.0.1:34338: clientconnect
127.0.0.1:34172: clientdisconnect
127.0.0.1:34174: clientdisconnect
127.0.0.1:34176: clientdisconnect
127.0.0.1:34178: clientdisconnect
127.0.0.1:34180: clientdisconnect
127.0.0.1:34182: clientdisconnect
127.0.0.1:34184: clientdisconnect
127.0.0.1:34186: clientdisconnect
127.0.0.1:34188: clientdisconnect
127.0.0.1:34190: clientdisconnect
127.0.0.1:34192: clientdisconnect
127.0.0.1:34194: clientdisconnect
127.0.0.1:34196: clientdisconnect
127.0.0.1:34198: clientdisconnect
127.0.0.1:34200: clientdisconnect
127.0.0.1:34202: clientdisconnect
127.0.0.1:34204: clientdisconnect
127.0.0.1:34206: clientdisconnect
127.0.0.1:34208: clientdisconnect
127.0.0.1:34210: clientdisconnect
127.0.0.1:34212: clientdisconnect
127.0.0.1:34214: clientdisconnect
127.0.0.1:34216: clientdisconnect
127.0.0.1:34218: clientdisconnect
127.0.0.1:34220: clientdisconnect
127.0.0.1:34222: clientdisconnect
127.0.0.1:34224: clientdisconnect
127.0.0.1:34226: clientdisconnect
127.0.0.1:34228: clientdisconnect
127.0.0.1:34230: clientdisconnect
127.0.0.1:34232: clientdisconnect
127.0.0.1:34234: clientdisconnect
127.0.0.1:34236: clientdisconnect
127.0.0.1:34238: clientdisconnect
127.0.0.1:34240: clientdisconnect
127.0.0.1:34242: clientdisconnect
127.0.0.1:34338: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.Namespace.dhall
<< 200 OK 153b
127.0.0.1:34340: clientconnect
127.0.0.1:34244: clientdisconnect
127.0.0.1:34246: clientdisconnect
127.0.0.1:34248: clientdisconnect
127.0.0.1:34250: clientdisconnect
127.0.0.1:34340: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NamespaceList.dhall
<< 200 OK 138b
127.0.0.1:34342: clientconnect
127.0.0.1:34252: clientdisconnect
127.0.0.1:34254: clientdisconnect
127.0.0.1:34256: clientdisconnect
127.0.0.1:34258: clientdisconnect
127.0.0.1:34342: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NamespaceSpec.dhall
<< 200 OK 47b
127.0.0.1:34344: clientconnect
127.0.0.1:34260: clientdisconnect
127.0.0.1:34262: clientdisconnect
127.0.0.1:34264: clientdisconnect
127.0.0.1:34266: clientdisconnect
127.0.0.1:34268: clientdisconnect
127.0.0.1:34270: clientdisconnect
127.0.0.1:34272: clientdisconnect
127.0.0.1:34274: clientdisconnect
127.0.0.1:34276: clientdisconnect
127.0.0.1:34278: clientdisconnect
127.0.0.1:34280: clientdisconnect
127.0.0.1:34282: clientdisconnect
127.0.0.1:34284: clientdisconnect
127.0.0.1:34286: clientdisconnect
127.0.0.1:34288: clientdisconnect
127.0.0.1:34290: clientdisconnect
127.0.0.1:34292: clientdisconnect
127.0.0.1:34294: clientdisconnect
127.0.0.1:34296: clientdisconnect
127.0.0.1:34298: clientdisconnect
127.0.0.1:34300: clientdisconnect
127.0.0.1:34302: clientdisconnect
127.0.0.1:34304: clientdisconnect
127.0.0.1:34306: clientdisconnect
127.0.0.1:34308: clientdisconnect
127.0.0.1:34310: clientdisconnect
127.0.0.1:34312: clientdisconnect
127.0.0.1:34314: clientdisconnect
127.0.0.1:34316: clientdisconnect
127.0.0.1:34318: clientdisconnect
127.0.0.1:34320: clientdisconnect
127.0.0.1:34322: clientdisconnect
127.0.0.1:34324: clientdisconnect
127.0.0.1:34326: clientdisconnect
127.0.0.1:34328: clientdisconnect
127.0.0.1:34330: clientdisconnect
127.0.0.1:34332: clientdisconnect
127.0.0.1:34334: clientdisconnect
127.0.0.1:34336: clientdisconnect
127.0.0.1:34338: clientdisconnect
127.0.0.1:34342: clientdisconnect
127.0.0.1:34340: clientdisconnect
127.0.0.1:34344: GET https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/4ef46543ceff0e10978cca542952c56ee6e47b93/types/io.k8s.api.core.v1.NamespaceStatus.dhall
<< 200 OK 46b
127.0.0.1:34344: CONNECT raw.githubusercontent.com:443
<< Error in HTTP connection: TcpDisconnect("(32, 'EPIPE')",)
127.0.0.1:34344: clientdisconnect
Note the large number of clientdisconnects at the end – I wonder whether that's some kind of resource leak that causes the server to block me.
While looking at the strace output I also noticed that we seem to re-read the SSL certificates on each request – it doesn't take long in total, but it might be worth optimizing anyway.
we seem to re-read the SSL certificates on each request
That's probably because we construct a new Manager for every request:
https://github.com/dhall-lang/dhall-haskell/blob/e931451a2b2892fa373c9dfc94b31bcd764131be/dhall/ghc-src/Dhall/Import/HTTP.hs#L261-L263
I seem to remember that we didn't always do that, and I wonder whether it impairs our ability to handle HTTP connections…
@sjakobi: Yeah, we should only be creating a Manager once. If we're doing so for every request, that's a bug.
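For illustration, here is a minimal sketch of the intended shape, using http-client-tls directly rather than dhall's actual import code (the URLs are just examples): construct the Manager once and reuse it, so the connection pool and the loaded certificate store survive across requests.

import qualified Data.ByteString.Lazy as LBS
import Network.HTTP.Client (Manager, httpLbs, parseRequest, responseBody)
import Network.HTTP.Client.TLS (newTlsManager)

-- Fetch one URL through a shared Manager.
fetch :: Manager -> String -> IO LBS.ByteString
fetch manager url = do
    request <- parseRequest url
    responseBody <$> httpLbs request manager

main :: IO ()
main = do
    manager <- newTlsManager       -- construct the Manager once ...
    mapM_ (fetch manager) urls     -- ... and reuse it for every download
  where
    urls =
        [ "https://prelude.dhall-lang.org/Bool/package.dhall"
        , "https://prelude.dhall-lang.org/Natural/package.dhall"
        ]

This would presumably also explain the certificate re-reads observed above, since each freshly constructed TLS Manager has to load the certificate store again.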
Regarding the prelude.dhall-lang.org slowdown: it's slow because prelude.dhall-lang.org doesn't cache requests, so it makes a second request to GitHub to serve each file. You can find the nginx configuration we currently use here:
https://github.com/dhall-lang/dhall-lang/blob/be63f373f6989dbb922822343550314748358328/nixops/logical.nix#L170-L199
I can enable simple HTTP caching on prelude.dhall-lang.org for requests for individual version-pinned Prelude files, since those never change.
After #1522:
$ env XDG_CACHE_HOME=.cache time dhall type --file stable/jenkins/index.dhall --quiet
4.15user 1.85system 4:53.92elapsed 2%CPU (0avgtext+0avgdata 233076maxresident)k
176inputs+48368outputs (0major+53820minor)pagefaults 0swaps
5x faster for me. Thanks @Gabriel439! :)
I'll try to figure out what else we could do to speed that up.
@sjakobi: I still need to enable server-side caching for the Prelude, although I'm not sure how much of an effect that will have
I took a stab at nginx server-side caching and wasn't successful, so I'm going to put it down for now unless we have evidence that it would significantly improve cold-start speeds. If somebody else wants to pick this up, I pushed a branch with the nginx configuration that I tried:
https://github.com/dhall-lang/dhall-lang/compare/gabriel/server_side_caching
Are the network requests parallelized? Every nth one taking 1–3s shouldn’t matter if we do 10–50 at once. Maybe there’s a way to take advantage of HTTP/2 here?
@Profpatsch: They are currently not parallelized, but I believe they can be parallelized with some work
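In case it helps, here is a hypothetical sketch of what that could look like with the async package, sharing one Manager across all workers (fetchAll and the batching are made up for this sketch; real import resolution would still have to respect dependencies between imports):

import Control.Concurrent.Async (mapConcurrently)
import qualified Data.ByteString.Lazy as LBS
import Network.HTTP.Client (Manager, httpLbs, parseRequest, responseBody)
import Network.HTTP.Client.TLS (newTlsManager)

-- Fetch a batch of independent imports concurrently; an occasional
-- 1-3 s response then delays only its own worker rather than the
-- whole sequence.
fetchAll :: [String] -> IO [LBS.ByteString]
fetchAll urls = do
    manager <- newTlsManager
    mapConcurrently (fetch manager) urls
  where
    fetch manager url = do
        request <- parseRequest url
        responseBody <$> httpLbs request manager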
I forgot to mention here that one idea I have to address this is to create a proxy.dhall-lang.org that interpreters can optionally use as a forward proxy to download pre-resolved (and possibly pre-normalized) Dhall expressions. See: https://discourse.dhall-lang.org/t/rfc-proxy-dhall-lang-org/144
Hi there! I'm a long time lurker but inexperienced user 👋 Just popping in to leave my 2 cents of feedback.
I was trying to evaluate a Dhall snippet from Gabriel's slides, on a fresh install of the latest dhall-haskell (1.37.1) (so nothing in cache), and it made for a confusing new user experience.
Downloading https://prelude.dhall-lang.org/v20.0.0/package.dhall and all its dependencies took several minutes, and dhall provided no feedback on what it was doing, so I was left staring at a blank screen.
The lack of feedback was really confusing. Since dhall with no command-line options behaves kind of like cat, I thought it was waiting for more input, or a ^D or something. After a while I started troubleshooting my Dhall installation ("Wrong version? Maybe I'll try the static binaries instead of using what Nixpkgs gives me?"). Thankfully I eventually let it sit long enough to see it finally evaluate and realize my mistake.
I think it would be really helpful to have some kind of additional feedback about what dhall is doing in cases like this. A progress bar would be cool, but even just a single log line saying it's downloading files would've cleared up my confusion. I also tried to reach for a --verbose flag while troubleshooting, but the dhall executable doesn't provide one. That might be a good incremental step.
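For illustration, the kind of feedback I mean could be as simple as a wrapper around whatever function performs the download (fetchWithLog and its fetch argument are hypothetical, not dhall's actual API):

import System.IO (hPutStrLn, stderr)

-- Print one line per download to stderr so a cold-cache run isn't silent.
fetchWithLog :: (String -> IO a) -> String -> IO a
fetchWithLog fetch url = do
    hPutStrLn stderr ("Downloading " ++ url ++ " ...")
    fetch url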
EDIT: Probably should've left this comment on https://github.com/dhall-lang/dhall-haskell/issues/2062, but oh well.
@evanrelf: The main reason I haven't done this yet is that I'd like to fix the original problem (the imports being slow) as much as possible before adding workarounds like logging or progress bars (although I acknowledge that at some point we will need them).
One of the things I've been doing along these lines is setting up a global cache of binary-encoded and fully-resolved downloads for common Dhall packages (especially the Prelude) that Dhall interpreters can use to efficiently retrieve packages in a single HTTP request.
You can track the progress of that work here: https://discourse.dhall-lang.org/t/rfc-proxy-dhall-lang-org/144/10