Text instead of String throughout Tidal

Open dktr0 opened this issue 3 years ago • 10 comments

If we could use Text instead of String wherever possible throughout the Tidal codebase, that would translate into performance advantages when Tidal code is run in the browser (i.e. in Estuary). Happy to help do this refactor if there's support for it! (Having done it a while ago in the evolution of the Estuary codebase already...)

dktr0 avatar Nov 26 '21 00:11 dktr0

Yes, sounds good. Have a look at the existing benchmarking suite before/after to see what difference it makes.

yaxu avatar Nov 26 '21 06:11 yaxu

It would be good to do some profiling too. This looks useful: https://hackage.haskell.org/package/profiterole

When you say help refactor, do you mean do the whole job? :) That would be great if you already have a handle on what is involved. With OverloadedStrings already on for everyone, I expect there won't be any changes to the UI. I'm interested in optimisation at the moment, as I'm working on Tidal on the low-powered Pi Zero. Are there particular concerns for Estuary beyond a general need for optimisation due to being in a JavaScript VM?
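
(A minimal sketch of why OverloadedStrings keeps the surface syntax unchanged - the identifiers below are illustrative, not from the Tidal codebase:)

{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- With OverloadedStrings the same literal can be typed as either String
-- or Text: GHC inserts fromString, so user-facing code like s "bd sn"
-- keeps compiling unchanged after the internal type switches.
asString :: String
asString = "bd sn"

asText :: Text
asText = "bd sn"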

yaxu avatar Nov 26 '21 10:11 yaxu

Yeah, I can take a look at "the whole thing". It might not be such a big job - the thorniest parts will probably be those connected to parsing (e.g. mini-notation), where the gains are also somewhat smaller (lists are not terrible when you are looking ahead a few characters at a time). The String vs. Text thing is low-hanging fruit in terms of optimizing for Estuary - Strings in GHCJS (as in GHC) are linked lists (one node per character...), so even something like "Tidal" becomes a whole nested structure of objects within objects, while Text is mapped directly onto an underlying JS string.
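
(For context, a small illustration of the representation difference described above - the JS-string detail is as stated in the comment, and the example itself is just a sketch:)

import qualified Data.Text as T

-- String is literally a linked list of characters:
--   type String = [Char]
-- so "Tidal" is a chain of five cons cells, each holding a boxed Char.
-- Text is a packed value, which is what allows GHCJS to map it onto a
-- native JS string.
name :: T.Text
name = T.pack "Tidal"  -- one packed value instead of a cons chain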

dktr0 avatar Nov 26 '21 17:11 dktr0

As to other concerns... not really. I think the performance-challenged environment of the browser definitely exposes performance costs that might seem negligible otherwise. I have the impression that one general source of slowdown is situations where "variants" of underlying patterns are calculated and then combined (e.g. in a way that discards much of the information from one or the other of the variants/paths), which adds up to exponential costs when there are several layers of such operations. I'm not sure how much room for optimization there is there, though... it might just be the unavoidable cost of such things!

dktr0 avatar Nov 26 '21 17:11 dktr0

Starting another project with PureScript... I am curious how much of a difference that will make to the performance of otherwise comparable code. Nothing to report yet though...

dktr0 avatar Nov 26 '21 17:11 dktr0

Yes. Put on the iron Shirt, and chase Strings out of Earth.

I was looking into Haskell profiling (and visualising the profile) recently. https://hackage.haskell.org/package/ghc-prof-flamegraph makes nice pictures.

What is the use case that we want to profile (i.e., where we suspect that Text vs String has an impact)? Can we add this to the performance benchmark?

I was doing this experiment:

stack bench tidal:bench:bench-speed --resolver=lts --profile
ghc-prof-flamegraph bench-speed.prof  # produces .svg
profiteur bench-speed.prof  # produces .html

and the flame graph is... ah, GitHub does not want to inline SVG here, so I'm dropping the files at https://www.imn.htwk-leipzig.de/~waldmann/etc/Tidal/ (using several options for generating the graph).

[EDIT] I only now realize that you wrote profiterole while I thought profiteur, having read about it in this thread (two answers): https://mail.haskell.org/pipermail/haskell-cafe/2021-November/134902.html. So I applied profiterole as well and copied the files.

jwaldmann avatar Dec 01 '21 22:12 jwaldmann

For me, the use case I mostly care about is Estuary, which has not hitherto been so easy to profile... In any case, one place there where String to Text will probably make a difference is at the transition from Estuary's render engine (which, for Tidal, samples ControlPatterns to produce events) to WebDirt. If every one of those generated maps has Strings in it rather than Text, I think there's a fair bit of memory churn for the browser there. With the caveat that performance will be different in GHCJS in the browser, one could profile the case of sampling ControlPatterns and converting sample names to Text, both where s patterns are fundamentally Pattern String (which needs to be pack-ed) and where they are fundamentally Pattern Text (no need to pack). This is a cost that is paid again and again every moment that patterns are being played (unlike parsing, which is only paid "once" per evaluation).
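
(A rough sketch of what such a measurement could look like, using criterion directly rather than Tidal's bench-speed suite; the sample-name list is a hypothetical stand-in for the events produced by sampling a ControlPattern:)

import Criterion.Main (bench, bgroup, defaultMain, nf)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for sample names pulled out of queried events.
sampleNames :: [String]
sampleNames = concat (replicate 1000 ["bd", "sn", "hh", "cp"])

sampleNamesT :: [Text]
sampleNamesT = map T.pack sampleNames

main :: IO ()
main = defaultMain
  [ bgroup "sample-name conversion per rendered frame"
      [ bench "Pattern String (pack on every render)" $ nf (map T.pack) sampleNames
      , bench "Pattern Text (already Text, no conversion)" $ nf id sampleNamesT
      ]
  ]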

dktr0 avatar Dec 01 '21 23:12 dktr0

Yes, beautiful images! I didn't look closely, but it doesn't seem to tell us very much, perhaps because we need more benchmark cases? Or perhaps we need to 'instrument' the code more somehow to give the benchmarking more information... It would be nice to benchmark based on a 'real world' use of Tidal. Feedforward can do keystroke playback, but I suppose we just need code evaluations + time. By the way, I tried to run the benchmarks on my laptop via cabal, and at some point it just froze and I had to reboot. I'm not sure if it was memory or CPU that was locked. I'll look again later.

yaxu avatar Dec 02 '21 09:12 yaxu

it doesn't seem to tell us very much, perhaps because we need more benchmark cases?

Yes, that's my impression too. At the top of the profile there's compressArc, but it's not necessarily inefficient - it might just be called very often, by design of the benchmark.

Or perhaps we need to 'instrument' the code more somehow to give the benchmarking more information..

Automatic instrumentation happens with ghc -prof -auto-all (or similar); see https://downloads.haskell.org/ghc/latest/docs/html/users_guide/profiling.html?highlight=profiling#ghc-flag--fprof-auto. I am not sure whether that's the exact option that was used when stack called ghc to build the executable that produced the above graphs. I will re-check.

I tried to run the benchmarks on my laptop via cabal, and at some point it just froze

Same here. They eat a lot of memory, and perhaps profiling increases the footprint. I can run them on a machine with 32 GB, but not less. "Just froze": that's the dreaded "OOM killer" (someone should really use this as the name for their band). When the system runs out of memory, it will kill some process: https://wiki.archlinux.org/title/Improving_performance#RAM,_swap_and_OOM_handling

A mitigation for that is to add +RTS -M15G -RTS options when calling the benchmark executable. If the benchmark eats too much RAM, it will be killed by GHC's RTS before the OS even notices. I'm not sure how to pass that via stack.
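
(If I remember right, stack's --benchmark-arguments flag forwards arguments to the benchmark binary, assuming it was built with -rtsopts - something like the following, though I haven't checked it against this exact setup:)

stack bench tidal:bench:bench-speed --benchmark-arguments '+RTS -M15G -RTS'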

These are the detailed commands to build and run the benchmarks with explicit options:

stack --resolver=lts build --bench --no-run-benchmarks --profile --ghc-options '-fprof-auto -rtsopts -O2'
.stack-work/dist/x86_64-linux/Cabal-3.2.1.0/build/bench-speed/bench-speed +RTS -P -H -M32G -RTS

The benchmark executable accepts some options (to put after -RTS), e.g. -m glob for selecting a subset of benchmarks, but I can't get this working (cf. https://github.com/haskell/criterion/issues/101#issuecomment-264746373).

Back to the original topic (Text vs. String), meant to be run "in the browser": here it says that stack can be configured to use GHCJS, https://docs.haskellstack.org/en/v1.4.0/ghcjs/, so the same command for running benchmarks (on node) should work. Did anyone try? I gave up on installing GHCJS, cf. https://github.com/ghcjs/ghcjs/issues/819

jwaldmann avatar Dec 02 '21 10:12 jwaldmann

I'm not sure if this will make a huge difference, as there isn't much concatenation of strings in Tidal... But I'm engaged in a fairly broad rewrite of Tidal at the moment, so this could be a good moment to switch to Text.

yaxu avatar Sep 29 '22 15:09 yaxu