plot
plot copied to clipboard
Large overhead for drawing plots with lots of data points
A private email from a person who noticed poor performance of the plot package when plotting millions of data points, prompted me to run some tests to see what the overhead of the actual plot
package is on top of racket/draw
(which is used for all the drawing). The result is that it is a lot.
There are two programs below (the second one has two variants):
- one is drawing a plot using
lines
- the second one ,
do-manual-plot
will draw the plot using thedraw-lines
dc<%>
method, and do all plotting calculations itself - the third one,
do-manual-plot/individual-lines
will draw the plot using onedraw-line
call for each line segment, also doing all plotting calculations itself.
Here is the performance of the programs
data points | plot | draw-lines | individual draw-line |
---|---|---|---|
100 | 31ms | 15 ms | 15ms |
1'000 | 78ms | 15ms | 15ms |
10'000 | 920ms | 93ms | 156ms |
100'000 | 81'515ms | 3'046ms | 1'484ms |
1'000'000 | too long | 45'734ms | 16'634ms |
For large number of points, the time of the plot
version becomes too large very quickly, much more than the time increase for draw-lines
and draw-line
versions. Note that plot
ultimately uses the same draw package, so the plot package overhead is probably the plot time minus the draw-lines time.
I also found that, for large number of points, a single draw-lines
call is much slower than multiple draw-line
calls, one for each line -- this is somewhat counterintuitive, but I could not find an explanation for this...
#lang typed/racket
(require plot)
(: points (Listof (List Real Real)))
(define points (build-list 100000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (lines points #:color 6))))
#lang racket
(require racket/draw
plot ;; just for plot-width, plot-height
)
(define (transform-points p)
(define-values (min-x min-y max-x max-y)
(for/fold ([min-x (first (first points))]
[min-y (second (first points))]
[max-x (first (first points))]
[max-y (second (first points))])
([point (in-list (rest points))])
(define x (first point))
(define y (second point))
(values (min min-x x) (min min-y y) (max max-x x) (max max-y y))))
(define dx (- max-x min-x))
(define dy (- max-y min-y))
(define w (plot-width))
(define h (plot-height))
(define (transform point)
(define x (first point))
(define y (second point))
(cons (* w (/ (- x min-x) dx))
(* h (- dy (/ (- y min-y) dy)))))
(map transform points))
(define (do-manual-plot points)
(define bm (make-object bitmap% (plot-width) (plot-height)))
(define dc (new bitmap-dc% [bitmap bm]))
(send dc set-pen (send the-pen-list find-or-create-pen "purple" 1 'solid))
(define plot-points (transform-points points))
(send dc draw-lines plot-points)
bm)
(define (do-manual-plot/individual-lines points)
(define bm (make-object bitmap% (plot-width) (plot-height)))
(define dc (new bitmap-dc% [bitmap bm]))
(send dc set-pen (send the-pen-list find-or-create-pen "purple" 1 'solid))
(define plot-points (transform-points points))
(for ([p1 (in-list plot-points)]
[p2 (in-list (rest plot-points))])
(send dc draw-line (car p1) (cdr p1) (car p2) (cdr p2)))
bm)
(define points (build-list 100000 (lambda (x) (list x (random)))))
(printf "individual draw-line calls~%")
(time (do-manual-plot/individual-lines points))
(printf "single draw-lines call~%")
(time (do-manual-plot points))
I started looking at this because I was worried it was Typed Racket-induced contract overhead, but that doesn't seem to be the case. One thing I did see is that plot
seems to do a bunch of other O(N) operations (for example, I think there are O(N) calls to styled-segment
in draw-lines*
in plot/private/common/draw.rkt).
I think you are correct. This problem might indeed be limited just to the lines renderer itself. If using the points
renderer, plot
takes roughly the same amount of time as using racket/draw
directly. Using this program as a test:
#lang typed/racket
(require plot)
(: p (Listof (List Real Real)))
(define p (build-list 1000000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (points p #:color 6))))
Free association, based on history, probably worthless.
When I read your table of performance numbers, I am reminded of an experiment we ran in the late 90s. For quite some time we though, an algorithm looked “practically linear” even though the algorithm was supposed to be in O(n^3). 10, 100, 1000 lines — it scaled nicely. Eventually we discovered that the constant on n^3 was small but not small enough to make it go away. The cubic element kicked in at over 3000 or 4000 lines (my memory isn’t a 100% here).
So you might have that or Sam’s O(n) suggestions might “multiply” at some point.
I spend some time investigating this issue and found the following:
-
styled-segment
anddraw-lines*
are not used in this case (as was originally suggested by @samth), since the line style is'solid
- Instead,
draw-lines/pen-style
directly calls thedraw-lines
method on the device context (see the case for'solid
pen style) https://github.com/racket/plot/blob/6743d6b82337296bbee484c1570bf9948e030ce6/plot-lib/plot/private/common/draw.rkt#L431 - the actual actual draw lines method on the DC takes most of the time on the call. I added some crude instrumenting using
current-inexact-milliseconds
and, for 100000 points, rendering the entire plot takes 25390 ms and the draw lines call on the DC takes 22285 ms (87% of the time). Note that the timings are different than originally reported, as I ran the new tests on a different computer. - the plot library spends a lot of time transforming the input list of points, constructing intermediate lists, this accounts for a large amount of the remaining 13% of the rendering time.
It looks to me that, for large number of points, calling draw-lines
on the dc<%>
interface using Typed Racket is slow -- @samth , is there a way to further investigate if this is indeed the case?
For reference, this is the instrumentation I did:
plot-lib/plot/private/common/draw.rkt | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/plot-lib/plot/private/common/draw.rkt b/plot-lib/plot/private/common/draw.rkt
index a072a5b..a13cf72 100644
--- a/plot-lib/plot/private/common/draw.rkt
+++ b/plot-lib/plot/private/common/draw.rkt
@@ -429,12 +429,19 @@
(: draw-lines/pen-style (-> (Instance DC<%>) (Listof (Pair Real Real)) Plot-Pen-Style-Sym Void))
(define (draw-lines/pen-style dc vs style-sym)
+ (printf "*** ~a style-sym: ~a~%" (current-inexact-milliseconds) style-sym)
(cond [(or (empty? vs) (eq? style-sym 'transparent)) (void)]
[else
- (let ([vs (map cons-fl vs)])
- (cond [(eq? style-sym 'solid) (send dc draw-lines vs)]
+ (let ([vs1 (map cons-fl vs)])
+ (printf "*** ~a cons-fl done~%" (current-inexact-milliseconds))
+ (cond [(eq? style-sym 'solid)
+ (define start (current-inexact-milliseconds))
+ (send dc draw-lines vs)
+ (define end (current-inexact-milliseconds))
+ (printf "*** ~a draw-lines done took ~a ms~%"
+ end (- end start))]
[else
(define pen (send dc get-pen))
(define scale (max 1.0 (fl (send pen get-width))))
(define sty (scale-pen-style (symbol->style style-sym) scale))
- (draw-lines* dc vs sty)]))]))
+ (draw-lines* dc vs1 sty)]))]))
And this is the program I ran:
#lang typed/racket
(require plot)
(: points (Listof (List Real Real)))
(define points (build-list 100000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (lines points #:color 6))))
I also converted to Typed Racket the program that uses draw-lines
directly (see issue description) and there is no performance problem with the draw-lines call -- this seems to be somehow related to the draw-lines call inside the plot package.
I believe that Typed Racket program has no contract boundaries that aren't there always in plot -- ie, I don't think there's anything Typed Racket about your example.
To see if Typed Racket is the problem, I recommend the contract profiler -- if it shows a non-trivial percentage of time in contracts, particularly contracts that look to be types, then that's likely a problem. I think when I tried this before for my earlier comment that didn't happen.
Hi @samth , I am confused about contract-profile
. At first, I removed the "compiled" folders from the plot package, to ensure that the plot package code is instrumented, and contract profile showed that no significant time is spent on the contracts. Next, I removed the compiled folders from the draw library as well. When I ran contract profile again, I got the following:
*** 1624329656455.485 style-sym: solid
*** 1624329656478.1055 cons-fl done
*** 1624329682390.243 draw-lines done took 25910.7099609375 ms
*** drawing took 26008.696533203125 ms
Running time is 95.49% contracts
26468/27719 ms
(->* ((or/c natural? (sequence/c (or/c natural? (sequence/c ... 26436 ms
(lib plot/private/plot2d/line.rkt):36:9
lines 26436 ms
(->*m ((or/c (listof (is-a?/c point%)) (listof (cons/c real? ... 31.5 ms
#<blame-no-swap>
draw-lines 31.5 ms
The lines starting with "***" are the printfs I added to the plot code. The contract profiler now shows that 95% of the time is spent in contracts, but the contract it flags is at the "entry" into the plot package (the "lines" renderer), which does not make much sense to me....
Looking at the implementation of sequence/c
contract, it seems like it could benefit from some special cases for lists and maybe other things. For example, with the diff below, the contract (sequence/c integer?)
, when applied to short lists of integers is about 4x faster.
$ git diff -b | cat
diff --git a/racket/collects/racket/sequence.rkt b/racket/collects/racket/sequence.rkt
index f07d4b9c72..eb234c45da 100644
--- a/racket/collects/racket/sequence.rkt
+++ b/racket/collects/racket/sequence.rkt
@@ -212,7 +212,12 @@
(cond
[(and (= n-cs 1) (not min-count))
(define p (car ps))
+ (define listof-proj ((contract-late-neg-projection (listof (car ctcs))) orig-blame))
(λ (seq neg-party)
+ (cond
+ [(list? seq)
+ (listof-proj seq neg-party)]
+ [else
(unless (sequence? seq)
(raise-blame-error
orig-blame #:missing-party neg-party seq
@@ -246,7 +251,7 @@
(cond
[(list? seq) (sequence->list result-seq)]
[(stream? seq) (sequence->stream result-seq)]
- [else result-seq]))]
+ [else result-seq])]))]
[else
(λ (seq neg-party)
(unless (sequence? seq)
My guess is that the contract cost is in the result contract, not shown in the print out, which is Renderer2d. That's an object which will impose cost on every method access.
One other thing, @alex-hhh, I think you meant to use vs1
in the call to draw-lines
.
I'm now much more confused. I tested with the following program:
#lang typed/racket/base
(require (only-in plot/private/common/parameters plot-decorations?)
(only-in plot/private/no-gui/plot2d plot-bitmap)
(only-in plot/private/plot2d/line lines))
(: points (Listof (List Real Real)))
(define points (build-list 100000 (lambda ([x : Index]) (list x (random)))))
(time
(parameterize ([plot-decorations? #f])
(plot-bitmap (lines points #:color 6))))
This should have no Typed Racket-related contract boundaries -- all of the listed plot modules as written in Typed Racket. It has the same slow performance that @alex-hhh reports.
However, when I put a printf
where @alex-hhh did, the dc
value is #(struct:wrapper-object:bitmap-dc% ...)
which indicates that it's a contracted object. Unfortunately it seems like value-contract
doesn't do the right thing for such values; it produces #f
on that value. Note that bitmap-dc%
is protected by its own class contract that is unrelated to Typed Racket, so that might be part of it.
Furthermore, raco contract-profile
shows effectively no time in contracts for that program for me (as I expected). You shouldn't see different behavior from the contract profiler with and without compiled files, though, since the instrumentation is done at runtime.
Removing the contract inside racket/draw doesn't remove the wrapper-object so I'm still confused.
One other thing, @alex-hhh, I think you meant to use vs1 in the call to draw-lines.
This is from one of my earlier attempts to determine the step that takes a long time -- the last transformation, (map cons-fl vs)
, is not technically needed, at least not in this test case. The rendering includes creating several copies of the list of points (with transformations), and I first thought that creating these intermediate lists is what take up the time, but, compared with the time it takes to execute the draw-lines
call, the other timings are negligible.
Further investigation:
I put some printouts inside class-internal
to see what the contracts were. They are very close to the types in gui-types.rkt
in TR, but not exactly the same. For example:
ctc: (#<chaperone-contract: (->m real? void?)> #<procedure:set-alpha method in dc%> (class bitmap-dc%) /home/samth/sw/plt/racket/share/pkgs/pict-lib/pict/private/pict.rkt set-alpha)
[set-alpha (Nonnegative-Real -> Void)] ;; nonnegative constraint not in the contract
The wrapper objects are created with the following blame objects:
created wrapper object: bitmap-dc% blm: /home/samth/sw/plt/racket/share/pkgs/pict-lib/pict/private/pict.rkt
created wrapper object: bitmap-dc% blm: /home/samth/sw/plt/extra-pkgs/plot/plot-lib/plot/private/no-gui/plot2d.rkt
However, I can't seem to find the relevant contracts.
I also tried simply removing the use of contract
in concretize-ictc-method
in class-internal.rkt
and just producing meth
, but that did not change the performance.
Conclusions so far:
- I'm pretty sure this is not Typed Racket related
- There's a contract somewhere that I haven't found yet
- It's not yet clear if contracts are the performance issue
Further investigation: the mysterious contract is applied as part of this line: https://github.com/racket/plot/blob/master/plot-lib/plot/private/no-gui/plot2d.rkt#L108
However, I don't know why there's a contract there. bitmap-dc%
on that line is imported from typed/racket/draw
, which directly imports racket/draw/private/bitmap-dc
which doesn't have any contracts. The class there inherits from various things that also don't have any contracts. So I'm very confused.
At @rfindler's suggestion, I investigated @alex-hhh's hypothesis that calling draw-lines
on a dc in Typed Racket is the slow thing; fortunately that doesn't seem to be the case.
Here's the timings I get, for slightly modified versions of Alex's program in the original report. They're almost identical between typed and untyped used of racket/draw
, and around 1 second in both cases. With Alex's (typed) program using plot, I get ~47 seconds. So there's definitely some large overhead in plot that I don't understand, but it isn't just that TR is slow when using draw-lines
.
[samth@homer:~/tmp plt] r -t draw-points-typed.rkt
'typed
individual draw-line calls
cpu time: 1110 real time: 1111 gc time: 27
(object:bitmap% ...)
single draw-lines call
cpu time: 937 real time: 937 gc time: 16
(object:bitmap% ...)
[samth@homer:~/tmp plt] r -t draw-points-typed.rkt
'typed
individual draw-line calls
cpu time: 1044 real time: 1046 gc time: 28
(object:bitmap% ...)
single draw-lines call
cpu time: 960 real time: 961 gc time: 17
(object:bitmap% ...)
[samth@homer:~/tmp plt] r -t draw-points.rkt
'untyped
individual draw-line calls
cpu time: 1154 real time: 1154 gc time: 29
(object:bitmap% ...)
single draw-lines call
cpu time: 954 real time: 954 gc time: 16
(object:bitmap% ...)
[samth@homer:~/tmp plt] r -t draw-points.rkt
'untyped
individual draw-line calls
cpu time: 1162 real time: 1162 gc time: 29
(object:bitmap% ...)
single draw-lines call
cpu time: 974 real time: 974 gc time: 16
(object:bitmap% ...)
The code I ran is at https://gist.github.com/samth/385182bcd0c1bc26795aa7e58a81f80b.
Hi @samth , I mentioned in a previous comment (https://github.com/racket/plot/issues/94#issuecomment-864719372) that using draw-lines from TR does not have a performance problem and I was also puzzled by that....
One additional note -- I tried using plot-pict
instead of plot-bitmap
, and that reduces the time to 1 second from 47 seconds, and more significantly, the draw-lines
call takes only 55 ms.
One more piece of (not useful yet) investigation -- this code, which tries to be more like what plot
actually does, is also fast (in Typed Racket).
(require plot/private/common/draw
(only-in plot/private/common/plotmetrics plot-metrics-mixin Plot-Metrics<%>))
(: do-like-plot : (Listof (List Integer Real)) -> Any)
(define (do-like-plot points)
(define bm : (Instance (Class #:implements Bitmap% #:implements Plot-Metrics<%>))
(make-object (plot-metrics-mixin bitmap%) (λ () (error 'pm)) (plot-width) (plot-height)))
(define dc (new bitmap-dc% [bitmap bm]))
(define plot-points (transform-points points))
(draw-lines/pen-style dc plot-points 'solid)
bm)
I tried plot-pict
too -- while creating the pict is fast, displaying the resulting pict in the REPL is just as slow (at least on my machine) -- the problem seems to be propagated to the pict drawing code.
For what is worth, I also tried using plot/dc
on a record-dc%
and this is fast (although no actual drawing is done, just recording the draw commands). Creating the bitmap separately and using plot/dc
on a bitmap-dc%
is, however slow.
@alex-hhh I believe that the pict construction is just saving a closure that then runs the expensive code, which then happens when the bitmap is draw.
If the problem is indeed a contract, then it's happening when either the bitmap or the bitmap-dc is created in plot-bitmap
. I checked the eq-hash-code
of the bitmap-dc
both when it's created (in plot-bitmap
) and when it's used (in draw-lines/pen-style
) and it's the same.
A contracted version of that object is created when constructing the new bitmap-dc%
object, but printing out the continuation marks when the construction happens is unfortunately not illuminating.
My current conclusions are:
- The fact that the same call to the
draw-lines
method takes orders of magnitude longer in one case than in another seems very likely to be caused by contracts or some other kind of indirection. - Those contracts, if they exist, are added when the dc or bitmap is constructed, not when it's passed between different parts of plot.
- The various contract introspection tools we have don't seem to tell us what's going on, but this may just be because those tools don't work correctly for certain kinds of class & object contracts.
- Trying to remove contracts also doesn't seem to change the behavior.
One path forward is to try to construct a test program that is slow in the way plot is, by removing parts of the plot library until we get something smaller.
Here's another possibility. Below are the traces (from record-dc%
) of what plot
does, and what @alex-hhh's test program does, for 10 points. It may be that one of the additional commands makes the drawing much slower (clipping-region seems like the big difference).
Plot
'((do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-font (12 #f default normal normal #f default #f aligned))
(set-smoothing unsmoothed)
(set-text-mode transparent)
(set-alpha 1.0)
(set-clipping-region #f)
(set-background (255 255 255 1.0))
(set-text-background (255 255 255 1.0))
(set-text-foreground (0 0 0 1.0))
(set-origin 0.0 0.0)
(set-smoothing smoothed)
(set-text-mode transparent)
(set-clipping-region
(#t (((((0.0 . 0.0) (400.0 . 0.0) (400.0 . 400.0) (0.0 . 400.0)))) . any)))
(set-font (11 #f roman normal normal #f default #f aligned))
(set-text-foreground (0 0 0 1.0))
(do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-background (255 255 255 1.0))
(set-alpha 1)
(set-origin 0.0 0.0)
(set-smoothing smoothed)
(set-text-mode transparent)
(set-clipping-region
(#t (((((0.0 . 0.0) (400.0 . 0.0) (400.0 . 400.0) (0.0 . 400.0)))) . any)))
(set-font (11 #f roman normal normal #f default #f aligned))
(set-text-foreground (0 0 0 1.0))
(do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-background (255 255 255 1.0))
(set-alpha 1)
(set-alpha 1)
(clear)
(set-alpha 1)
(set-clipping-region
(#t
(((((-0.5 . -0.5) (401.0 . -0.5) (401.0 . 401.0) (-0.5 . 401.0)))) . any)))
(set-alpha 1)
(do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-background (255 255 255 1.0))
(set-font (11 #f roman normal normal #f default #f aligned))
(set-text-foreground (0 0 0 1.0))
(set-alpha 1)
(do-set-pen! ((85 0 169 1.0) 1 solid round round #f))
(draw-lines
((400.0 . 59.561117537192956)
(355.55555555555554 . 184.85292816198782)
(311.11111111111114 . 84.03444981487382)
(266.66666666666663 . 77.7140553394961)
(222.22222222222223 . 400.0)
(177.77777777777777 . 260.86499463141485)
(133.33333333333331 . 0.0)
(88.88888888888889 . 45.598165111920935)
(44.44444444444444 . 365.6628432792592)
(0.0 . 244.95048306588063))
0.0
0.0)
(set-origin 0.0 0.0)
(set-smoothing smoothed)
(set-text-mode transparent)
(set-clipping-region
(#t (((((0.0 . 0.0) (400.0 . 0.0) (400.0 . 400.0) (0.0 . 400.0)))) . any)))
(set-font (11 #f roman normal normal #f default #f aligned))
(set-text-foreground (0 0 0 1.0))
(do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-background (255 255 255 1.0))
(set-alpha 1)
(set-origin 0.0 0.0)
(set-smoothing unsmoothed)
(set-text-mode transparent)
(set-clipping-region #f)
(set-font (12 #f default normal normal #f default #f aligned))
(set-text-foreground (0 0 0 1.0))
(do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-background (255 255 255 1.0))
(set-alpha 1.0))
direct
'((do-set-pen! ((0 0 0 1.0) 1 solid round round #f))
(do-set-brush! ((255 255 255 1.0) solid #f #f #f))
(set-font (12 #f default normal normal #f default #f aligned))
(set-smoothing unsmoothed)
(set-text-mode transparent)
(set-alpha 1.0)
(set-clipping-region #f)
(set-background (255 255 255 1.0))
(set-text-background (255 255 255 1.0))
(set-text-foreground (0 0 0 1.0))
(do-set-pen! ((160 32 240 1.0) 1 solid round round #f))
(draw-lines
((0 . 202.88547434793722)
(400/9 . 323.5978345613159)
(800/9 . 3.533156393977599)
(400/3 . -42.065008717943364)
(1600/9 . 218.7999859134715)
(2000/9 . 357.9349912820566)
(800/3 . 35.64904662155275)
(2800/9 . 41.96944109693046)
(3200/9 . 142.78791944404446)
(400 . 17.496108819249613))
0.0
0.0))
If I'm reading correctly that smoothing is one of the differences, that could be significant.
Smoothing seems to be about a 4x slowdown on many individual draw-line
calls and a 20x slowdown on one large draw-lines
call.
To answer the question (that I now see) about why draw-lines
is slower than separate draw-line
calls: draw-lines
joins consecutive lines with shaping of the shared point determined by a pen's join, while draw-line
just uses the pen cap on each end of each line.
Having tested this in plot, if you remove the line (send dc set-smoothing 'smoothed)
in reset-drawing-params
in plot-device.rkt
, then drawing the points with plot-bitmap
takes approximately the same amount of time as drawing directly on a bitmap-dc%
.
I'm not sure whether smoothing is needed for plot, but that seems to be the source of the overhead that this issue discusses.
Hi @samth and @mflatt , thanks for finding out the problem. I suspect smoothed drawing is slower because it involves reading back from the drawing buffer. When I did my testing, I only looked at the output for record dc for plot, and smoothing didn't jump out to me as a possible problem...
I will think about how to address this problem, as I don't think that simply disabling smoothing is a good idea -- there is a visible quality difference between a smoothed and non-smoothed plots (although not in the example showed in this issue).
This issue has been mentioned on Racket Discussions. There might be relevant details there:
https://racket.discourse.group/t/why-is-plot-slower-with-unicode-symbols-in-ticks/1611/10