gonum icon indicating copy to clipboard operation
gonum copied to clipboard

mat: format complex dense matrices

Open wamuir opened this issue 3 years ago • 8 comments

Please take a look.

This PR adds a CFormat function in gonum/mat to format complex dense matrices; this function is intended to complement the current Format function. Relevant issue #641 and a relevant PR #1414, although that PR appears to have stalled. Further, PR #1414 takes the approach of wholesale duplicating format.go, which may not be preferred.

Summary:

Following https://github.com/gonum/gonum/pull/1414#issuecomment-666282217, an objective was to avoid code duplication while implementing a formatter for complex matrices. To summarize the proposed approach: value formatters are added for formatting floats and complex numbers, which are then used during matrix formatting. Since these value formatters satisfy the fmt.Formatter interface, fmt.State can be passed, offering simplification in several areas (e.g., re-construction of a format string is no longer necessary). As it would be inefficient to spin up a a new formatter for each element in a matrix, a value formatter holds a buffer for reuse.

To tackle widthing, a state type is proposed (satisfying fmt.State) that can be fit to a matrix and that stores information on column widths, using the existing widther type. As before, iterating through matrix elements is required. After a call to At, a state will respond to subsequent Width calls with the relevant column width. Recommendations toward improving implementation are appreciated.

Notwithstanding the need for additional internal types, this feels simpler to me on the whole and I believe this approach should make extending to new syntaxes or data types easier, if desired. Thanks for reviewing -- will address feedback/comments.

Bench:

I've included a few benchmarks below. These compare the current and proposed implementation when formatting 10x10, 100x100 and 1000x1000 dense matrices.

implementation benchmark iter time bytes alloc
current BenchmarkFormat10 36134 33063 ns/op 2693 B/op 16 allocs/op
current BenchmarkFormat100 358 3747403 ns/op 1672025 B/op 137 allocs/op
current BenchmarkFormat1000 3 334034817 ns/op 163557800 B/op 1066 allocs/op
proposed BenchmarkFormat10 22086 54554 ns/op 2886 B/op 19 allocs/op
proposed BenchmarkFormat100 170 7072273 ns/op 1670752 B/op 49 allocs/op
proposed BenchmarkFormat1000 2 697675398 ns/op 163542896 B/op 80 allocs/op

wamuir avatar Aug 12 '21 15:08 wamuir

Codecov Report

Merging #1707 (5e086de) into master (85ca896) will increase coverage by 0.03%. The diff coverage is 92.59%.

:exclamation: Current head 5e086de differs from pull request most recent head 6ce02c8. Consider uploading reports for the commit 6ce02c8 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1707      +/-   ##
==========================================
+ Coverage   73.26%   73.29%   +0.03%     
==========================================
  Files         510      511       +1     
  Lines       60132    60152      +20     
==========================================
+ Hits        44056    44090      +34     
+ Misses      13584    13581       -3     
+ Partials     2492     2481      -11     
Impacted Files Coverage Δ
mat/format_value.go 84.41% <84.41%> (ø)
mat/format.go 98.67% <98.21%> (+8.88%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 85ca896...6ce02c8. Read the comment docs.

codecov-commenter avatar Aug 13 '21 07:08 codecov-commenter

just a drive-by comment: could you run the benchmarks before/after the proposed optimization a couple of times (say 10-20) and show the result of benchstat? with the limited statistics you presented in the commit message, I get this rather poor statistical power result:

$> benchstat ./ref.txt ./new.txt 
name        old time/op    new time/op    delta
Format10      33.1µs ± 0%    54.6µs ± 0%   ~     (p=1.000 n=1+1)
Format100     3.75ms ± 0%    7.07ms ± 0%   ~     (p=1.000 n=1+1)
Format1000     334ms ± 0%     698ms ± 0%   ~     (p=1.000 n=1+1)

name        old alloc/op   new alloc/op   delta
Format10      2.69kB ± 0%    2.89kB ± 0%   ~     (p=1.000 n=1+1)
Format100     1.67MB ± 0%    1.67MB ± 0%   ~     (p=1.000 n=1+1)
Format1000     164MB ± 0%     164MB ± 0%   ~     (p=1.000 n=1+1)

name        old allocs/op  new allocs/op  delta
Format10        16.0 ± 0%      19.0 ± 0%   ~     (p=1.000 n=1+1)
Format100        137 ± 0%        49 ± 0%   ~     (p=1.000 n=1+1)
Format1000     1.07k ± 0%     0.08k ± 0%   ~     (p=1.000 n=1+1)

ie:

$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee ref.txt
## apply optimizations
$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee new.txt
$> benchstat ./ref.txt ./new.txt

thanks.

PS: benchstat is available here:

sbinet avatar Aug 13 '21 07:08 sbinet

@wamuir Thank you for doing this. I've only taken a brief look, but it looks like nice work so far. I hope to be able to have a deeper look over the weekend.

kortschak avatar Aug 13 '21 07:08 kortschak

just a drive-by comment: could you run the benchmarks before/after the proposed optimization a couple of times (say 10-20) and show the result of benchstat? with the limited statistics you presented in the commit message, I get this rather poor statistical power result:

$> benchstat ./ref.txt ./new.txt 
name        old time/op    new time/op    delta
Format10      33.1µs ± 0%    54.6µs ± 0%   ~     (p=1.000 n=1+1)
Format100     3.75ms ± 0%    7.07ms ± 0%   ~     (p=1.000 n=1+1)
Format1000     334ms ± 0%     698ms ± 0%   ~     (p=1.000 n=1+1)

name        old alloc/op   new alloc/op   delta
Format10      2.69kB ± 0%    2.89kB ± 0%   ~     (p=1.000 n=1+1)
Format100     1.67MB ± 0%    1.67MB ± 0%   ~     (p=1.000 n=1+1)
Format1000     164MB ± 0%     164MB ± 0%   ~     (p=1.000 n=1+1)

name        old allocs/op  new allocs/op  delta
Format10        16.0 ± 0%      19.0 ± 0%   ~     (p=1.000 n=1+1)
Format100        137 ± 0%        49 ± 0%   ~     (p=1.000 n=1+1)
Format1000     1.07k ± 0%     0.08k ± 0%   ~     (p=1.000 n=1+1)

ie:

$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee ref.txt
## apply optimizations
$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee new.txt
$> benchstat ./ref.txt ./new.txt

thanks.

PS: benchstat is available here:

* [golang.org/x/perf/benchstat](https://golang.org/x/perf/benchstat)

Thanks @sbinet, I wasn't aware of benchstat. Cool. Output is below. I added -benchtime=1x, because this appears to otherwise perform tests using the pooled mean? If true, I'd think the dispersion of time/op (±) is otherwise understated. I guess that's a nit on go test for not having a flag to output variance (or a nit on me for not being able to find it).

name old time/op new time/op delta
Format10 44.2µs ±17% 73.9µs ±38% +67.07% (p=0.000 n=20+18)
Format100 3.28ms ± 3% 7.55ms ± 5% +130.21% (p=0.000 n=20+18)
Format1000 329ms ± 1% 764ms ± 2% +132.17% (p=0.000 n=20+20)

Increase in clock time is statistically significant. @kortschak provided review comments, which I'll work through over the weekend and then rerun the bench. One of these comments is to use micro-benchmarking and I expect this will offer greater insight into differences. Addressing his comments may also close the current gap (e.g., his comment regarding fmt.Fprint and reflect).

wamuir avatar Aug 13 '21 14:08 wamuir

Most recent commit addresses review comments, except for the switch to sub-benchmarks. I'll need to push a second commit to update the benchmarks. I attempted some further simplification of the format functions and I think this was productive. I also added comments along the way. Updated benchstat output is below.

name old time/op new time/op delta
Format10 44.9µs ±10% 62.9µs ±31% +40.08% (p=0.000 n=18+20)
Format100 3.31ms ± 1% 4.27ms ± 1% +29.20% (p=0.000 n=18+20)
Format1000 326ms ± 1% 426ms ± 1% +30.41% (p=0.000 n=20+20)

wamuir avatar Aug 14 '21 15:08 wamuir

name old time/op new time/op delta
Format/10/General 52.6µs ±17% 68.8µs ±18% +30.74% (p=0.000 n=19+19)
Format/10/DotByte 49.5µs ±21% 67.9µs ±17% +37.19% (p=0.000 n=19+20)
Format/10/Excerpt 50.3µs ±11% 66.1µs ±21% +31.40% (p=0.000 n=19+20)
Format/10/Prefix 49.4µs ±10% 66.4µs ±17% +34.54% (p=0.000 n=17+20)
Format/10/Squeeze 53.0µs ±22% 69.7µs ±14% +31.49% (p=0.000 n=20+20)
Format/10/MATLAB 51.5µs ±12% 66.4µs ±12% +28.79% (p=0.000 n=20+19)
Format/10/MATLAB# 43.0µs ±18% 42.9µs ±23% ~ (p=0.967 n=20+20)
Format/10/Python 50.4µs ±13% 64.0µs ± 9% +26.99% (p=0.000 n=20+19)
Format/10/Python# 42.4µs ±37% 43.1µs ±30% ~ (p=0.757 n=20+20)
Format/100/General 3.13ms ± 7% 4.43ms ± 9% +41.42% (p=0.000 n=17+20)
Format/100/DotByte 3.94ms ± 1% 4.35ms ± 8% +10.32% (p=0.000 n=19+20)
Format/100/Excerpt 4.00ms ± 1% 4.54ms ± 6% +13.45% (p=0.000 n=20+20)
Format/100/Prefix 4.00ms ± 1% 5.61ms ± 1% +40.23% (p=0.000 n=20+16)
Format/100/Squeeze 4.42ms ± 1% 5.35ms ±18% +21.01% (p=0.000 n=14+20)
Format/100/MATLAB 4.68ms ±15% 6.46ms ±13% +37.92% (p=0.000 n=20+20)
Format/100/MATLAB# 2.91ms ± 3% 2.90ms ± 2% ~ (p=0.247 n=20+20)
Format/100/Python 6.33ms ±16% 6.38ms ±35% ~ (p=0.890 n=20+20)
Format/100/Python# 3.56ms ±13% 1.69ms ± 1% -52.45% (p=0.000 n=20+19)
Format/1000/General 306ms ± 0% 405ms ± 0% +32.08% (p=0.000 n=19+20)
Format/1000/DotByte 303ms ± 0% 392ms ± 0% +29.54% (p=0.000 n=20+20)
Format/1000/Excerpt 308ms ± 1% 405ms ± 0% +31.76% (p=0.000 n=20+20)
Format/1000/Prefix 307ms ± 0% 404ms ± 0% +31.67% (p=0.000 n=20+20)
Format/1000/Squeeze 307ms ± 0% 405ms ± 0% +31.83% (p=0.000 n=20+20)
Format/1000/MATLAB 307ms ± 0% 405ms ± 0% +31.99% (p=0.000 n=19+20)
Format/1000/MATLAB# 164ms ± 1% 164ms ± 1% ~ (p=0.561 n=19+19)
Format/1000/Python 308ms ± 1% 406ms ± 1% +31.86% (p=0.000 n=20+18)
Format/1000/Python# 164ms ± 1% 164ms ± 1% ~ (p=0.604 n=20+19)
[Geo mean] 3.74ms 4.43ms +18.29%

wamuir avatar Aug 14 '21 20:08 wamuir

This PR has gone a bit stale. @wamuir are you still interested in submitting this change?

vladimir-ch avatar Aug 01 '23 10:08 vladimir-ch

@vladimir-ch Yes. I think this needs another review. I can make revisions.

wamuir avatar Aug 01 '23 15:08 wamuir