gonum/gonum
mat: format complex dense matrices
Please take a look.
This PR adds a `CFormat` function in gonum/mat to format complex dense matrices; this function is intended to complement the current `Format` function. The relevant issue is #641, and there is a related PR, #1414, although that PR appears to have stalled. Further, PR #1414 takes the approach of wholesale duplicating `format.go`, which may not be preferred.
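For orientation, here is a rough usage sketch. The exported name (`CFormat`) and its argument list are taken from this PR's description and should be treated as assumptions until the API settles; `mat.NewCDense` and the `%v` verb behaviour mirror the existing real-valued path.

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	// A small complex dense matrix (mat.NewCDense is existing gonum API).
	c := mat.NewCDense(2, 2, []complex128{
		1 + 2i, 3 - 1i,
		0, 2 + 5i,
	})

	// Hypothetical: wrap the complex matrix so it prints with the fmt
	// verbs, analogous to mat.Formatted for real matrices. The name
	// CFormat and its signature are assumptions based on the PR text.
	fmt.Printf("c = %v\n", mat.CFormat(c))
}
```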
Summary:
Following https://github.com/gonum/gonum/pull/1414#issuecomment-666282217, an objective was to avoid code duplication while implementing a formatter for complex matrices. To summarize the proposed approach: value formatters are added for formatting floats and complex numbers, and these are then used during matrix formatting. Since the value formatters satisfy the `fmt.Formatter` interface, `fmt.State` can be passed through, which simplifies several areas (e.g., reconstruction of a format string is no longer necessary). As it would be inefficient to spin up a new formatter for each element in a matrix, a value formatter holds a buffer for reuse.
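To make the value-formatter idea concrete, here is a minimal sketch, with illustrative names rather than the PR's actual code, of a float formatter that satisfies `fmt.Formatter` and reuses an internal buffer across elements:

```go
package main

import (
	"fmt"
	"strconv"
)

// floatFormatter holds the value to print and a scratch buffer, and
// satisfies fmt.Formatter so a fmt.State can be passed through directly
// during matrix formatting. Names here are illustrative only.
type floatFormatter struct {
	v   float64
	buf []byte // reused across elements to avoid per-element allocations
}

func (f *floatFormatter) Format(fs fmt.State, verb rune) {
	prec, ok := fs.Precision()
	if !ok {
		prec = -1
	}
	format := byte(verb)
	if format == 'v' {
		format = 'g' // %v for floats behaves like %g
	}
	f.buf = strconv.AppendFloat(f.buf[:0], f.v, format, prec, 64)
	if w, ok := fs.Width(); ok {
		for pad := w - len(f.buf); pad > 0; pad-- {
			fs.Write([]byte{' '}) // right-align within the requested width
		}
	}
	fs.Write(f.buf)
}

func main() {
	f := &floatFormatter{}
	for _, v := range []float64{1, -2.5, 3.25} {
		f.v = v
		fmt.Printf("[%8.3v]\n", f) // width and precision flow in via fmt.State
	}
}
```

The same pattern extends to complex values by formatting the real and imaginary parts into the shared buffer.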
To tackle widthing, a `state` type is proposed (satisfying `fmt.State`) that can be fitted to a matrix and that stores information on column widths, using the existing `widther` type. As before, iterating through the matrix elements is required. After a call to `At`, a `state` will respond to subsequent `Width` calls with the relevant column width. Recommendations for improving the implementation are appreciated.
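A minimal sketch of that widthing idea (again with illustrative names, not the PR's actual types): a type that satisfies `fmt.State` and answers `Width` with the column width recorded by the most recent `At` call.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// colState is an illustrative stand-in for the proposed state type: it
// satisfies fmt.State and reports per-column widths rather than the width
// parsed from the verb.
type colState struct {
	w      io.Writer
	widths []int // one width per matrix column, computed by fitting to the matrix
	col    int   // column selected by the most recent At call
}

// At records which element is about to be formatted, so that subsequent
// Width calls answer with that element's column width.
func (s *colState) At(i, j int) { s.col = j }

func (s *colState) Write(b []byte) (int, error) { return s.w.Write(b) }
func (s *colState) Width() (int, bool)          { return s.widths[s.col], true }
func (s *colState) Precision() (int, bool)      { return 0, false }
func (s *colState) Flag(c int) bool             { return false }

var _ fmt.State = (*colState)(nil) // compile-time check

func main() {
	s := &colState{w: os.Stdout, widths: []int{3, 7}}
	for j := range s.widths {
		s.At(0, j)
		w, _ := s.Width()
		fmt.Fprintf(s, "column %d width: %d\n", j, w)
	}
}
```

The intent is that the value formatters above consult this state, so element formatting and column alignment stay decoupled.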
Notwithstanding the need for additional internal types, this feels simpler to me on the whole, and I believe this approach should make extending to new syntaxes or data types easier, if desired. Thanks for reviewing; I'll address feedback and comments.
Bench:
I've included a few benchmarks below. These compare the current and proposed implementation when formatting 10x10, 100x100 and 1000x1000 dense matrices.
implementation | benchmark | iterations | time/op | B/op | allocs/op |
---|---|---|---|---|---|
current | BenchmarkFormat10 | 36134 | 33063 ns/op | 2693 B/op | 16 allocs/op |
current | BenchmarkFormat100 | 358 | 3747403 ns/op | 1672025 B/op | 137 allocs/op |
current | BenchmarkFormat1000 | 3 | 334034817 ns/op | 163557800 B/op | 1066 allocs/op |
proposed | BenchmarkFormat10 | 22086 | 54554 ns/op | 2886 B/op | 19 allocs/op |
proposed | BenchmarkFormat100 | 170 | 7072273 ns/op | 1670752 B/op | 49 allocs/op |
proposed | BenchmarkFormat1000 | 2 | 697675398 ns/op | 163542896 B/op | 80 allocs/op |
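For reference, the whole-matrix benchmarks above have roughly the following shape. This is my reconstruction against the existing `mat.Formatted` path; the PR's actual benchmark code may differ.

```go
package mat_test

import (
	"fmt"
	"io"
	"testing"

	"gonum.org/v1/gonum/mat"
)

// benchmarkFormat formats an n×n dense matrix once per iteration,
// discarding the output. The helper name and setup are assumptions.
func benchmarkFormat(b *testing.B, n int) {
	data := make([]float64, n*n)
	for i := range data {
		data[i] = float64(i)
	}
	f := mat.Formatted(mat.NewDense(n, n, data))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		fmt.Fprintf(io.Discard, "%v", f)
	}
}

func BenchmarkFormat10(b *testing.B)   { benchmarkFormat(b, 10) }
func BenchmarkFormat100(b *testing.B)  { benchmarkFormat(b, 100) }
func BenchmarkFormat1000(b *testing.B) { benchmarkFormat(b, 1000) }
```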
Codecov Report
Merging #1707 (5e086de) into master (85ca896) will increase coverage by 0.03%. The diff coverage is 92.59%.
:exclamation: Current head 5e086de differs from pull request most recent head 6ce02c8. Consider uploading reports for the commit 6ce02c8 to get more accurate results.
```diff
@@            Coverage Diff             @@
##           master    #1707      +/-   ##
==========================================
+ Coverage   73.26%   73.29%   +0.03%
==========================================
  Files         510      511       +1
  Lines       60132    60152      +20
==========================================
+ Hits        44056    44090      +34
+ Misses      13584    13581       -3
+ Partials     2492     2481      -11
```
Impacted Files | Coverage Δ | |
---|---|---|
mat/format_value.go | 84.41% <84.41%> (ø) | |
mat/format.go | 98.67% <98.21%> (+8.88%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Just a drive-by comment: could you run the benchmarks before/after the proposed optimization a couple of times (say 10-20) and show the result of `benchstat`?
With the limited statistics you presented in the commit message, I get this rather poor statistical-power result:
```
$> benchstat ./ref.txt ./new.txt
name        old time/op    new time/op    delta
Format10      33.1µs ± 0%    54.6µs ± 0%    ~    (p=1.000 n=1+1)
Format100     3.75ms ± 0%    7.07ms ± 0%    ~    (p=1.000 n=1+1)
Format1000     334ms ± 0%     698ms ± 0%    ~    (p=1.000 n=1+1)

name        old alloc/op   new alloc/op   delta
Format10      2.69kB ± 0%    2.89kB ± 0%    ~    (p=1.000 n=1+1)
Format100     1.67MB ± 0%    1.67MB ± 0%    ~    (p=1.000 n=1+1)
Format1000     164MB ± 0%     164MB ± 0%    ~    (p=1.000 n=1+1)

name        old allocs/op  new allocs/op  delta
Format10        16.0 ± 0%      19.0 ± 0%    ~    (p=1.000 n=1+1)
Format100        137 ± 0%        49 ± 0%    ~    (p=1.000 n=1+1)
Format1000     1.07k ± 0%     0.08k ± 0%    ~    (p=1.000 n=1+1)
```
i.e.:
```
$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee ref.txt
## apply optimizations
$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee new.txt
$> benchstat ./ref.txt ./new.txt
```
thanks.
PS: `benchstat` is available here: [golang.org/x/perf/benchstat](https://golang.org/x/perf/benchstat)
@wamuir Thank you for doing this. I've only taken a brief look, but it looks like nice work so far. I hope to be able to have a deeper look over the weekend.
Thanks @sbinet, I wasn't aware of benchstat. Cool. Output is below. I added `-benchtime=1x`, because otherwise each reported result appears to be a pooled mean; if so, I'd think the dispersion of time/op (±) would otherwise be understated. I guess that's a nit on `go test` for not having a flag to output variance (or a nit on me for not being able to find it).
name | old time/op | new time/op | delta |
---|---|---|---|
Format10 | 44.2µs ±17% | 73.9µs ±38% | +67.07% (p=0.000 n=20+18) |
Format100 | 3.28ms ± 3% | 7.55ms ± 5% | +130.21% (p=0.000 n=20+18) |
Format1000 | 329ms ± 1% | 764ms ± 2% | +132.17% (p=0.000 n=20+20) |
The increase in clock time is statistically significant. @kortschak provided review comments, which I'll work through over the weekend before rerunning the benchmarks. One of these comments is to use micro-benchmarking, and I expect this will offer greater insight into the differences. Addressing his comments may also close the current gap (e.g., his comment regarding `fmt.Fprint` and `reflect`).
The most recent commit addresses the review comments, except for the switch to sub-benchmarks; I'll need to push a second commit to update the benchmarks. I attempted some further simplification of the format functions, which I think was productive, and I added comments along the way. Updated benchstat output is below, and a rough sketch of the sub-benchmark layout follows the tables.
name | old time/op | new time/op | delta |
---|---|---|---|
Format10 | 44.9µs ±10% | 62.9µs ±31% | +40.08% (p=0.000 n=18+20) |
Format100 | 3.31ms ± 1% | 4.27ms ± 1% | +29.20% (p=0.000 n=18+20) |
Format1000 | 326ms ± 1% | 426ms ± 1% | +30.41% (p=0.000 n=20+20) |
name | old time/op | new time/op | delta |
---|---|---|---|
Format/10/General | 52.6µs ±17% | 68.8µs ±18% | +30.74% (p=0.000 n=19+19) |
Format/10/DotByte | 49.5µs ±21% | 67.9µs ±17% | +37.19% (p=0.000 n=19+20) |
Format/10/Excerpt | 50.3µs ±11% | 66.1µs ±21% | +31.40% (p=0.000 n=19+20) |
Format/10/Prefix | 49.4µs ±10% | 66.4µs ±17% | +34.54% (p=0.000 n=17+20) |
Format/10/Squeeze | 53.0µs ±22% | 69.7µs ±14% | +31.49% (p=0.000 n=20+20) |
Format/10/MATLAB | 51.5µs ±12% | 66.4µs ±12% | +28.79% (p=0.000 n=20+19) |
Format/10/MATLAB# | 43.0µs ±18% | 42.9µs ±23% | ~ (p=0.967 n=20+20) |
Format/10/Python | 50.4µs ±13% | 64.0µs ± 9% | +26.99% (p=0.000 n=20+19) |
Format/10/Python# | 42.4µs ±37% | 43.1µs ±30% | ~ (p=0.757 n=20+20) |
Format/100/General | 3.13ms ± 7% | 4.43ms ± 9% | +41.42% (p=0.000 n=17+20) |
Format/100/DotByte | 3.94ms ± 1% | 4.35ms ± 8% | +10.32% (p=0.000 n=19+20) |
Format/100/Excerpt | 4.00ms ± 1% | 4.54ms ± 6% | +13.45% (p=0.000 n=20+20) |
Format/100/Prefix | 4.00ms ± 1% | 5.61ms ± 1% | +40.23% (p=0.000 n=20+16) |
Format/100/Squeeze | 4.42ms ± 1% | 5.35ms ±18% | +21.01% (p=0.000 n=14+20) |
Format/100/MATLAB | 4.68ms ±15% | 6.46ms ±13% | +37.92% (p=0.000 n=20+20) |
Format/100/MATLAB# | 2.91ms ± 3% | 2.90ms ± 2% | ~ (p=0.247 n=20+20) |
Format/100/Python | 6.33ms ±16% | 6.38ms ±35% | ~ (p=0.890 n=20+20) |
Format/100/Python# | 3.56ms ±13% | 1.69ms ± 1% | -52.45% (p=0.000 n=20+19) |
Format/1000/General | 306ms ± 0% | 405ms ± 0% | +32.08% (p=0.000 n=19+20) |
Format/1000/DotByte | 303ms ± 0% | 392ms ± 0% | +29.54% (p=0.000 n=20+20) |
Format/1000/Excerpt | 308ms ± 1% | 405ms ± 0% | +31.76% (p=0.000 n=20+20) |
Format/1000/Prefix | 307ms ± 0% | 404ms ± 0% | +31.67% (p=0.000 n=20+20) |
Format/1000/Squeeze | 307ms ± 0% | 405ms ± 0% | +31.83% (p=0.000 n=20+20) |
Format/1000/MATLAB | 307ms ± 0% | 405ms ± 0% | +31.99% (p=0.000 n=19+20) |
Format/1000/MATLAB# | 164ms ± 1% | 164ms ± 1% | ~ (p=0.561 n=19+19) |
Format/1000/Python | 308ms ± 1% | 406ms ± 1% | +31.86% (p=0.000 n=20+18) |
Format/1000/Python# | 164ms ± 1% | 164ms ± 1% | ~ (p=0.604 n=20+19) |
[Geo mean] | 3.74ms | 4.43ms | +18.29% |
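As an aside, the `Format/<size>/<syntax>` names in the table above come from nested sub-benchmarks via `b.Run`. Here is a sketch of that layout, using existing format options as assumed cases rather than the PR's actual table of cases:

```go
package mat_test

import (
	"fmt"
	"io"
	"testing"

	"gonum.org/v1/gonum/mat"
)

// BenchmarkFormat sketches a Format/<size>/<case> sub-benchmark layout.
// The cases below use existing format options; the PR's cases (DotByte,
// Excerpt, MATLAB, Python, ...) may be constructed differently.
func BenchmarkFormat(b *testing.B) {
	for _, n := range []int{10, 100, 1000} {
		data := make([]float64, n*n)
		for i := range data {
			data[i] = float64(i)
		}
		m := mat.NewDense(n, n, data)

		cases := []struct {
			name string
			f    fmt.Formatter
		}{
			{"General", mat.Formatted(m)},
			{"Prefix", mat.Formatted(m, mat.Prefix("    "))},
			{"Squeeze", mat.Formatted(m, mat.Squeeze())},
		}
		for _, c := range cases {
			b.Run(fmt.Sprintf("%d/%s", n, c.name), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					fmt.Fprintf(io.Discard, "%v", c.f)
				}
			})
		}
	}
}
```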
This PR has gone a bit stale. @wamuir are you still interested in submitting this change?
@vladimir-ch Yes. I think this needs another review. I can make revisions.