gddo icon indicating copy to clipboard operation
gddo copied to clipboard

Potential Enhancement: More accurate value for string literal size.

Open dmitshur opened this issue 8 years ago • 1 comments

gddo does not display large string literals, replacing them with an empty string literal and adding a comment:

/* %d byte string literal not displayed */

The way the value of %d is calculated is by taking the len of the string literal... string. However, that's not the actual byte size of the string literal, it's the byte size of its representation. See here:

https://github.com/golang/gddo/blob/df079801282548afb031b54664793f4052e3ea7c/doc/code.go#L207

For most strings, it might be relatively close, usually off by 2 bytes because of the " symbols on both ends. But for some interpreted strings that contain escape sequences, etc., it can be off by a significant amount.

I think it might be nicer to display the actual string size in bytes. To do that, the string literal string needs to be parsed with strconv.Unquote, and return len of the result. It can be implemented more efficiently without using strconv.Unquote.

dmitshur avatar Apr 04 '16 03:04 dmitshur

Is there still interest to fix this?

I broke out some tests and benchmarks and it looks like strconv.Unquote will both catch all the crazy unquoting issues one could conceive, and not incur too much of a perf hit:

goos: linux
goarch: amd64
pkg: gist.github.com/de5f70a868c8de8c0d44d6f4061d302c.git
BenchmarkLen10-4                30000000                48.3 ns/op
BenchmarkLen100-4               10000000               320 ns/op
BenchmarkLen1000-4               1000000              1584 ns/op
BenchmarkLen10000-4               200000              7780 ns/op
BenchmarkRawLen10-4             30000000                50.4 ns/op
BenchmarkRawLen100-4            10000000               499 ns/op
BenchmarkRawLen1000-4            1000000              1293 ns/op
BenchmarkRawLen10000-4            300000             30803 ns/op
PASS
ok      gist.github.com/de5f70a868c8de8c0d44d6f4061d302c.git    27.392s

You will notice that I test two implementations:

  • long.Len: is a trivial implementation
  • long.RawLen: is a copy of strconv.Unquote that tracks count instead of contents
  • len: was not tested as it took <1 ns/op

Am I misunderstanding your optimization or just fighting against the compiler on this one? And based on the results, do you think a 1-2 line change to introduce strconv.Unquote is worthwhile?

It could also be worthwhile making Len context.Context aware, so we can bail early if a package decided to embed a databags, with say rice.

urandom2 avatar Apr 04 '19 11:04 urandom2