TableTool icon indicating copy to clipboard operation
TableTool copied to clipboard

Performance Optimizations

Open zadr opened this issue 6 years ago • 2 comments

I've got a couple really big (20mb-80mb) CSV files that I need to open, and TableTool was a bit slow to load things. Spent an hour or two in Instruments.app and fixed some low-hanging fruit, and roughly grouped into two categories: CPU-savings and Memory-savings

All tests pass, so I don't think anything broke.

[ed: rough benchmarks discussed below were run on a MacBook Pro (13-inch, 2017, Four Thunderbolt 3 Ports) with a 3.3 GHz Intel Core i5 and 16 GB 2133 MHz LPDDR3)] [ed2: I didn't look as closely but this cuts the time it takes to open the FL_insurance_sample.csv.zip file from #58 in half, from ~2.5s to ~1.25s]

CPU

  • edc0504322893599e53a9f6005a23cddc3d0bb89 makes less temporary strings and instead tracks string ranges; this is a bulk of the improvement. Estimated time for opening an 80mb CSV file goes from 25s to 8.5s
  • ae3f4ff86d72fb476807071dba5c4ad240e91cce stop using NSScanner and use a custom CSVScanner value type instead; TableTool isn't using any of the convenience that NSScanner has, and NSScanner does a lot under the hood to try and prepare for that (e.g.: with localization). Estimated time for opening an 80mb CSV file goes from 8.5s to 5.5s
  • e12e1569490ab2d2810478545cc122352e323f8d track the C data string in CSVReader once and reference it instead of using characterAtIndex:. Estimated time for opening an 80mb CSV file goes from 5.25s to 4.5s
  • b4ef60d5c9c93aac4a063ebec277043cd5ff7cd5 Cache the string length in CSVScanner; NSString is immutable and with an 80mb CSV file, this call adds up. Estimated time for opening an 80mb CSV file goes from 5.5s to 5.25s. Bit of a microptimization but still notable enough.

Memory

  • f369f3063ccf2ec7b9cdf7d9dc5b085b3c6d9a96 add an inner autoreleasepool around regex matching. NSRegularExpression uses icu under the hood which allocates a lot of things, but they don't need to live for very long. Estimated memory optimizations impact: reduced peak memory usage for opening the 80mb CSV file from ~2.15GB to ~1.25gb.

Both

  • 97e3e5de1b8360059b61460b62e1be8ab9609279 re-use CSVReaders that the heuristic creates when scanning the document; this avoids data copies and re-doing work to create new NSString objects in memory. Estimated memory optimizations impact: reduced peak memory usage for opening the 80mb CSV file from ~1.25GB to ~925gb. Estimated time impact: reducing memory overhead saves another second and the 80mb CSV file takes ~3.5s to opening

zadr avatar Dec 30 '18 04:12 zadr

don't mean to rush things by any means, but: ping? anyone have thoughts on this bit of code?

zadr avatar Jan 24 '19 17:01 zadr

Hi!

Thanks for the pull request, this is pretty nice! I'm sorry that I didn't have any time to review this sooner, I've been pretty busy lately.

These are a lot of changes at once, so it'll take me a while to review in detail.

We probably can't use commit e12e156 -- the characterIndices are not the same as byte indices. It will only work for ASCII characters. For most other encodings (UTF-8, UTF-16) etc it will break.

jakob avatar Feb 05 '19 16:02 jakob