Tables.jl

Calling `Tables.istable` is too slow if it's not actually a table

Open · rafaqz opened this issue 2 years ago · 1 comment

Calling `Tables.istable` on something that isn't a table ends up at `hasmethod` as a fallback (via TableTraits.jl and IteratorInterfaceExtensions.jl).
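
To make the shape of that fallback concrete, here is a simplified, hypothetical model. It is not the actual Tables.jl, TableTraits.jl or IteratorInterfaceExtensions.jl source, and the names `declared_table`, `iterable_table_fallback` and `my_istable` are made up for illustration. A type-level check defaults to `false`, and anything that doesn't opt in (which includes every non-table) falls through to a runtime `hasmethod` query plus an element-type inspection.

```julia
# Simplified, hypothetical model of a table check with a runtime fallback.
# NOT the real Tables.jl source; all names here are illustrative.

# Type-level opt-in: packages would add methods returning `true`.
declared_table(::Type) = false

# Runtime fallback: query the method table for `iterate`, then check that
# the elements are NamedTuples (roughly what an "iterable table" test does).
function iterable_table_fallback(x)
    hasmethod(iterate, Tuple{typeof(x)}) || return false
    Base.IteratorEltype(x) isa Base.HasEltype || return false
    return eltype(x) <: NamedTuple
end

# Declared tables short-circuit; everything else pays for the fallback.
my_istable(x) = declared_table(typeof(x)) || iterable_table_fallback(x)

my_istable([(a = 1, b = 2)])  # true: a vector of NamedTuples
my_istable([1.0, 2.0])        # false, but only after the runtime query
```

The point of the model is the asymmetry: inputs that declare themselves as tables return early, while non-tables always reach the runtime query.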

This is very slow. It shows up as the large blue/green block in this flame graph; the rest of the graph is the work of rasterizing a whole vector of polygons, which in this case is not a table.

[Flame graph screenshot, 2023-01-22]

Is there a way to check if something is a table without this fallback?

The use case here is that the object can be either a table with the target column (geometries) plus some extra columns we may use, or just an iterator of geometries. I don't know a way to tell these apart other than `istable`, but the iterators pay this overhead precisely because they are not tables.

rafaqz · Jan 22 '23 21:01

The problem here seems to be that I'm iterating over an object and passing each of its elements to a method that checks `istable` again.
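A minimal sketch of why this pattern multiplies the cost. `costly_check`, `handle` and `handle_nontable` are hypothetical names; `costly_check` stands in for a slow runtime check such as `istable` and only counts how often it runs. Because every element is handed back to a function that repeats the guard, a vector of 1000 non-table elements triggers 1001 checks.

```julia
# Hypothetical stand-in for a costly runtime check such as `Tables.istable`:
# it always answers `false` and records how many times it was called.
const NCHECKS = Ref(0)
costly_check(x) = (NCHECKS[] += 1; false)

# The guard runs at the top level and again for every element,
# because each element re-enters `handle`.
handle(x) = costly_check(x) ? :table : handle_nontable(x)
handle_nontable(v::AbstractVector) = (foreach(handle, v); :done)
handle_nontable(x) = :done

NCHECKS[] = 0
handle(collect(1:1000))
println(NCHECKS[])  # 1001
```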

~It benchmarks at 160 ns per call, which is not much at all if you only do it once. It's just too much to use as a guard when iterating over lots of things. I had assumed it was a type-level check, so it would essentially be free.~

Actually, `istable` was taking 40 μs per call inside the function! I'm not sure why it benchmarks faster in the REPL, or what the interaction is.

My solution is to use GeoInterface.jl traits to filter objects first, because they are compile-time traits. Some iterators will still be slow. It also seems that if multiple packages had traits that check whether methods exist, this would start to become a real problem.
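The "cheap trait first, `istable` last" ordering can be sketched as follows. This is a hypothetical illustration, not the Rasters.jl or GeoInterface.jl code: `MyPolygon`, `isgeom`, `slow_istable` and `classify` are made-up names, with `isgeom` playing the role of a GeoInterface.jl-style geometry trait and `slow_istable` standing in for `Tables.istable`.

```julia
# Hypothetical geometry type and a dispatch-based trait standing in for
# a GeoInterface.jl-style geometry trait.
struct MyPolygon
    coords::Vector{Tuple{Float64,Float64}}
end

isgeom(::Type) = false
isgeom(::Type{MyPolygon}) = true
isgeom(x) = isgeom(typeof(x))

# Stand-in for a slow runtime table check such as `Tables.istable`.
slow_istable(x) = hasmethod(iterate, Tuple{typeof(x)}) &&
    Base.IteratorEltype(x) isa Base.HasEltype && eltype(x) <: NamedTuple

# Try the type-based trait checks first; the slow check is reached only
# for inputs that are neither a geometry nor a vector of geometries.
function classify(x)
    isgeom(x) && return :geometry
    if x isa AbstractVector && isgeom(eltype(x))
        return :geometries
    end
    slow_istable(x) && return :table
    return :unknown
end

p = MyPolygon([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
classify(p)                            # :geometry
classify([p, p])                       # :geometries
classify([(geometry = p, value = 1)])  # :table
```

Since `isgeom` depends only on the argument's type, the compiler can typically resolve it without a runtime method-table query, so common geometry inputs never reach the slow path.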

Some benchmarks of `rasterize` in Rasters.jl. This run makes 4 calls to `istable`, all returning `false`:

```julia
julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 8682 samples with 1 evaluation.
 Range (min … max):  489.019 μs … 109.038 ms  ┊ GC (min … max): 0.00% … 69.94%
 Time  (median):     513.876 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   578.621 μs ±   2.509 ms  ┊ GC (mean ± σ):  7.34% ±  1.68%

                        ▁▆█▅▂                     ▂▄             
  ▁▁▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▁▁▁▁▃█████▅▄▃▄▅▅▄▃▃▂▃▃▃▃▃▃▂▂▂▂▅██▇▃▂▂▂▂▃▂▂▁▁ ▃
  489 μs           Histogram: frequency by time          541 μs <

 Memory estimate: 39.75 KiB, allocs estimate: 955.
```

With the `istable` checks moved last, so they aren't called in this case:

```julia
julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  149.697 μs … 120.447 ms  ┊ GC (min … max):  0.00% … 70.26%
 Time  (median):     156.504 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   193.400 μs ±   2.044 ms  ┊ GC (mean ± σ):  12.80% ±  1.21%

                  ▄▇██▆▃▁                                        
  ▁▁▁▁▂▂▁▁▁▁▁▁▂▃▄████████▇▅▄▃▃▂▂▂▂▃▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  150 μs           Histogram: frequency by time          170 μs <

 Memory estimate: 33.25 KiB, allocs estimate: 827.
```

So it seems more like 80 μs per call.

This is on Julia 1.9.0-beta-2 with Tables.jl v1.10.0.

Edit: there was one more `istable` call left in the run above; this one actually has none:

```julia
julia> @benchmark rasterize(sum, $polygons; res=5, fill=1, boundary=:center)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   86.809 μs … 76.460 ms  ┊ GC (min … max):  0.00% … 65.67%
 Time  (median):      91.758 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   119.805 μs ±  1.306 ms  ┊ GC (mean ± σ):  12.39% ±  1.14%

  ▂▇█▇▅▂▁▂▃▃▃▃▃▃▄▃▃▂▂▁▁▁▁▁▁▁▂▁   ▂▁  ▁▂▂ ▂▄▄▄▄▃▂▂▂▂▁▁          ▂
  █████████████████████████████▇████████████████████████▇▇▆▅▅▅ █
  86.8 μs       Histogram: log(frequency) by time       124 μs <

 Memory estimate: 33.25 KiB, allocs estimate: 827.
```

rafaqz · Jan 22 '23 22:01