
Proto/filter error handling

Open arl opened this issue 2 years ago • 0 comments

:question: What

Prototype of filter error handling feature.

  • Modify the baker.Filter interface so that its method becomes Process(r baker.Record) error.
  • Add a DropOnError boolean to all filters to control, at the TOML config level, whether an error implies a drop.
  • Modify the filter chain in the topology to handle drops.
  • Each filter can now declare a chain of error_handler entries below its own config.
  • Each FilterErrorHandler can be configured.
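
To make the intended semantics concrete, here is a minimal sketch of how a filter chain could react to errors. Only the Process(r) error signature and the DropOnError idea come from the description above; Record, step, ErrorHandler, notNull and runChain are hypothetical stand-ins, not baker's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a stand-in for baker.Record (assumption: the real type is richer).
type Record map[string]string

// Filter mirrors the prototype's signature: Process reports an error instead
// of deciding on its own whether the record is forwarded.
type Filter interface {
	Process(r Record) error
}

// ErrorHandler is a hypothetical hook invoked when a filter reports an error.
type ErrorHandler func(r Record, err error)

// step pairs a filter with its per-filter configuration (hypothetical layout).
type step struct {
	filter      Filter
	dropOnError bool           // mirrors the DropOnError config flag
	handlers    []ErrorHandler // mirrors the error_handler chain
}

var errEmpty = errors.New("empty field")

// notNull reports an error when the given field is empty.
type notNull struct{ field string }

func (f notNull) Process(r Record) error {
	if r[f.field] == "" {
		return errEmpty
	}
	return nil
}

// runChain applies each step in order and reports whether the record survived.
func runChain(steps []step, r Record) bool {
	for _, s := range steps {
		err := s.filter.Process(r)
		if err == nil {
			continue
		}
		// Run the filter's handlers first, then decide whether to drop.
		for _, h := range s.handlers {
			h(r, err)
		}
		if s.dropOnError {
			return false
		}
	}
	return true
}

func main() {
	handled := 0
	steps := []step{{
		filter:      notNull{field: "url"},
		dropOnError: true,
		handlers:    []ErrorHandler{func(Record, error) { handled++ }},
	}}
	fmt.Println(runChain(steps, Record{"url": "https://example.com"})) // true
	fmt.Println(runChain(steps, Record{"url": ""}))                    // false
	fmt.Println(handled)                                               // 1
}
```

With dropOnError set to false, the same failing record would still reach the handlers but would continue down the chain.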

Example: https://github.com/AdRoll/baker/blob/8864fa8737637b7cddd947212874c1010a355ca3/examples/filter_errors/topo.toml#L12-L29

TODO
  • [ ] remove the need for filter implementations to count dropped lines themselves (i.e. Filter.Stats.FilteredLines). The Topology can do it itself.
  • [ ] run a macro-benchmark on an existing topology
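
As a rough illustration of the first TODO item, the chain itself could own the dropped-lines counter, so that individual filters only return errors. This is a sketch under assumptions: chain, requireField and droppedLines are hypothetical names, and every error is treated as a drop (as if DropOnError were set on all filters).

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Record is a stand-in for baker.Record.
type Record map[string]string

// Filter uses the prototype's error-returning signature.
type Filter interface {
	Process(r Record) error
}

// chain owns the drop counter, so filters no longer track it themselves.
type chain struct {
	filters []Filter
	dropped int64 // accessed atomically
}

// process returns true if r passed every filter; otherwise it counts the drop.
// Assumption: any error implies a drop.
func (c *chain) process(r Record) bool {
	for _, f := range c.filters {
		if err := f.Process(r); err != nil {
			atomic.AddInt64(&c.dropped, 1)
			return false
		}
	}
	return true
}

// droppedLines reports how many records the chain has dropped so far.
func (c *chain) droppedLines() int64 { return atomic.LoadInt64(&c.dropped) }

// requireField is a hypothetical filter that fails on a missing or empty field.
type requireField struct{ name string }

func (f requireField) Process(r Record) error {
	if r[f.name] == "" {
		return errors.New("missing " + f.name)
	}
	return nil
}

func main() {
	c := &chain{filters: []Filter{requireField{"id"}}}
	for _, r := range []Record{{"id": "1"}, {"id": ""}, {}} {
		c.process(r)
	}
	fmt.Println(c.droppedLines()) // 2
}
```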

Benchmarks

The benchmarks run the simplest possible topology, boiling it down to the filter chain as far as possible.

  • passthrough -> a single filter that does nothing
  • url-escape -> Simple 'pure' filter: url-escapes a field, writes the result in another. No error, no possibility of drop.
  • url-unescape -> Now the unescaping may fail, in which case the destination is cleared. No possibility of drop.
  • not-null/drop2% -> NotNull filter: checks whether a field is empty and, using DropOnError, drops the record if it is. 2% of records are dropped.
  • not-null/drop20% -> same but 20% of records are dropped
  • not-null/drop -> same but 100% of records are dropped
name                            old time/op    new time/op    delta
FilterChain/passthrough-8          2.13ms ± 2%    2.87ms ± 1%  +34.76%  (p=0.000 n=9+9)
FilterChain/url-escape-8          8.04ms ± 2%    8.12ms ± 3%     ~     (p=0.080 n=9+11)
FilterChain/url-unescape-8        8.15ms ± 1%   10.29ms ± 1%  +26.22%  (p=0.000 n=10+7)
FilterChain/not-null/drop-8       4.15ms ± 2%    4.14ms ± 1%     ~     (p=0.681 n=15+8)
FilterChain/not-null/drop20%-8    6.60ms ± 1%    6.76ms ± 1%   +2.34%  (p=0.000 n=10+10)
FilterChain/not-null/drop2%-8     14.7ms ± 4%    15.8ms ± 7%   +7.59%  (p=0.003 n=8+6)

name                            old alloc/op   new alloc/op   delta
FilterChain/passthrough-8          1.11MB ± 0%    1.11MB ± 0%     ~     (p=0.387 n=9+9)
FilterChain/url-escape-8          2.63MB ± 0%    2.63MB ± 0%     ~     (p=0.252 n=9+11)
FilterChain/url-unescape-8        2.64MB ± 0%    2.36MB ± 0%  -10.60%  (p=0.000 n=9+7)
FilterChain/not-null/drop-8       1.36MB ± 0%    1.36MB ± 0%   -0.01%  (p=0.003 n=14+7)
FilterChain/not-null/drop20%-8    2.10MB ± 0%    3.52MB ± 0%  +67.39%  (p=0.000 n=9+10)
FilterChain/not-null/drop2%-8     27.2MB ± 0%    27.2MB ± 0%   -0.00%  (p=0.001 n=9+7)

name                            old allocs/op  new allocs/op  delta
FilterChain/passthrough-8           3.07k ± 0%     3.07k ± 0%   -0.07%  (p=0.000 n=9+10)
FilterChain/url-escape-8           50.1k ± 0%     50.1k ± 0%   -0.00%  (p=0.002 n=8+10)
FilterChain/url-unescape-8         55.1k ± 0%     40.1k ± 0%  -27.24%  (p=0.000 n=10+7)
FilterChain/not-null/drop-8         72.0 ± 0%      69.6 ± 1%   -3.30%  (p=0.000 n=16+8)
FilterChain/not-null/drop20%-8     32.1k ± 0%     20.1k ± 0%  -37.40%  (p=0.000 n=10+10)
FilterChain/not-null/drop2%-8      20.1k ± 0%     20.1k ± 0%   -0.01%  (p=0.000 n=8+8)
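
For readers who want to try a comparable measurement, the not-null/drop cases can be approximated with a small standalone program. This is only a sketch: it is not the benchmark behind the tables above, and Record, notNull, makeRecords and countKept are hypothetical helpers. It uses testing.Benchmark from the standard library, which can run a benchmark function outside of go test.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// Record is a stand-in for baker.Record.
type Record map[string]string

// Filter uses the prototype's error-returning signature.
type Filter interface {
	Process(r Record) error
}

var errEmpty = errors.New("empty field")

// notNull reports an error when the given field is empty.
type notNull struct{ field string }

func (f notNull) Process(r Record) error {
	if r[f.field] == "" {
		return errEmpty
	}
	return nil
}

// makeRecords builds n records; dropPct percent of them have an empty field.
func makeRecords(n, dropPct int) []Record {
	recs := make([]Record, n)
	for i := range recs {
		v := "value"
		if i%100 < dropPct {
			v = ""
		}
		recs[i] = Record{"f": v}
	}
	return recs
}

// countKept runs the filter over recs and returns how many records survive.
func countKept(f Filter, recs []Record) int {
	kept := 0
	for _, r := range recs {
		if f.Process(r) == nil {
			kept++
		}
	}
	return kept
}

// sink keeps the compiler from discarding the benchmarked call.
var sink int

func main() {
	f := notNull{field: "f"}
	for _, pct := range []int{2, 20, 100} {
		recs := makeRecords(10000, pct)
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sink = countKept(f, recs)
			}
		})
		fmt.Printf("not-null/drop%d%%\t%s\t%s\n", pct, res, res.MemString())
	}
}
```

The old/new/delta layout of the tables above matches what benchstat prints when comparing two saved go test -bench outputs, so a real comparison would more likely use Benchmark functions in a _test.go file run on both branches.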

Note to reviewers: most of the filters have been removed so that the prototype is easy to run without converting all of them.

arl • Jun 21 '22 17:06