
Counterfactual refactor v1

Open jklaise opened this issue 3 years ago • 10 comments

Builds on top of the old interop-refactor branch.

Goals:

  • (v0): Test and replace the current counterfactual.py module with the new modular implementation on this branch, whilst simplifying some of the logic where possible.
  • (v1): Flexible CounterFactual, CounterFactualProto and CEM implementations, supporting both tensorflow and pytorch backends, with possible extensions to distributed computation via ray, and private class implementations (separate from the public-facing API).
  • (v1-v2): Make sure the public and private APIs are suitable both for "sequential" counterfactual methods (e.g. Wachter et al.), where optimization is per-instance at explain time, and for "batch" counterfactual methods, where optimization is done on a training set at fit time (a sketch of the two shapes follows below).
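To illustrate the distinction, here is a hypothetical sketch of where the optimization would live in each family (class and method names are placeholders, not the final API):

```python
# Hypothetical sketch of the two API shapes; all names are placeholders,
# not the final alibi API.

class SequentialCounterfactual:
    """Wachter-style: the optimization runs per instance at explain time."""

    def explain(self, x):
        # run the counterfactual search for this single instance
        return self._optimize(x)  # placeholder for the per-instance search


class BatchCounterfactual:
    """Batch-style: the optimization runs once on a training set at fit time."""

    def fit(self, X_train):
        # learn whatever global structure the method needs up front
        self._state = self._optimize_batch(X_train)  # placeholder
        return self

    def explain(self, x):
        # explain time is cheap: reuse the fitted state
        return self._lookup(x)  # placeholder
```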

jklaise · May 05 '21 13:05

Codecov Report

Merging #403 (297cb10) into master (599f329) will decrease coverage by 8.12%. The diff coverage is 27.61%.

⚠️ Current head 297cb10 differs from pull request most recent head bab32a0. Consider uploading reports for commit bab32a0 to get more accurate results.

```diff
@@            Coverage Diff             @@
##           master     #403      +/-   ##
==========================================
- Coverage   88.45%   80.33%   -8.12%
==========================================
  Files          58       77      +19
  Lines        7709     8611     +902
==========================================
+ Hits         6819     6918      +99
- Misses        890     1693     +803
```
| Impacted Files | Coverage Δ |
|---|---|
| `alibi/explainers/backend/pytorch/counterfactual.py` | 0.00% <0.00%> (ø) |
| `...i/explainers/backend/tensorflow/counterfactuals.py` | 0.00% <0.00%> (ø) |
| `alibi/utils/decorators.py` | 0.00% <0.00%> (ø) |
| `alibi/utils/pytorch/__init__.py` | 0.00% <0.00%> (ø) |
| `alibi/utils/pytorch/logging.py` | 0.00% <0.00%> (ø) |
| `alibi/utils/pytorch/wrappers.py` | 0.00% <0.00%> (ø) |
| `alibi/utils/tf.py` | 84.31% <ø> (ø) |
| `alibi/utils/tensorflow/gradients.py` | 20.00% <20.00%> (ø) |
| `alibi/explainers/experimental/counterfactuals.py` | 27.16% <27.16%> (ø) |
| `alibi/utils/logging.py` | 30.61% <30.61%> (ø) |
| ... and 47 more | |

codecov[bot] · May 05 '21 13:05

Update: my current focus is simplifying the framework-specific optimizer class definition. The idea is to have a fairly generic `TFGradientOptimizer` that handles both differentiable and non-differentiable predictors and is applicable across methods.

jklaise · May 07 '21 10:05

The latest commit simplifies the handling of the backend optimizers. The base class `TFGradientOptimizer` is now responsible for both black-box and white-box optimization, which allows the subclasses to be very concise (see the sketch after this list):

  • set the default `loss_spec` if not overridden by the user
  • calculate the `autograd_loss`
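As a rough illustration, a concrete optimizer then reduces to something like the following. This is a sketch only: the subclass name, the `loss_spec` structure and the method signatures are assumptions, not the actual code on this branch.

```python
import tensorflow as tf

# assumed location of the base class on this branch
from alibi.explainers.backend.tensorflow.counterfactuals import TFGradientOptimizer

# hypothetical default loss_spec: a mapping from loss term names to callables
DEFAULT_LOSS_SPEC = {
    "prediction": lambda pred, target: tf.reduce_sum(tf.square(pred - target)),
    "distance": lambda cf, x: tf.reduce_sum(tf.abs(cf - x)),
}

class TFWachterOptimizer(TFGradientOptimizer):  # hypothetical subclass
    def __init__(self, predictor, loss_spec=None, **kwargs):
        # 1. set the default loss_spec if the user did not override it
        loss_spec = DEFAULT_LOSS_SPEC if loss_spec is None else loss_spec
        super().__init__(predictor, loss_spec=loss_spec, **kwargs)

    def autograd_loss(self, x, cf, target, lam):
        # 2. combine the loss terms into the differentiable objective
        pred = self.predictor(cf)
        return (lam * self.loss_spec["prediction"](pred, target)
                + self.loss_spec["distance"](cf, x))
```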

jklaise · May 07 '21 17:05

There is an off-by-one error when logging loss values in the white-box case, i.e. losses for each optimizer step are logged before the gradients are applied. To get the correct loss values they would need to be re-calculated after the gradients are applied, but this results in an unacceptable increase in computation time:

| Implementation | Time to explain a particular MNIST image |
|---|---|
| Current | 13.9 s ± 1.19 s per loop |
| New (incorrect logging) | 10.8 s ± 116 ms per loop |
| New (correct logging) | 15.4 s ± 1.79 s per loop |

jklaise · May 10 '21 17:05

The overhead of correct logging comes from the fact that, by default, loss information is logged at every step. If we raise the default to something higher, like logging every 10 steps, the problem is side-stepped, although a better solution would be ideal.

Potentially the best option is to set `log_traces=False` by default for the fastest possible counterfactuals; it is unclear how many users would want to monitor TensorBoard by default.
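A minimal sketch of the frequency-based side-step, with the optimizer internals passed in as callbacks (names assumed, not the branch's actual API): the extra forward pass is then paid once every `log_every` steps rather than on every step.

```python
def optimize(n_steps, gradient_step, compute_loss, log_fn,
             log_traces=False, log_every=10):
    # Sketch only: the callbacks stand in for the optimizer internals.
    for step in range(n_steps):
        gradient_step()  # apply gradients as usual
        if log_traces and step % log_every == 0:
            # extra forward pass, now amortized over log_every steps
            log_fn(step, compute_loss())
```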

jklaise · May 11 '21 15:05

The latest commit adds a `sync` parameter to the `backend.collect_step_data` and `backend.update_state` functions. The idea is to set it to `True` when computing loss values returned to the user, which must be accurate, whilst it defaults to `False`, meaning that traces logged to TensorBoard will be off by one step for white-box models.

Not a great solution but it works...
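A simplified sketch of the behaviour (the method body below is assumed, not the actual implementation): with the default `sync=False` the logged loss is the value cached before the gradient update, while `sync=True` recomputes it afterwards at the cost of an extra forward pass.

```python
def collect_step_data(self, step, x, cf, target, sync=False):
    if sync:
        # accurate: recompute the loss after the gradients were applied
        loss = self.autograd_loss(x, cf, target)
    else:
        # cheap: reuse the loss cached before the update, which lags the
        # applied gradients by one step for white-box models
        loss = self._cached_loss
    self.step_data["loss"][step] = loss
```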

jklaise · May 11 '21 16:05

The latest commit changes how backend implementations are found by `load_backend`. Previously it was assumed that the backend module name would be the same as the module name of the caller (here the caller is `CounterfactualBase`, defined in `alibi.explainers.base.counterfactuals`, so `load_backend` would look in `alibi.explainers.backend.framework.counterfactuals`). This is not sufficient, as the base class can be reused for other methods, e.g. CEM, in which case backend loading would look in the same module and fail to find an implementation, since we want the CEM backends to live separately (e.g. in `alibi.explainers.backend.framework.cem`).

To enable this behaviour, each implementation class (e.g. `_WachterCounterfactual`) now has to provide a `module_name` class variable (e.g. `counterfactuals`) so that `load_backend` always looks in `alibi.explainers.experimental.backend.framework.module_name`.

This also allows us to move `alibi.explainers.base.counterfactuals` to `alibi.explainers.cf_base`.
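A rough sketch of the resolution logic described above (simplified; the actual helper on the branch may differ):

```python
import importlib

def load_backend(framework: str, module_name: str, class_name: str):
    # e.g. framework='tensorflow', module_name='counterfactuals' resolves to
    # alibi.explainers.experimental.backend.tensorflow.counterfactuals
    module = importlib.import_module(
        f"alibi.explainers.experimental.backend.{framework}.{module_name}"
    )
    return getattr(module, class_name)

class _WachterCounterfactual:
    # each implementation class declares where its backends live
    module_name = "counterfactuals"
```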

jklaise · May 12 '21 14:05

There are some concerning inefficiencies in the search process that I discovered whilst investigating the CEM integration into this framework; I believe they come from the switch from graph-based to eager execution.

As an example, comparing the timings of the Counterfactual search again between the current (graph-based) and proposed (eager-execution) implementations, with no logging enabled:

| Implementation | Time to explain a particular MNIST image | Gradient steps | Time per step (avg) |
|---|---|---|---|
| Current (no logging) | 7.17 s ± 214 ms per loop | 1537 | 4.67 ms |
| Proposed (no logging) | 9.5 s ± 175 ms per loop | 546 | 17.4 ms |

Note that the number of gradient steps differs because the new implementation is considerably more efficient in the number of steps (outer loop over lambdas); however, the time per gradient step is significantly higher under eager execution (~3.7x higher).

One thing we could try is leveraging `@tf.function` decorators to compile a graph; however, this may require rethinking a few places that rely on numpy, since numpy operations cannot execute inside a `tf.function`-compiled graph.
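A sketch of what that could look like (the loss terms are simplified and the setup below is illustrative, not code from the branch):

```python
import tensorflow as tf

predictor = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # stand-in model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

@tf.function  # traced once, then executed as a compiled graph
def gradient_step(cf, x, target, lam):
    # cf is a tf.Variable holding the current counterfactual candidate;
    # lam is passed as a tensor so new values reuse the same trace
    with tf.GradientTape() as tape:
        pred = predictor(cf)
        loss = (lam * tf.reduce_sum(tf.square(pred - target))
                + tf.reduce_sum(tf.abs(cf - x)))
    grads = tape.gradient(loss, [cf])
    optimizer.apply_gradients(zip(grads, [cf]))
    return loss
```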

jklaise · May 18 '21 11:05

Architecture diagram as initially proposed (note some names may have changed):

[Image: CF-structure architecture diagram]

jklaise · Jul 07 '21 12:07

The correct way to reduce the performance gap is to decorate `get_autodiff_gradients` with `tf.function`. However, this does not actually work as expected in the first stage, `_initialise_lam`: the value of `lam` never gets updated, so the whole optimization runs with the original `lam` value, presumably because `tf.function` captures plain Python values at trace time.

We will need a different way to re-define the losses with different `lam` values at each optimization stage.
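For reference, a minimal sketch of the pitfall and one possible fix (the loss below is simplified; only the name `get_autodiff_gradients` comes from the branch): `tf.function` freezes plain Python values captured from the enclosing scope at trace time, whereas a `tf.Variable` is read at execution time, so updates made via `.assign()` between stages are visible inside the compiled function.

```python
import tensorflow as tf

lam = tf.Variable(0.1)  # a tf.Variable instead of a plain Python float

@tf.function
def get_autodiff_gradients(cf, x, target):
    # the Variable is read when the graph executes, so per-stage updates to
    # lam take effect; a captured Python float would be baked into the trace
    # and every stage would run with the original value
    with tf.GradientTape() as tape:
        tape.watch(cf)
        loss = (lam * tf.reduce_sum(tf.square(cf - target))
                + tf.reduce_sum(tf.abs(cf - x)))
    return tape.gradient(loss, cf)

lam.assign(0.5)  # seen by get_autodiff_gradients; `lam = 0.5` would not be
```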

jklaise · Jul 28 '21 10:07