jaxopt

poor GPU utilization on the deep learning examples

Open · fabianp opened this issue 4 years ago · 4 comments

When running the deep learning examples (e.g., deep_learning/flax_image_classif.py), GPU utilization never goes above 5%. For the equivalent Flax example, GPU utilization is around 90% and the example runs more than 20x faster.

My guess is that a crucial @jax.jit is missing somewhere.

fabianp, Nov 24 '21 19:11
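One way to check the missing-jit hypothesis is to time a training step with and without jax.jit. Below is a minimal sketch, assuming a toy least-squares step in place of the Flax model; `step` and `bench` are hypothetical helpers, not part of jaxopt. Without jit, each primitive is dispatched to the device separately, which tends to leave a GPU mostly idle. With jit, the whole step runs as one compiled call. Because JAX dispatches asynchronously, the timer waits on `block_until_ready()`.

```python
import time

import jax
import jax.numpy as jnp


def step(w, x, y, lr=0.1):
    """One gradient-descent step on a least-squares loss (toy stand-in for a training step)."""
    def loss(v):
        return jnp.mean((x @ v - y) ** 2)
    return w - lr * jax.grad(loss)(w)


def bench(fn, w, x, y, n=50):
    """Average wall-clock seconds per call of fn, measured after one warm-up call."""
    w = fn(w, x, y).block_until_ready()  # warm-up; for jitted fns this triggers compilation
    t0 = time.perf_counter()
    for _ in range(n):
        w = fn(w, x, y)
    w.block_until_ready()  # dispatch is asynchronous; wait before stopping the clock
    return w, (time.perf_counter() - t0) / n


x = jax.random.normal(jax.random.PRNGKey(0), (256, 32))
y = x @ jnp.ones(32)
w0 = jnp.zeros(32)

w_eager, t_eager = bench(step, w0, x, y)       # op-by-op dispatch
w_jit, t_jit = bench(jax.jit(step), w0, x, y)  # one compiled call per step
print(f"eager: {t_eager * 1e3:.3f} ms/step, jit: {t_jit * 1e3:.3f} ms/step")
```

Both variants compute the same iterates. Only the dispatch overhead differs, and the actual speedup depends on the model size and hardware.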

When FLAGS.manual_loop=True, we should also call jax.jit(solver.update) instead of solver.update; this requires making the solver class hashable.

Algue-Rythme, Nov 26 '21 11:11
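Hashability matters because jax.jit(solver.update) wraps a bound method, and JAX may need to hash the callable it compiles (e.g., for its compilation cache). Hashing a bound method hashes the instance. A plain `@dataclasses.dataclass` sets `__hash__` to None, whereas `frozen=True` generates one from the fields. This is a minimal sketch assuming a hypothetical `ToySolver` doing plain gradient descent, not jaxopt's actual solver class:

```python
import dataclasses
from typing import Callable

import jax
import jax.numpy as jnp


# frozen=True (with the default eq=True) makes dataclasses generate __hash__ from the
# fields; a plain @dataclasses.dataclass sets __hash__ = None, so instances are unhashable.
@dataclasses.dataclass(frozen=True)
class ToySolver:
    """Hypothetical stand-in for a jaxopt solver: plain gradient descent."""
    fun: Callable[[jnp.ndarray], jnp.ndarray]
    stepsize: float = 0.1

    def update(self, params, state):
        grad = jax.grad(self.fun)(params)
        return params - self.stepsize * grad, state + 1


def loss(w):
    return jnp.sum((w - 3.0) ** 2)


solver = ToySolver(fun=loss)
solver_hash = hash(solver)  # works: frozen dataclass whose fields are hashable
update = jax.jit(solver.update)  # compile the whole update step once, reuse it each iteration

params, state = jnp.zeros(2), 0
for _ in range(100):
    params, state = update(params, state)
```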

Thanks @Algue-Rythme!

That almost works. Unfortunately, to make it work in flax_image_classif.py I also need to remove the solver's pre_update=print_accuracy argument; otherwise it crashes with this exception:

Traceback (most recent call last):
  File "examples/deep_learning/flax_image_classif.py", line 199, in <module>
    app.run(main)
  File "/home/pedregosa/anaconda3/lib/python3.8/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/pedregosa/anaconda3/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "examples/deep_learning/flax_image_classif.py", line 188, in main
    params, state = jax.jit(solver.update)(params=params, state=state,
  File "/home/pedregosa/dev/jaxopt/jaxopt/_src/optax_wrapper.py", line 120, in update
    params, state = self.pre_update(params, state, *args, **kwargs)
  File "examples/deep_learning/flax_image_classif.py", line 145, in print_accuracy
    if state.iter_num % 10 == 0:
jax._src.errors.ConcretizationTypeError: Abstract tracer value encountered where concrete value is expected: Traced<ShapedArray(bool[], weak_type=True)>with<DynamicJaxprTrace(level=0/1)>
The problem arose with the `bool` function. 
While tracing the function update at /home/pedregosa/dev/jaxopt/jaxopt/_src/optax_wrapper.py:104 for jit, this concrete value was not available in Python because it depends on the value of the argument 'state'.

See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.ConcretizationTypeError

fabianp, Nov 26 '21 22:11
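The crash comes from print_accuracy running a Python `if` on state.iter_num. Under jit that value is an abstract tracer, so `bool()` cannot be evaluated. The sketch below reproduces this and shows one workaround in the spirit of dropping pre_update. The idea is to keep the jitted update free of Python control flow and run the periodic check in the outer loop, where the state holds concrete arrays. `State`, `print_every_10`, and `update` are hypothetical names, and the update is a toy gradient step.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical minimal solver state; jaxopt's real states carry more fields."""
    iter_num: jnp.ndarray


def print_every_10(params, state):
    # Same pattern as print_accuracy: a Python `if` on state.iter_num.
    if state.iter_num % 10 == 0:
        print("iteration", state.iter_num)
    return params, state


@jax.jit
def update(params, state):
    # Toy gradient step on sum((w - 3)^2); no Python control flow on traced values.
    params = params - 0.1 * 2.0 * (params - 3.0)
    return params, State(iter_num=state.iter_num + 1)


init = (jnp.zeros(2), State(iter_num=jnp.asarray(0)))

# Under jit, state.iter_num is an abstract tracer, so the Python `if` fails.
try:
    jax.jit(print_every_10)(*init)
    raised = False
except jax.errors.ConcretizationTypeError:
    raised = True

# Workaround: run the check in the outer Python loop, where iter_num is concrete.
logged = []
params, state = init
for _ in range(25):
    if int(state.iter_num) % 10 == 0:
        logged.append(int(state.iter_num))
        print("iteration", int(state.iter_num))
    params, state = update(params, state)
```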

I tried wrapping the problematic lines in a with jax.disable_jit(): block, but it still failed.

fabianp, Nov 26 '21 22:11
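A likely reason jax.disable_jit() does not help here: when pre_update runs inside an outer jax.jit, state.iter_num is already a tracer by the time the with block executes, and disable_jit cannot make it concrete again. If the hook must stay inside the jitted update, one option is to branch on device with jax.lax.cond and send the value to Python with jax.debug.callback. The following is a minimal sketch assuming a JAX version that provides jax.debug (newer than this thread); `State`, `record`, `pre_update`, and `update` are hypothetical.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical minimal solver state."""
    iter_num: jnp.ndarray


logged = []  # iteration numbers recorded by the host-side hook


def record(iter_num):
    # Runs on the host with a concrete value, so ordinary Python is allowed here.
    logged.append(int(iter_num))


def pre_update(params, state):
    # lax.cond branches on device instead of a Python `if` on a tracer;
    # jax.debug.callback calls back into Python only when the predicate holds.
    jax.lax.cond(
        state.iter_num % 10 == 0,
        lambda i: jax.debug.callback(record, i),
        lambda i: None,
        state.iter_num,
    )
    return params, state


@jax.jit
def update(params, state):
    params, state = pre_update(params, state)
    params = params - 0.1 * 2.0 * (params - 3.0)  # toy gradient step on sum((w - 3)^2)
    return params, State(iter_num=state.iter_num + 1)


params, state = jnp.zeros(2), State(iter_num=jnp.asarray(0))
for _ in range(25):
    params, state = update(params, state)
jax.effects_barrier()  # wait for pending host callbacks before reading `logged`
```

Callbacks cost a device-to-host transfer each time they fire, so keeping them rare (here every 10 iterations) limits their impact on GPU utilization.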

Solved for the flax_resnet.py example in #119. Leaving this issue open since GPU utilization is still poor in other examples.

fabianp, Dec 09 '21 19:12