jaxopt
poor GPU utilization on the deep learning examples
When running the deep learning examples (for instance deep_learning/flax_image_classif.py), the GPU utilization never goes above 5%, whereas the equivalent flax example reaches around 90% GPU utilization and runs more than 20x faster.
My guess is that a crucial @jax.jit decorator is missing somewhere.
When FLAGS.manual_loop=True, we should also call jax.jit(solver.update) instead of solver.update; this implies making the solver class hashable.
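For illustration, here is a minimal sketch of the idea in plain JAX rather than jaxopt: the per-step update is wrapped in jax.jit once, and the manual Python loop calls the compiled version. The names loss_fn, update and jitted_update, the learning rate, and the toy data are all assumptions made up for this example; they are not the solver's actual code.

```python
import jax
import jax.numpy as jnp

# Hypothetical least-squares loss, standing in for the real training loss.
def loss_fn(params, x, y):
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

# Hypothetical update step, standing in for solver.update.
def update(params, x, y, lr=0.1):
    grads = jax.grad(loss_fn)(params, x, y)
    return params - lr * grads

# Compile once, outside the loop; each iteration then reuses the compiled
# computation instead of dispatching every operation separately.
jitted_update = jax.jit(update)

x = jnp.ones((8, 3))
y = jnp.zeros(8)
params = jnp.ones(3)

initial_loss = float(loss_fn(params, x, y))
for _ in range(5):
    params = jitted_update(params, x, y)
final_loss = float(loss_fn(params, x, y))
```

Creating the jitted function once before the loop matters: calling jax.jit on a fresh function object inside the loop may not hit the compilation cache.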
Thanks @Algue-Rythme !
That almost works. Unfortunately, to make it work in flax_image_classif.py I also need to remove the pre_update=print_accuracy argument of the solver; otherwise it crashes with this exception:
Traceback (most recent call last):
File "examples/deep_learning/flax_image_classif.py", line 199, in <module>
app.run(main)
File "/home/pedregosa/anaconda3/lib/python3.8/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/pedregosa/anaconda3/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "examples/deep_learning/flax_image_classif.py", line 188, in main
params, state = jax.jit(solver.update)(params=params, state=state,
File "/home/pedregosa/dev/jaxopt/jaxopt/_src/optax_wrapper.py", line 120, in update
params, state = self.pre_update(params, state, *args, **kwargs)
File "examples/deep_learning/flax_image_classif.py", line 145, in print_accuracy
if state.iter_num % 10 == 0:
jax._src.errors.ConcretizationTypeError: Abstract tracer value encountered where concrete value is expected: Traced<ShapedArray(bool[], weak_type=True)>with<DynamicJaxprTrace(level=0/1)>
The problem arose with the `bool` function.
While tracing the function update at /home/pedregosa/dev/jaxopt/jaxopt/_src/optax_wrapper.py:104 for jit, this concrete value was not available in Python because it depends on the value of the argument 'state'.
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.ConcretizationTypeError
I tried to protect the problematic lines by wrapping them in a with jax.disable_jit(): block, but it still failed.
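One possible workaround, sketched below in plain JAX, is to keep the Python-level check out of the traced function and do the periodic logging in the manual loop, where the iteration counter is a concrete value. A likely reason the disable_jit attempt did not help is that the context manager is entered while update is already being traced, so state.iter_num is still a tracer at that point. The names step and maybe_log and the toy update rule are assumptions for this example, not jaxopt's API.

```python
import jax
import jax.numpy as jnp

# Hypothetical jitted step: no Python branching on traced values inside.
@jax.jit
def step(params, iter_num):
    new_params = params * 0.9  # toy update, stands in for the real one
    return new_params, iter_num + 1

# Hypothetical logging helper, called outside jit, so iter_num is a
# concrete array and can be converted to a Python int.
def maybe_log(params, iter_num, log):
    if int(iter_num) % 10 == 0:
        log.append((int(iter_num), float(jnp.sum(params))))

params = jnp.ones(3)
iter_num = jnp.asarray(0)
log = []
for _ in range(25):
    maybe_log(params, iter_num, log)
    params, iter_num = step(params, iter_num)
```

The int(iter_num) conversion forces a device-to-host transfer on every iteration, so a plain Python loop counter may be preferable in practice. Newer JAX versions also provide jax.lax.cond and jax.debug.print for conditional logging inside traced code.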
Solved for the flax_resnet.py example in #119. Leaving this issue open since there are other examples where the GPU utilization is poor.