
Reformer text generation notebook runs out of memory on v3-8 TPU

Open · CalebEverett opened this issue on Dec 29, 2020 · 0 comments

Description

https://github.com/google/trax/blob/master/trax/models/reformer/text_generation.ipynb

runs out of memory on a v3-8 TPU.
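
For scale (my own back-of-envelope arithmetic, using only the shapes that appear in the error log below), the individual HLO temporaries are already very large, so the ~98 GB program requirement against ~16 GB of per-core HBM adds up quickly:

seq_len = 524288        # sequence length appearing in the logged shapes
bytes_per_f32 = 4
gib = 1024 ** 3

# Each f32[524288,512] HLO temp in the log:
print(seq_len * 512 * bytes_per_f32 / gib)   # 1.0 GiB
# Each f32[524288,256] HLO temp in the log:
print(seq_len * 256 * bytes_per_f32 / gib)   # 0.5 GiB (512 MiB)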

Environment information

OS: <your answer here>

$ pip freeze | grep trax
trax==1.3.7

$ pip freeze | grep tensor
mesh-tensorflow==0.1.18
tensorboard==2.4.0
tensorboard-plugin-profile==2.4.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.4.0
tensorflow-addons==0.11.2
tensorflow-cpu==2.4.0
tensorflow-datasets==4.0.1
tensorflow-estimator==2.4.0
tensorflow-hub==0.10.0
tensorflow-metadata==0.26.0
tensorflow-model-optimization==0.5.0
tensorflow-serving-api==2.4.0rc4
tensorflow-text==2.4.2
$ pip freeze | grep jax
jax==0.2.7
jaxlib==0.1.57

$ python -V
Python 3.7.3

For bugs: reproduction and error logs

# Steps to reproduce:

I replaced the Colab setup with:

from jax.config import config  # needed for the flag assignments below

TPU_DRIVER_MODE = 1
config.FLAGS.use_tpu = True
config.FLAGS.jax_xla_backend = 'tpu_driver'
config.FLAGS.jax_backend_target = 'grpc://10.131.60.170:8470'
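
As a sanity check (a minimal sketch of my own, not part of the notebook; it only relies on jax.devices() and jax.device_count()), I would confirm right after this setup that the tpu_driver backend is attached and all eight v3-8 cores are visible:

from jax.config import config
import jax

# Hypothetical check, not part of the notebook: verify the remote TPU backend
# is connected and all 8 cores of the v3-8 are visible before training.
print(config.FLAGS.jax_backend_target)  # grpc://10.131.60.170:8470
print(jax.devices())                    # expect 8 TpuDevice entries
print(jax.device_count())               # expect 8; each v3 core has ~16 GB of HBM,
                                        # consistent with the 15.48G in the error below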

The TPU is running the following software:

TPU type: v3-8
TPU software version: tpu_driver_nightly
Labels: None
Network: default

# Error logs:

FilteredStackTrace                        Traceback (most recent call last)
<ipython-input-41-f3e8f94c362a> in <module>
      4 # so subsequent runs will be much faster than the first.
----> 5 trainer.train_epoch(n_steps=1, n_eval_steps=1)

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_epoch(self, n_steps, n_eval_steps)
    294         batch = _reshape_by_device(batch, self.n_devices)
--> 295       self.train_step(batch)
    296       if self._should_save_now():

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_step(self, batch)
    329         (weights, slots), self._step, opt_params, batch,
--> 330         self._model_state, self._rngs)
    331     self._opt_state = opt_state._replace(weights=weights, slots=slots)

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in update(weights_and_slots, i, opt_params, batch, state, rng)
    728     return mapped_update(weights_and_slots, np.repeat(i, n_devices),
--> 729                          opt_params, batch, state, rng)
    730 

FilteredStackTrace: RuntimeError: Resource exhausted: Ran out of memory in memory space hbm. Used 97.92G of 15.48G hbm. Exceeded hbm capacity by 82.44G.

Total hbm usage >= 98.44G:
    reserved        530.00M 
    program          97.92G 
    arguments            0B 

Output size 0B; shares 0B with arguments.

Program hbm requirement 97.92G:
    global           260.0K
    HLO temp         97.92G (89.3% utilization: Unpadded (87.45G) Padded (97.92G), 0.0% fragmentation (0B))

  Largest program allocations in hbm:

  1. Size: 1.00G
     Operator: op_type="add_any" op_name="pmap(mapped_update)/add_any" source_file="/home/jupyter/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py" source_line=711
     Shape: f32[1,524288,512]{2,1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6043 = f32[1,524288,512]{2,1,0:T(8,128)} fusion(f32[524288]{0:T(1024)} %fusion.6673, f32[512]{0:T(512)} %get-tuple-element.11663, f32[512]{0:T(512)} %fusion.8662, f32[524288,512]{1,0:T(8,128)} %get-tuple-element.10835, f32[524288]{0:T(1024)} %get-t...
     Allocation type: HLO temp
     ==========================

  2. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6047 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11659, f32[512]{0:T(512)} %get-tuple-element.11660, f32[256]{0:T(256)} %get-tuple-element.11657, f32[256]{0:T(256)} %get-tuple-element.11658, f32[524288]{0:T(...
     Allocation type: HLO temp
     ==========================

  3. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6496 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11648, f32[256]{0:T(256)} %get-tuple-element.11647, f32[256]{0:T(256)} %get-tuple-element.11646, f32[524288]{0:T(1024)} %get-tuple-element.10486, f32[1,524288...
     Allocation type: HLO temp
     ==========================

  4. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6504 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11621, f32[256]{0:T(256)} %get-tuple-element.11620, f32[256]{0:T(256)} %get-tuple-element.11619, f32[524288]{0:T(1024)} %get-tuple-element.10477, f32[1,524288...
     Allocation type: HLO temp
     ==========================

  5. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6508 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11609, f32[256]{0:T(256)} %get-tuple-element.11608, f32[256]{0:T(256)} %get-tuple-element.11607, f32[524288]{0:T(1024)} %get-tuple-element.10471, f32[1,524288...
     Allocation type: HLO temp
     ==========================

  6. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6500 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11634, f32[256]{0:T(256)} %get-tuple-element.11633, f32[256]{0:T(256)} %get-tuple-element.11632, f32[524288]{0:T(1024)} %get-tuple-element.10481, f32[1,524288...
     Allocation type: HLO temp
     ==========================

  7. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6512 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11598, f32[256]{0:T(256)} %get-tuple-element.11597, f32[256]{0:T(256)} %get-tuple-element.11596, f32[524288]{0:T(1024)} %get-tuple-element.10467, f32[1,524288...
     Allocation type: HLO temp
     ==========================

  8. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6049 = (f32[524288]{0:T(1024)}, f32[524288,512]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10492, f32[512]{0:T(512)} %get-tuple-element.11663, f32[512]{0:T(512)} %fusion.8662, f32[524288]{0:T(1024)} %fusion.6675, f32[524288,256...
     Allocation type: HLO temp
     ==========================

  9. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6237 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11598, f32[256]{0:T(256)} %get-tuple-element.11597, f32[256]{0:T(256)} %get-tuple-element.11596, f32[524288]{0:T(1024)} %fusion.6694, f32[1,524288,256]{2,1,0:...
     Allocation type: HLO temp
     ==========================

  10. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6243 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11609, f32[256]{0:T(256)} %get-tuple-element.11608, f32[256]{0:T(256)} %get-tuple-element.11607, f32[524288]{0:T(1024)} %fusion.6696, f32[1,524288,256]{2,1,0:...
     Allocation type: HLO temp
     ==========================

  11. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6255 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11634, f32[256]{0:T(256)} %get-tuple-element.11633, f32[256]{0:T(256)} %get-tuple-element.11632, f32[524288]{0:T(1024)} %fusion.6700, f32[1,524288,256]{2,1,0:...
     Allocation type: HLO temp
     ==========================

  12. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6249 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11621, f32[256]{0:T(256)} %get-tuple-element.11620, f32[256]{0:T(256)} %get-tuple-element.11619, f32[524288]{0:T(1024)} %fusion.6698, f32[1,524288,256]{2,1,0:...
     Allocation type: HLO temp
     ==========================

  13. Size: 1.00G
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,512]{1,0:T(8,128)}
     Unpadded size: 1.00G
     XLA label: %fusion.6261 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11648, f32[256]{0:T(256)} %get-tuple-element.11647, f32[256]{0:T(256)} %get-tuple-element.11646, f32[524288]{0:T(1024)} %fusion.6702, f32[1,524288,256]{2,1,0:...
     Allocation type: HLO temp
     ==========================

  14. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.6447 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10489, f32[256]{0:T(256)} %get-tuple-element.11657, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10239, f32[524288]{0:T(1024)} %...
     Allocation type: HLO temp
     ==========================

  15. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.6434 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10485, f32[256]{0:T(256)} %get-tuple-element.11646, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10199, f32[524288]{0:T(1024)} %...
     Allocation type: HLO temp
     ==========================

  16. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.6420 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10480, f32[256]{0:T(256)} %get-tuple-element.11632, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10180, f32[524288]{0:T(1024)} %...
     Allocation type: HLO temp
     ==========================

  17. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.6406 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10476, f32[256]{0:T(256)} %get-tuple-element.11619, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10161, f32[524288]{0:T(1024)} %...
     Allocation type: HLO temp
     ==========================

  18. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.13790 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[256]{0:T(256)} %fusion.9279, f32[256]{0:T(256)} %get-tuple-element.11651, f32[524288,256]{1,0:T(8,128)} %get-tuple-element.10216, f32[256]{0:T...
     Allocation type: HLO temp
     ==========================

  19. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.6392 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10470, f32[256]{0:T(256)} %get-tuple-element.11607, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10143, f32[524288]{0:T(1024)} %...
     Allocation type: HLO temp
     ==========================

  20. Size: 512.00M
     Operator: op_name="DUMMY_47"
     Shape: f32[524288,256]{1,0:T(8,128)}
     Unpadded size: 512.00M
     XLA label: %fusion.13507 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[256]{0:T(256)} %fusion.9281, f32[256]{0:T(256)} %get-tuple-element.11662, f32[256]{0:T(256)} %fusion.9283, f32[524288,256]{1,0:T(8,128)} %fusi...
     Allocation type: HLO temp
     ==========================

The stack trace above excludes JAX-internal frames.
The following is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-41-f3e8f94c362a> in <module>
      3 # architecture, which takes around 2 minutes. The JIT-compiled model is saved
      4 # so subsequent runs will be much faster than the first.
----> 5 trainer.train_epoch(n_steps=1, n_eval_steps=1)

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_epoch(self, n_steps, n_eval_steps)
    293       if self.n_devices > 1:  # TODO(lukaszkaiser): use everywhere if possible.
    294         batch = _reshape_by_device(batch, self.n_devices)
--> 295       self.train_step(batch)
    296       if self._should_save_now():
    297         self.save_state(keep=True)

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_step(self, batch)
    328     (weights, slots), stat, self._model_state, self._rngs = self._jit_update_fn(
    329         (weights, slots), self._step, opt_params, batch,
--> 330         self._model_state, self._rngs)
    331     self._opt_state = opt_state._replace(weights=weights, slots=slots)
    332     if self._should_log_now():

~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in update(weights_and_slots, i, opt_params, batch, state, rng)
    727   def update(weights_and_slots, i, opt_params, batch, state, rng):
    728     return mapped_update(weights_and_slots, np.repeat(i, n_devices),
--> 729                          opt_params, batch, state, rng)
    730 
    731   return update

~/.local/lib/python3.7/site-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
    137   def reraise_with_filtered_traceback(*args, **kwargs):
    138     try:
--> 139       return fun(*args, **kwargs)
    140     except Exception as e:
    141       if not is_under_reraiser(e):

~/.local/lib/python3.7/site-packages/jax/api.py in f_pmapped(*args, **kwargs)
   1529         out_axes_thunk=out_axes_thunk,
   1530         name=flat_fun.__name__, donated_invars=tuple(donated_invars),
-> 1531         global_arg_shapes=tuple(global_arg_shapes_flat))
   1532     return tree_unflatten(out_tree(), out)
   1533 

~/.local/lib/python3.7/site-packages/jax/core.py in bind(self, fun, *args, **params)
   1254   def bind(self, fun, *args, **params):
   1255     assert len(params['in_axes']) == len(args)
-> 1256     return call_bind(self, fun, *args, **params)
   1257 
   1258   def process(self, trace, fun, tracers, params):

~/.local/lib/python3.7/site-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
   1218   tracers = map(top_trace.full_raise, args)
   1219   with maybe_new_sublevel(top_trace):
-> 1220     outs = primitive.process(top_trace, fun, tracers, params)
   1221   return map(full_lower, apply_todos(env_trace_todo(), outs))
   1222 

~/.local/lib/python3.7/site-packages/jax/core.py in process(self, trace, fun, tracers, params)
   1257 
   1258   def process(self, trace, fun, tracers, params):
-> 1259     return trace.process_map(self, fun, tracers, params)
   1260 
   1261   def post_process(self, trace, out_tracers, params):

~/.local/lib/python3.7/site-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
    596 
    597   def process_call(self, primitive, f, tracers, params):
--> 598     return primitive.impl(f, *tracers, **params)
    599   process_map = process_call
    600 

~/.local/lib/python3.7/site-packages/jax/interpreters/pxla.py in xla_pmap_impl(fun, backend, axis_name, axis_size, global_axis_size, devices, name, in_axes, out_axes_thunk, donated_invars, global_arg_shapes, *args)
    598                                    in_axes, out_axes_thunk,
    599                                    donated_invars, global_arg_shapes,
--> 600                                    *abstract_args)
    601   return compiled_fun(*args)
    602 

~/.local/lib/python3.7/site-packages/jax/linear_util.py in memoized_fun(fun, *args)
    249       fun.populate_stores(stores)
    250     else:
--> 251       ans = call(fun, *args)
    252       cache[key] = (ans, fun.stores)
    253 

~/.local/lib/python3.7/site-packages/jax/interpreters/pxla.py in parallel_callable(fun, backend_name, axis_name, axis_size, global_axis_size, devices, name, in_axes, out_axes_thunk, donated_invars, global_arg_shapes, *avals)
    855   )
    856   compile_options.parameter_is_tupled_arguments = tuple_args
--> 857   compiled = xla.backend_compile(backend, built, compile_options)
    858 
    859   local_arg_parts_ = local_arg_parts or [None] * len(avals)

~/.local/lib/python3.7/site-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)
    344   # we use a separate function call to ensure that XLA compilation appears
    345   # separately in Python profiling results
--> 346   return backend.compile(built_c, compile_options=options)
    347 
    348 def _execute_compiled_primitive(prim, compiled, result_handler, *args):

RuntimeError: Resource exhausted: Ran out of memory in memory space hbm. Used 97.92G of 15.48G hbm. Exceeded hbm capacity by 82.44G.
