Reformer text generation notebook runs out of memory on a v3-8 TPU
Description
https://github.com/google/trax/blob/master/trax/models/reformer/text_generation.ipynb
runs out of memory on a v3-8 TPU.
Environment information
OS: <your answer here>
$ pip freeze | grep trax
trax==1.3.7
$ pip freeze | grep tensor
mesh-tensorflow==0.1.18
tensorboard==2.4.0
tensorboard-plugin-profile==2.4.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.4.0
tensorflow-addons==0.11.2
tensorflow-cpu==2.4.0
tensorflow-datasets==4.0.1
tensorflow-estimator==2.4.0
tensorflow-hub==0.10.0
tensorflow-metadata==0.26.0
tensorflow-model-optimization==0.5.0
tensorflow-serving-api==2.4.0rc4
tensorflow-text==2.4.2
$ pip freeze | grep jax
jax==0.2.7
jaxlib==0.1.57
$ python -V
Python 3.7.3
For bugs: reproduction and error logs
# Steps to reproduce:
I replaced the Colab setup with:

from jax.config import config  # backend flags are set via jax.config in jax 0.2.7

TPU_DRIVER_MODE = 1
config.FLAGS.use_tpu = True
config.FLAGS.jax_xla_backend = 'tpu_driver'
config.FLAGS.jax_backend_target = 'grpc://10.131.60.170:8470'
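For comparison, the setup cell in the Colab version of the notebook looks roughly like the sketch below; it is reproduced from memory, and details such as the requested tpu_driver version string are approximate.

```python
# Rough sketch of the Colab setup cell that the snippet above replaces.
# Assumes a Colab TPU runtime where COLAB_TPU_ADDR is already set;
# the tpu_driver version string requested here is approximate.
import os
import requests

if 'TPU_DRIVER_MODE' not in globals():
    url = ('http://' + os.environ['COLAB_TPU_ADDR'].split(':')[0]
           + ':8475/requestversion/tpu_driver_nightly')
    requests.post(url)
    TPU_DRIVER_MODE = 1

from jax.config import config
config.FLAGS.jax_xla_backend = 'tpu_driver'
config.FLAGS.jax_backend_target = 'grpc://' + os.environ['COLAB_TPU_ADDR']
```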
The TPU is running the following software:

TPU type: v3-8
TPU software version: tpu_driver_nightly
Labels: None
Network: default
# Error logs:
FilteredStackTrace Traceback (most recent call last)
<ipython-input-41-f3e8f94c362a> in <module>
4 # so subsequent runs will be much faster than the first.
----> 5 trainer.train_epoch(n_steps=1, n_eval_steps=1)
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_epoch(self, n_steps, n_eval_steps)
294 batch = _reshape_by_device(batch, self.n_devices)
--> 295 self.train_step(batch)
296 if self._should_save_now():
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_step(self, batch)
329 (weights, slots), self._step, opt_params, batch,
--> 330 self._model_state, self._rngs)
331 self._opt_state = opt_state._replace(weights=weights, slots=slots)
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in update(weights_and_slots, i, opt_params, batch, state, rng)
728 return mapped_update(weights_and_slots, np.repeat(i, n_devices),
--> 729 opt_params, batch, state, rng)
730
FilteredStackTrace: RuntimeError: Resource exhausted: Ran out of memory in memory space hbm. Used 97.92G of 15.48G hbm. Exceeded hbm capacity by 82.44G.
Total hbm usage >= 98.44G:
reserved 530.00M
program 97.92G
arguments 0B
Output size 0B; shares 0B with arguments.
Program hbm requirement 97.92G:
global 260.0K
HLO temp 97.92G (89.3% utilization: Unpadded (87.45G) Padded (97.92G), 0.0% fragmentation (0B))
Largest program allocations in hbm:
1. Size: 1.00G
Operator: op_type="add_any" op_name="pmap(mapped_update)/add_any" source_file="/home/jupyter/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py" source_line=711
Shape: f32[1,524288,512]{2,1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6043 = f32[1,524288,512]{2,1,0:T(8,128)} fusion(f32[524288]{0:T(1024)} %fusion.6673, f32[512]{0:T(512)} %get-tuple-element.11663, f32[512]{0:T(512)} %fusion.8662, f32[524288,512]{1,0:T(8,128)} %get-tuple-element.10835, f32[524288]{0:T(1024)} %get-t...
Allocation type: HLO temp
==========================
2. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6047 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11659, f32[512]{0:T(512)} %get-tuple-element.11660, f32[256]{0:T(256)} %get-tuple-element.11657, f32[256]{0:T(256)} %get-tuple-element.11658, f32[524288]{0:T(...
Allocation type: HLO temp
==========================
3. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6496 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11648, f32[256]{0:T(256)} %get-tuple-element.11647, f32[256]{0:T(256)} %get-tuple-element.11646, f32[524288]{0:T(1024)} %get-tuple-element.10486, f32[1,524288...
Allocation type: HLO temp
==========================
4. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6504 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11621, f32[256]{0:T(256)} %get-tuple-element.11620, f32[256]{0:T(256)} %get-tuple-element.11619, f32[524288]{0:T(1024)} %get-tuple-element.10477, f32[1,524288...
Allocation type: HLO temp
==========================
5. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6508 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11609, f32[256]{0:T(256)} %get-tuple-element.11608, f32[256]{0:T(256)} %get-tuple-element.11607, f32[524288]{0:T(1024)} %get-tuple-element.10471, f32[1,524288...
Allocation type: HLO temp
==========================
6. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6500 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11634, f32[256]{0:T(256)} %get-tuple-element.11633, f32[256]{0:T(256)} %get-tuple-element.11632, f32[524288]{0:T(1024)} %get-tuple-element.10481, f32[1,524288...
Allocation type: HLO temp
==========================
7. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6512 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11598, f32[256]{0:T(256)} %get-tuple-element.11597, f32[256]{0:T(256)} %get-tuple-element.11596, f32[524288]{0:T(1024)} %get-tuple-element.10467, f32[1,524288...
Allocation type: HLO temp
==========================
8. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6049 = (f32[524288]{0:T(1024)}, f32[524288,512]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10492, f32[512]{0:T(512)} %get-tuple-element.11663, f32[512]{0:T(512)} %fusion.8662, f32[524288]{0:T(1024)} %fusion.6675, f32[524288,256...
Allocation type: HLO temp
==========================
9. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6237 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11598, f32[256]{0:T(256)} %get-tuple-element.11597, f32[256]{0:T(256)} %get-tuple-element.11596, f32[524288]{0:T(1024)} %fusion.6694, f32[1,524288,256]{2,1,0:...
Allocation type: HLO temp
==========================
10. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6243 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11609, f32[256]{0:T(256)} %get-tuple-element.11608, f32[256]{0:T(256)} %get-tuple-element.11607, f32[524288]{0:T(1024)} %fusion.6696, f32[1,524288,256]{2,1,0:...
Allocation type: HLO temp
==========================
11. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6255 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11634, f32[256]{0:T(256)} %get-tuple-element.11633, f32[256]{0:T(256)} %get-tuple-element.11632, f32[524288]{0:T(1024)} %fusion.6700, f32[1,524288,256]{2,1,0:...
Allocation type: HLO temp
==========================
12. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6249 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11621, f32[256]{0:T(256)} %get-tuple-element.11620, f32[256]{0:T(256)} %get-tuple-element.11619, f32[524288]{0:T(1024)} %fusion.6698, f32[1,524288,256]{2,1,0:...
Allocation type: HLO temp
==========================
13. Size: 1.00G
Operator: op_name="DUMMY_47"
Shape: f32[524288,512]{1,0:T(8,128)}
Unpadded size: 1.00G
XLA label: %fusion.6261 = f32[524288,512]{1,0:T(8,128)} fusion(f32[256,512]{1,0:T(8,128)} %get-tuple-element.11648, f32[256]{0:T(256)} %get-tuple-element.11647, f32[256]{0:T(256)} %get-tuple-element.11646, f32[524288]{0:T(1024)} %fusion.6702, f32[1,524288,256]{2,1,0:...
Allocation type: HLO temp
==========================
14. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.6447 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10489, f32[256]{0:T(256)} %get-tuple-element.11657, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10239, f32[524288]{0:T(1024)} %...
Allocation type: HLO temp
==========================
15. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.6434 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10485, f32[256]{0:T(256)} %get-tuple-element.11646, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10199, f32[524288]{0:T(1024)} %...
Allocation type: HLO temp
==========================
16. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.6420 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10480, f32[256]{0:T(256)} %get-tuple-element.11632, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10180, f32[524288]{0:T(1024)} %...
Allocation type: HLO temp
==========================
17. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.6406 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10476, f32[256]{0:T(256)} %get-tuple-element.11619, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10161, f32[524288]{0:T(1024)} %...
Allocation type: HLO temp
==========================
18. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.13790 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[256]{0:T(256)} %fusion.9279, f32[256]{0:T(256)} %get-tuple-element.11651, f32[524288,256]{1,0:T(8,128)} %get-tuple-element.10216, f32[256]{0:T...
Allocation type: HLO temp
==========================
19. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.6392 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[524288]{0:T(1024)} %get-tuple-element.10470, f32[256]{0:T(256)} %get-tuple-element.11607, f32[1,524288,256]{2,1,0:T(8,128)} %get-tuple-element.10143, f32[524288]{0:T(1024)} %...
Allocation type: HLO temp
==========================
20. Size: 512.00M
Operator: op_name="DUMMY_47"
Shape: f32[524288,256]{1,0:T(8,128)}
Unpadded size: 512.00M
XLA label: %fusion.13507 = (f32[524288]{0:T(1024)}, f32[524288,256]{1,0:T(8,128)}, f32[524288,256]{1,0:T(8,128)}) fusion(f32[256]{0:T(256)} %fusion.9281, f32[256]{0:T(256)} %get-tuple-element.11662, f32[256]{0:T(256)} %fusion.9283, f32[524288,256]{1,0:T(8,128)} %fusi...
Allocation type: HLO temp
==========================
The stack trace above excludes JAX-internal frames.
The following is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-41-f3e8f94c362a> in <module>
3 # architecture, which takes around 2 minutes. The JIT-compiled model is saved
4 # so subsequent runs will be much faster than the first.
----> 5 trainer.train_epoch(n_steps=1, n_eval_steps=1)
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_epoch(self, n_steps, n_eval_steps)
293 if self.n_devices > 1: # TODO(lukaszkaiser): use everywhere if possible.
294 batch = _reshape_by_device(batch, self.n_devices)
--> 295 self.train_step(batch)
296 if self._should_save_now():
297 self.save_state(keep=True)
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in train_step(self, batch)
328 (weights, slots), stat, self._model_state, self._rngs = self._jit_update_fn(
329 (weights, slots), self._step, opt_params, batch,
--> 330 self._model_state, self._rngs)
331 self._opt_state = opt_state._replace(weights=weights, slots=slots)
332 if self._should_log_now():
~/.local/lib/python3.7/site-packages/trax/supervised/trainer_lib.py in update(weights_and_slots, i, opt_params, batch, state, rng)
727 def update(weights_and_slots, i, opt_params, batch, state, rng):
728 return mapped_update(weights_and_slots, np.repeat(i, n_devices),
--> 729 opt_params, batch, state, rng)
730
731 return update
~/.local/lib/python3.7/site-packages/jax/_src/traceback_util.py in reraise_with_filtered_traceback(*args, **kwargs)
137 def reraise_with_filtered_traceback(*args, **kwargs):
138 try:
--> 139 return fun(*args, **kwargs)
140 except Exception as e:
141 if not is_under_reraiser(e):
~/.local/lib/python3.7/site-packages/jax/api.py in f_pmapped(*args, **kwargs)
1529 out_axes_thunk=out_axes_thunk,
1530 name=flat_fun.__name__, donated_invars=tuple(donated_invars),
-> 1531 global_arg_shapes=tuple(global_arg_shapes_flat))
1532 return tree_unflatten(out_tree(), out)
1533
~/.local/lib/python3.7/site-packages/jax/core.py in bind(self, fun, *args, **params)
1254 def bind(self, fun, *args, **params):
1255 assert len(params['in_axes']) == len(args)
-> 1256 return call_bind(self, fun, *args, **params)
1257
1258 def process(self, trace, fun, tracers, params):
~/.local/lib/python3.7/site-packages/jax/core.py in call_bind(primitive, fun, *args, **params)
1218 tracers = map(top_trace.full_raise, args)
1219 with maybe_new_sublevel(top_trace):
-> 1220 outs = primitive.process(top_trace, fun, tracers, params)
1221 return map(full_lower, apply_todos(env_trace_todo(), outs))
1222
~/.local/lib/python3.7/site-packages/jax/core.py in process(self, trace, fun, tracers, params)
1257
1258 def process(self, trace, fun, tracers, params):
-> 1259 return trace.process_map(self, fun, tracers, params)
1260
1261 def post_process(self, trace, out_tracers, params):
~/.local/lib/python3.7/site-packages/jax/core.py in process_call(self, primitive, f, tracers, params)
596
597 def process_call(self, primitive, f, tracers, params):
--> 598 return primitive.impl(f, *tracers, **params)
599 process_map = process_call
600
~/.local/lib/python3.7/site-packages/jax/interpreters/pxla.py in xla_pmap_impl(fun, backend, axis_name, axis_size, global_axis_size, devices, name, in_axes, out_axes_thunk, donated_invars, global_arg_shapes, *args)
598 in_axes, out_axes_thunk,
599 donated_invars, global_arg_shapes,
--> 600 *abstract_args)
601 return compiled_fun(*args)
602
~/.local/lib/python3.7/site-packages/jax/linear_util.py in memoized_fun(fun, *args)
249 fun.populate_stores(stores)
250 else:
--> 251 ans = call(fun, *args)
252 cache[key] = (ans, fun.stores)
253
~/.local/lib/python3.7/site-packages/jax/interpreters/pxla.py in parallel_callable(fun, backend_name, axis_name, axis_size, global_axis_size, devices, name, in_axes, out_axes_thunk, donated_invars, global_arg_shapes, *avals)
855 )
856 compile_options.parameter_is_tupled_arguments = tuple_args
--> 857 compiled = xla.backend_compile(backend, built, compile_options)
858
859 local_arg_parts_ = local_arg_parts or [None] * len(avals)
~/.local/lib/python3.7/site-packages/jax/interpreters/xla.py in backend_compile(backend, built_c, options)
344 # we use a separate function call to ensure that XLA compilation appears
345 # separately in Python profiling results
--> 346 return backend.compile(built_c, compile_options=options)
347
348 def _execute_compiled_primitive(prim, compiled, result_handler, *args):
RuntimeError: Resource exhausted: Ran out of memory in memory space hbm. Used 97.92G of 15.48G hbm. Exceeded hbm capacity by 82.44G.
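For scale, a quick back-of-the-envelope check (assuming 4 bytes per f32 element) matches the allocation sizes reported above: each f32[524288,512] temporary is exactly 1 GiB on its own, so the dozens of such temporaries needed for one training step cannot fit in the 15.48 GiB of HBM available to a single core.

```python
# Sanity check on the sizes in the OOM report (assumes 4 bytes per f32 element).
seq_len, d_model = 524288, 512
gib = 2 ** 30

per_temp = seq_len * d_model * 4     # one f32[524288,512] HLO temp
print(per_temp / gib)                # 1.0 -> matches the "1.00G" entries in the log

print(97.92 / 15.48)                 # ~6.3x the 15.48 GiB per-core HBM budget
```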