
Issue running the code on Colab

Open · alferio1349 opened this issue 1 year ago · 5 comments

```
Detected unsupported operations when trying to compile graph ldpcbp_decoder_while_body_12659_const_0[] on XLA_CPU_JIT: RaggedRange (No registered 'RaggedRange' OpKernel for XLA_CPU_JIT devices compatible with node {{node ldpcbp_decoder/while/RaggedWhere/RaggedRange}}){{node ldpcbp_decoder/while/RaggedWhere/RaggedRange}}

  [[ldpcbp_decoder/while]]
tf2xla conversion failed while converting __inference_call_40890[_XlaMustCompile=true,config_proto=3175580994766145631,executor_type=11160318154034397263]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions. [Op:__inference_call_40890]

Call arguments received by layer 'e2e_model_1' (type E2EModel):
  • batch_size=tf.Tensor(shape=(), dtype=int32)
  • ebno_db=tf.Tensor(shape=(), dtype=float32)
```

alferio1349 · Nov 29 '24

Hi @alferio1349, do you have a code snippet that allows us to reproduce the error? As a workaround, you can run the code in non-XLA mode (jit_compile=False).

SebastianCa · Nov 29 '24
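A minimal sketch of the suggested workaround, assuming the E2EModel call signature reported above (batch_size, ebno_db); the wrapper name run_e2e is illustrative:

```python
import tensorflow as tf

# Trace the end-to-end model in graph mode but without XLA
# (jit_compile=False): the RaggedRange op inside the BP decoder's
# while-loop then runs on the regular TF kernels that support it.
@tf.function(jit_compile=False)
def run_e2e(model, batch_size, ebno_db):
    return model(batch_size, ebno_db)
```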

Code snippet:

```python
# simulate "conventional" BP performance for given pcm
bp_decoder = LDPCBPDecoder(pcm, num_iter=20, hard_out=False)
e2e_bp = E2EModel(encoder, bp_decoder, k, n)
ber_plot.simulate(e2e_bp,
                  ebno_dbs=ebno_dbs,
                  batch_size=params["mc_batch_size"],
                  num_target_block_errors=params["num_target_block_errors"],
                  legend=f"BP {bp_decoder._num_iter.numpy()} iter.",
                  soft_estimates=True,
                  max_mc_iter=params["mc_iters"],
                  forward_keyboard_interrupt=False,
                  show_fig=False);
```

Error:

```
InvalidArgumentError                      Traceback (most recent call last)
in <cell line: 4>()
      2 bp_decoder = LDPCBPDecoder(pcm, num_iter=20, hard_out=False)
      3 e2e_bp = E2EModel(encoder, bp_decoder, k, n)
----> 4 ber_plot.simulate(e2e_bp,
      5                   ebno_dbs=ebno_dbs,
      6                   batch_size=params["mc_batch_size"],

3 frames

/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51   try:
     52     ctx.ensure_initialized()
---> 53     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                         inputs, attrs, num_outputs)
     55   except core._NotOkStatusException as e:

InvalidArgumentError: Exception encountered when calling layer 'e2e_model_2' (type E2EModel).

Detected unsupported operations when trying to compile graph ldpcbp_decoder_1_while_body_14595_const_0[] on XLA_CPU_JIT: RaggedRange (No registered 'RaggedRange' OpKernel for XLA_CPU_JIT devices compatible with node {{node ldpcbp_decoder_1/while/RaggedWhere/RaggedRange}}){{node ldpcbp_decoder_1/while/RaggedWhere/RaggedRange}}

  [[ldpcbp_decoder_1/while]]
tf2xla conversion failed while converting __inference_call_16140[_XlaMustCompile=true,config_proto=3175580994766145631,executor_type=11160318154034397263]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions. [Op:__inference_call_16140]

Call arguments received by layer 'e2e_model_2' (type E2EModel):
  • batch_size=tf.Tensor(shape=(), dtype=int32)
  • ebno_db=tf.Tensor(shape=(), dtype=float32)
```

alferio1349 · Dec 02 '24

Hi, it seems like you are running the code in XLA mode (i.e., you use the @tf.function(jit_compile=True) decorator when calling the e2e model). Depending on your TF and Sionna versions, this may not be supported. Can you please run the code with jit_compile=False? Alternatively, activating the GPU runtime in Colab might also help.

SebastianCa · Dec 03 '24
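A quick sanity check before re-running the cell, using standard TensorFlow calls, to confirm the Colab GPU runtime is actually visible:

```python
import tensorflow as tf

# An empty list means TF only sees the CPU, so any XLA-compiled function
# is still lowered to XLA_CPU_JIT, where RaggedRange has no kernel.
print(tf.config.list_physical_devices("GPU"))
print(tf.test.gpu_device_name())  # e.g. '/device:GPU:0' when a GPU is attached
```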

Hi, I have run the code with @tf.function(jit_compile=False) and activated the GPU runtime in Colab, but I am still experiencing the same error. I even downgraded TF to 2.15, but the error persists.

Thank you.

alferio1349 · Dec 03 '24

The error is clearly related to XLA. The E2E model in gnn.py uses XLA (see gnn.py L684); can you please try setting jit_compile=False there?

If this does not help, can you please provide a minimal working example? (The code snippet above does not allow us to reproduce the error.)

SebastianCa · Dec 03 '24
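For reference, a hypothetical sketch of the change suggested above. The actual code around gnn.py L684 is not quoted in this thread, so the class skeleton and method body below are placeholders only:

```python
import tensorflow as tf

class E2EModel(tf.keras.Model):
    # ... encoder/decoder construction as in the repository notebook ...

    @tf.function(jit_compile=False)  # presumably jit_compile=True near gnn.py L684
    def call(self, batch_size, ebno_db):
        ...  # placeholder: encode, transmit at ebno_db, decode
```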