Problem with @jit.trace(symbolic=True) in train.py of the cascade_emd model

Open • stoneMo opened this issue 4 years ago • 1 comment

When I run the cascade_emd model, I get the following error. I would appreciate it if you could help me out. Thank you in advance.

Traceback (most recent call last):
  File "train.py", line 167, in <module>
    run_train()
  File "train.py", line 164, in run_train
    train(args)
  File "train.py", line 156, in train
    worker(0, 1, args)
  File "train.py", line 119, in worker
    train_one_epoch(model, train_loader, opt, max_steps, rank, epoch_id, gpu_num)
  File "train.py", line 58, in train_one_epoch
    losses = propagate()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/jit/__init__.py", line 424, in __call__
    self._compiled_func()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1208, in __call__
    self._execute()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1092, in _execute
    return _mgb.AsyncExec__execute(self)
megengine._internal.exc.MegBrainError:
MegBrain core throws exception: mgb::AssertionError
assertion `begin >= 0 && end >= begin && end <= size_ax' failed at /home/code/src/core/impl/tensor.cpp:151: mgb::SubTensorSpec mgb::Slice::apply(megdnn::TensorLayout, int) const
extra message: index out of bound: layout={511(1),1(1)}; request begin=None end=2 step=None axis=1

  • bt:/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/_mgb.cpython-36m-x86_64-linux-gnu.so{1e36052,1edec06,1fc6782,1fc6fd0}

Associated operator: id=160315 name=subtensor(argsort[160305]:o0)[160315] type=mgb::opr::Subtensor
input variables:
0: {id:160306, shape:{511,1}, Float32, owner:argsort(MUL[160303])[160305]{ArgsortForward}, name:argsort(MUL[160303])[160305]:o0, slot:0, gpu0:0, d, 8, 1}
1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
output variables:
0: {id:160316, shape:{553,2}, Float32, owner:subtensor(argsort[160305]:o0)[160315]{Subtensor}, name:subtensor(argsort[160305]:o0)[160315], slot:0, gpu0:0, d, 8, 8}

Unoptimized equivalent of associated operator: id=10623 name=subtensor(argsort[10615]:o0)[10623] type=mgb::opr::Subtensor
input variables:
0: {id:10616, shape:{}, Float32, owner:argsort(MUL[10611])[10615]{ArgsortForward}, name:argsort(MUL[10611])[10615]:o0, slot:0, gpu0:0, d, 8, 1}
1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
output variables:
0: {id:10624, shape:{}, Float32, owner:subtensor(argsort[10615]:o0)[10623]{Subtensor}, name:subtensor(argsort[10615]:o0)[10623], slot:0, gpu0:0, d, 8, 8}
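To make the assertion easier to read: the Subtensor input is the argsort output with shape {511,1}, and the request slices axis 1 with end=2, so `end <= size_ax` (2 <= 1) fails. The unoptimized operator shows an empty shape `{}`, which suggests the shape was not known when the graph was traced, so the check only fires at execution time. The sketch below is a hypothetical pure-Python emulation of that bounds check (the helper `check_slice` is illustrative, not MegEngine API), contrasted with NumPy, which clamps the same out-of-range slice silently. The `min(...)` guard at the end is only one possible workaround, assumed here, not a confirmed fix for the repository.

```python
# Hypothetical emulation of the MegBrain Subtensor bounds check from the log:
#   begin >= 0 && end >= begin && end <= size_ax
# This is NOT MegEngine code; it only mirrors the assertion text.
import numpy as np


def check_slice(shape, axis, begin=None, end=None):
    """Resolve begin/end on `axis` (None -> full range) and apply the check."""
    size_ax = shape[axis]
    b = 0 if begin is None else begin
    e = size_ax if end is None else end
    ok = b >= 0 and e >= b and e <= size_ax
    return ok, (b, e, size_ax)


# Values from the error: layout={511(1),1(1)}, request end=2 on axis=1.
ok, info = check_slice((511, 1), axis=1, end=2)
print(ok, info)  # False (0, 2, 1)

# NumPy clamps the same slice instead of raising an error.
arr = np.zeros((511, 1), dtype=np.float32)
print(arr[:, :2].shape)  # (511, 1)

# Assumed guard: clamp the requested end to the real axis size first.
safe_end = min(2, arr.shape[1])
ok, info = check_slice(arr.shape, axis=1, end=safe_end)
print(ok, info)  # True (0, 1, 1)
```

Because NumPy tolerates the overshoot, a slice such as `[:, :2]` on a tensor whose second dimension happens to be 1 can go unnoticed until a stricter backend like this Subtensor operator checks it.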

stoneMo • Jul 17 '20 09:07

Does this error also occur with fpn_baseline or emd_simple?

xg-chu • Aug 03 '20 08:08