AMDMIGraphX

Dynamic Batch Model Testing and Debugging

Open • CharlieL7 opened this issue 1 year ago • 5 comments

The list of supported models and their current status is available at (AMD internal only): https://amdcloud-my.sharepoint.com/:x:/g/personal/charllin_amd_com/Ebf6_4jYgANDnx8_tUr3Y_4Bs4ULKjYLsNmQZWDiQOrO4w?e=v6OEYY&nav=MTVfezAwMDAwMDAwLTAwMDEtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMH0

CharlieL7 • Apr 19 '23 20:04

Run the given "driver" commands on each model and check whether they compile without failing. This feature is a work in progress, so we can expect failures caused either by bugs or by incomplete dynamic batch support in specific ops.

Log the output of each driver run, including error messages.

Debug and fix the errors we find.

Be prepared to support QA when they construct tests by giving them the archive locations of the sample model files and the command-line arguments needed to run each one.

bpickrel • Apr 19 '23 20:04

Need to add dynamic shape support to the deconvolution op for model 3dunet_kits19_128x128x128.onnx.
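For reference, the ONNX `ConvTranspose` spec gives each spatial output size as stride * (in - 1) + output_padding + ((kernel - 1) * dilation + 1) - pad_begin - pad_end. Because this grows with the input size (stride >= 1), a dynamic dimension {min, max} maps to an output range by evaluating the formula at both bounds, and a dynamic batch dimension simply passes through. The helper below is a hypothetical Python sketch, not MIGraphX's implementation; it assumes one kernel/stride/dilation/pad value shared by all spatial axes, symmetric padding, and no `auto_pad` or `output_shape` attribute. The layer sizes in the test are made up, not taken from the 3dunet model.

```python
def _convtranspose_dim(in_size, kernel, stride, dilation, pad, output_padding):
    # ONNX ConvTranspose output size for one spatial axis, explicit symmetric pads
    return stride * (in_size - 1) + output_padding + ((kernel - 1) * dilation + 1) - 2 * pad


def convtranspose_output_shape(in_shape, out_channels, kernel, stride=1,
                               dilation=1, pad=0, output_padding=0):
    """Hypothetical helper. in_shape is [N, C, spatial...]; each entry is an int
    (fixed dim) or a (min, max) tuple (dynamic dim)."""
    batch = in_shape[0]  # batch dim passes through unchanged, fixed or dynamic
    out = [batch, out_channels]
    for d in in_shape[2:]:
        if isinstance(d, tuple):
            # Formula is increasing in the input size, so bounds map to bounds
            out.append(tuple(_convtranspose_dim(x, kernel, stride, dilation, pad,
                                                output_padding) for x in d))
        else:
            out.append(_convtranspose_dim(d, kernel, stride, dilation, pad,
                                          output_padding))
    return out
```

With a dynamic batch only, the spatial output sizes stay fixed; a dynamic spatial dimension instead yields a (min, max) output range.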

bpickrel • Apr 25 '23 19:04

The following model files can be found, for now, at /home/bpickrel/AMDMIGraphX/models/ on rocm-rome-6. I tested them with the given command lines:

  • 3dunet_kits19_128x128x128.onnx: MIGRAPHX_TRACE_EVAL=yes bin/driver verify ../models/3dunet/model/3dunet_kits19_128x128x128.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim "@input" "[{min:1, max:4}, 1, 128, 128, 128]" >& ../models/logs/3dunet_kits19_128x128x128.errlog (failed; see the comment above)
  • bert_base_cased_1_fp16_gpu.onnx: bin/driver verify ../models/bert_base_cased_1_fp16_gpu.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim "@input_ids" "[{min:1, max:4}, 3]" (failed)
  • resnet50-v1-7.onnx: bin/driver verify ../models/resnet50_v1.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]" (success)
  • distilgpt2_1_fp16_gpu.onnx: MIGRAPHX_TRACE_EVAL=1 bin/driver compile ../models/distilgpt2_1.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @input_ids "[{min:1, max:4}, 128]" >& ../models/logs/distilgpt2_1.errlog (failed)
  • distilgpt2_1.onnx: MIGRAPHX_TRACE_EVAL=1 bin/driver verify ../models/distilgpt2_1.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim "@input_ids" "[{min:1, max:4}, 128]" >& ../models/logs/distilgpt2_1.errlog (failed)
  • yolov4.onnx: MIGRAPHX_TRACE_COMPILE=1 bin/driver compile ../models/yolov4.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim "@input_1:0" "[{min:1, max:4}, 416, 416, 3]" >& ../models/logs/yolov4.errlog (failed)
  • inception_v2/model.onnx: bin/driver verify ../models/inception_v2/model.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim "@data_0" "[{min:1, max:4}, 3, 224, 224]" (success; the inception_v3 model has not been located yet)
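The command lines above share one pattern, so a small script can build them and capture each run's output for the log. This is a minimal Python sketch under stated assumptions: `format_dims`, `build_driver_cmd`, and `run_and_log` are hypothetical helpers, not part of MIGraphX; the flags are copied verbatim from the commands above and their semantics are not re-checked here; and merging stderr into stdout is meant to mimic the shell's `>&` redirect.

```python
import os
import subprocess


def format_dims(dims):
    """Render dims in the --dyn-input-dim style used above, e.g.
    [{min:1, max:4}, 3, 224, 224]. A (lo, hi) tuple marks a dynamic dim."""
    parts = []
    for d in dims:
        if isinstance(d, tuple):
            lo, hi = d
            parts.append(f"{{min:{lo}, max:{hi}}}")
        else:
            parts.append(str(d))
    return "[" + ", ".join(parts) + "]"


def build_driver_cmd(driver, mode, model, input_name, dims, batch=3):
    """Hypothetical helper: argv for one driver run ("verify" or "compile").
    Passing a list avoids the shell quoting needed in the lines above."""
    return [driver, mode, model,
            "--split-single-dyn-dim", "--batch", str(batch),
            "--dyn-input-dim", "@" + input_name, format_dims(dims)]


def run_and_log(cmd, log_path, extra_env=None):
    """Run cmd with optional extra env vars (e.g. MIGRAPHX_TRACE_EVAL=1) and
    write stdout and stderr together to log_path, like `>& log_path`."""
    env = dict(os.environ)
    env.update(extra_env or {})
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT,
                              env=env, check=False)
    return proc.returncode  # nonzero return code marks a failed run
```

For example, `run_and_log(build_driver_cmd("bin/driver", "verify", "../models/inception_v2/model.onnx", "data_0", [(1, 4), 3, 224, 224]), "inception_v2.log", {"MIGRAPHX_TRACE_EVAL": "1"})` would reproduce the inception_v2 line with tracing enabled; the log file name there is hypothetical.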

bpickrel • Apr 28 '23 22:04

TODO: Make auto-padding work with pooling for the ONNX taau-downsample model taau_low_res_downsample_d2s_for_infer_time_fp16_opset11.onnx. For a dynamic shape, the amount of padding must be determined at runtime.
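To see why the padding depends on the runtime shape, recall how the ONNX spec defines `auto_pad` SAME_UPPER/SAME_LOWER for pooling: pad so that out = ceil(in / stride), split the total padding evenly, and put any odd extra at the end (SAME_UPPER) or the beginning (SAME_LOWER). The total depends on `in`, so a dynamic dimension cannot be padded at compile time. The function below is a hypothetical per-axis Python sketch of that rule, not MIGraphX code; dilation defaults to 1 as for most pooling ops.

```python
def same_auto_pad(in_size, kernel, stride=1, dilation=1, mode="SAME_UPPER"):
    """Hypothetical helper: (pad_begin, pad_end) for one axis under ONNX
    SAME_UPPER / SAME_LOWER auto_pad, given the actual runtime input size."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    if mode == "SAME_UPPER":
        return small, large  # odd extra padding goes at the end
    if mode == "SAME_LOWER":
        return large, small  # odd extra padding goes at the beginning
    raise ValueError(f"unsupported auto_pad mode: {mode}")
```

For a 3x3, stride-2 window, an input of 224 needs pads (0, 1) while 225 needs (1, 1), so a shape that varies at runtime needs the padding recomputed per input.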

bpickrel • May 08 '23 19:05

Use the Excel sheet at the link above for the correct command lines; the ones listed in this thread are now obsolete.

bpickrel • Nov 10 '23 20:11