How to run BERT model with spike?

Open hyuns-lee opened this issue 4 years ago • 1 comments

I'm trying to run the BERT example with Spike. This is what I'm doing:

  1. Configure Gemmini by editing /root/chipyard/generators/gemmini/configs/GemminiCustomConfigs.scala
  2. Build Spike by running /root/chipyard/generators/gemmini/scripts/build-spike.sh
  3. Run Spike from /root/chipyard/generators/gemmini/software/onnxruntime-riscv/systolic_runner/bert_mask_runner: spike --extension=gemmini pk ort_test -m onnx/bert-base-cased.onnx -x 2 -O 99
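
For anyone reproducing this, the run step can be wrapped in a small Python sketch. The paths and flags are copied from the steps above. The parameter names `systolic_mode` and `opt_level` are only guesses at what `-x` and `-O` mean (the first inferred from the "Using systolic in mode 2" log line), and `run_bert` is a hypothetical helper, not part of Gemmini or ONNX-Runtime.

```python
import shlex
import subprocess
from pathlib import Path

# Paths taken from the steps above; adjust to your own checkout.
GEMMINI = Path("/root/chipyard/generators/gemmini")
RUNNER_DIR = GEMMINI / "software/onnxruntime-riscv/systolic_runner/bert_mask_runner"


def spike_command(model="onnx/bert-base-cased.onnx", systolic_mode=2, opt_level=99):
    """Build the argv list for the Spike invocation from step 3.

    `systolic_mode` and `opt_level` are assumed names for -x and -O.
    """
    return [
        "spike", "--extension=gemmini", "pk", "ort_test",
        "-m", model,
        "-x", str(systolic_mode),
        "-O", str(opt_level),
    ]


def run_bert(dry_run=True):
    """Print the command, and run it from the runner directory unless dry_run."""
    cmd = spike_command()
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if Spike exits non-zero.
        subprocess.run(cmd, cwd=RUNNER_DIR, check=True)
    return cmd
```

Calling `run_bert()` only prints the command; `run_bert(dry_run=False)` actually launches Spike and needs the build from step 2 to be in place.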

and then it shows

    Gemmini extension configured with: dim = 128
    bbl loader
    Loaded runner program
    Using systolic in mode 2
    Using Onnxruntime C++ API
    Number of inputs = 3
    Input 0 : name=input_ids, type=7, num_dims=2: [-1, -1, ]
    Input 1 : name=attention_mask, type=7, num_dims=2: [-1, -1, ]
    Input 2 : name=token_type_ids, type=7, num_dims=2: [-1, -1, ]
    Number of outputs = 1
    Output 0 : name=output_0, type=1, num_dims=3: [-1, -1, 28996, ]
    BERT batch & seq dims: 1, 9
    The first input id vals are: 101, 1109, 2764
    The first attention mask val are: 1, 1, 1
    The first token type ids are: 0, 0, 0
    Starting inference
    Called into systolic matmul!
    Using accelerated matmul with dimensions (9, 768, 768)
    LOOP_WS bounds were too large for double-buffering

When I change the accelerator configuration in GemminiCustomConfigs.scala, the change does seem to be applied to Spike. I probably need to change some other configuration too, but I don't know what to change. Please help me.

hyuns-lee, Nov 15 '21 17:11

I think I know what this issue might be.

Whenever you build Spike using the ./build-spike.sh script, you generate a new file called gemmini_params.h, which you can find in the following location:

chipyard/generators/gemmini/software/gemmini-rocc-tests/include/

ONNX-Runtime actually has its own copy of this file, called systolic_params_int8.h, found here.

Once you changed the meshRows, this file was updated for Spike, but not for ONNX-Runtime.

Can you try replacing systolic_params_int8.h with your gemmini_params.h, rebuilding ONNX-Runtime, and checking whether it works then?
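
Before copying the file over, it may help to confirm that the two headers really disagree. Below is a minimal Python sketch that compares the simple `#define NAME value` lines of two headers. It is a hypothetical helper, not part of Gemmini or ONNX-Runtime, and it assumes the parameters of interest are plain object-like macros. Both paths are passed as arguments, since where systolic_params_int8.h lives depends on your checkout.

```python
import re
import sys
from pathlib import Path

# Matches object-like macros such as "#define NAME value".
# Function-like macros and bare include guards are skipped on purpose.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(.+?)\s*$")


def parse_defines(text):
    """Return {macro_name: value_string} for simple #define lines in text."""
    defines = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            defines[match.group(1)] = match.group(2)
    return defines


def diff_defines(spike_defs, ort_defs):
    """Return {name: (spike_value, ort_value)} for macros that differ.

    A value of None means the macro is missing from that header.
    """
    names = sorted(set(spike_defs) | set(ort_defs))
    return {
        name: (spike_defs.get(name), ort_defs.get(name))
        for name in names
        if spike_defs.get(name) != ort_defs.get(name)
    }


def compare_headers(spike_header, ort_header):
    """Read both header files and return their differing macros."""
    spike_defs = parse_defines(Path(spike_header).read_text())
    ort_defs = parse_defines(Path(ort_header).read_text())
    return diff_defines(spike_defs, ort_defs)


if __name__ == "__main__" and len(sys.argv) == 3:
    for name, (spike_val, ort_val) in compare_headers(sys.argv[1], sys.argv[2]).items():
        print(f"{name}: spike={spike_val} onnxruntime={ort_val}")
```

Run it with the path to gemmini_params.h first and the path to systolic_params_int8.h second. If nothing is printed, the simple macros in the two headers already match and the mismatch is probably somewhere else.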

In the future, we will update our scripts so that ONNX-Runtime always uses the same gemmini_params.h as Spike does.

hngenc, Nov 25 '21 22:11