Issues by Du Li




Cannot do inference for any model on more than two nodes

Hi, I'm running model inference across multiple nodes. It works fine on two nodes, but it always throws the following error when run on more than two nodes: NCCL...

Error when running BERT example on AMD

Hello, I ran the BERT example on an MI-250x with the following command:
python3 examples/03_bert/benchmark_ait.py --batch-size 32 --seq-length 512 --encoders-only false
However, it aborted with the following errors: ./tmp/BERT_fast_gelu_32_512/batch_gather_1.cpp:27: int64_t (anonymous namespace)::GetInOffset(const int64_t,...

Error when running examples on ROCm

Hello, when I run the examples in the ROCm Docker container, I always get the following error: Traceback (most recent call last): File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line 131, in verify_simple_model() File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line...

Ops unit tests fail on ROCm

Hello, I installed AIT from source on an AMD MI-250 with ROCm 5.4. When I run ./tests/unittest/ops# python test_groupnorm.py, I get the following errors: ERROR: test_groupnorm_float16 (__main__.GroupnormTestCase) ---------------------------------------------------------------------- Traceback (most recent...

Adding DS Feature API in accelerator

This PR is a prototype that adds an API for accelerator capabilities, including: 1. defining capabilities in abstract_accelerator; 2. setting capabilities in cuda_accelerator. Hardware vendors are welcome to define capabilities for...
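The two-step pattern in this PR summary, with capabilities declared in an abstract accelerator and then set by a concrete CUDA accelerator, can be sketched roughly as below. This is a minimal illustration under assumptions: the class names, the CAPABILITIES tuple, the capability names, and the is_supported() helper are all hypothetical and are not taken from the PR itself.

```python
# Hypothetical sketch of a capability API; names are illustrative only,
# not the API introduced by the PR.
from abc import ABC, abstractmethod


class AbstractAccelerator(ABC):
    """Base class that declares which capabilities a backend may report."""

    # Hypothetical list of capability names every backend is asked about.
    CAPABILITIES = ("fp16", "bf16", "triton")

    @abstractmethod
    def capabilities(self) -> dict:
        """Return a mapping of capability name -> bool set by the backend."""

    def is_supported(self, name: str) -> bool:
        # Reject names the base class never declared.
        if name not in self.CAPABILITIES:
            raise KeyError(f"unknown capability: {name}")
        # Capabilities a backend does not set are treated as unsupported.
        return bool(self.capabilities().get(name, False))


class CudaAccelerator(AbstractAccelerator):
    """Hypothetical concrete backend that sets its own capability values."""

    def __init__(self, bf16_supported: bool = False):
        # Placeholder values: a real backend would query the device/driver.
        self._caps = {"fp16": True, "bf16": bf16_supported}

    def capabilities(self) -> dict:
        return dict(self._caps)


acc = CudaAccelerator()
print(acc.is_supported("fp16"))
print(acc.is_supported("triton"))
```

Keeping the list of capability names in the base class means every vendor backend answers the same set of questions, while an unset capability safely defaults to False instead of raising.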