Du Li

5 issues by Du Li

Hi, I'm running model inference across multiple nodes. It works fine with two nodes, but it always throws the following error when run on more than two nodes: NCCL...

Hello, I ran the BERT example on an MI-250x using the command: python3 examples/03_bert/benchmark_ait.py --batch-size 32 --seq-length 512 --encoders-only false. However, it aborted with the following errors: ./tmp/BERT_fast_gelu_32_512/batch_gather_1.cpp:27: int64_t (anonymous namespace)::GetInOffset(const int64_t,...

Hello, when I run the examples in the ROCm docker, I always get the following error: Traceback (most recent call last): File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line 131, in verify_simple_model() File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line...

Hello, I installed AIT from source on an AMD MI-250 with ROCm 5.4. When I run: ./tests/unittest/ops# python test_groupnorm.py I get the following errors: ERROR: test_groupnorm_float16 (__main__.GroupnormTestCase) ---------------------------------------------------------------------- Traceback (most recent...

This PR is a prototype that adds an API for accelerator capabilities, including: 1. defining capabilities in abstract_accelerator 2. setting capabilities in cuda_accelerator. Hardware vendors are welcome to define capabilities for...