Du Li
Hi, I'm running model inference across multiple nodes. It works fine on two nodes, but it always throws the following error when run on more than two nodes: NCCL...
Hello, I ran the BERT example on an MI-250x using the command: python3 examples/03_bert/benchmark_ait.py --batch-size 32 --seq-length 512 --encoders-only false. However, it aborted with the following errors: ./tmp/BERT_fast_gelu_32_512/batch_gather_1.cpp:27: int64_t (anonymous namespace)::GetInOffset(const int64_t,...
Hello, when I run the examples in the ROCm Docker container, I always get the following error: Traceback (most recent call last): File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line 131, in verify_simple_model() File "examples/07_how_to_run_pt_model/how_to_run_pt_model.py", line...
Hello, I installed AIT from source on an AMD MI-250 with ROCm 5.4. When I run ./tests/unittest/ops# python test_groupnorm.py, I get the following errors: ERROR: test_groupnorm_float16 (__main__.GroupnormTestCase) ---------------------------------------------------------------------- Traceback (most recent...
This PR prototypes an API for accelerator capabilities, including: 1. defining capabilities in abstract_accelerator 2. setting capabilities in cuda_accelerator. Hardware vendors are welcome to define capabilities for...