AZP: Add L40G test

Open Alexey-Rivkin opened this issue 2 years ago • 16 comments

Alexey-Rivkin avatar Nov 14 '23 12:11 Alexey-Rivkin

It's "L40G" rather than "LG40", pls rename everything accordingly.

yosefe avatar Nov 14 '23 14:11 yosefe

> It's "L40G" rather than "LG40", pls rename everything accordingly..

Done

Alexey-Rivkin avatar Nov 16 '23 10:11 Alexey-Rivkin

now there are a couple of failures on rain, let's fix them in this PR. @Alexey-Rivkin, wdyt? one of the issues is addressed in #9495 and I'm checking the other one

brminich avatar Nov 16 '23 17:11 brminich

  1. Tests fail due to the UCX code not supporting L40G.
  2. Is there a link between https://github.com/openucx/ucx/pull/9495 and this PR?

Alexey-Rivkin avatar Nov 16 '23 19:11 Alexey-Rivkin

> 1. Tests fail due to the UCX code not supporting L40G.
> 2. Is there a link between TEST/RKEY: Check rkey distance with accordance to fp8 precision #9495 and this PR?

  1. Seems that there are currently 2 issues in UCX
  2. One of them is supposed to be addressed by #9495

What I propose is to commit the fix for the other issue in this PR.

brminich avatar Nov 16 '23 19:11 brminich

What is L40G? Is this somehow related to the L40 GPU?

shamisp avatar Nov 20 '23 15:11 shamisp

> What is L40G ? Is this somehow related to L40 GPU ?

You got it right.

Alexey-Rivkin avatar Nov 20 '23 15:11 Alexey-Rivkin

/azp run

Alexey-Rivkin avatar Nov 29 '23 15:11 Alexey-Rivkin

Azure Pipelines successfully started running 3 pipeline(s), but failed to run 1 pipeline(s).

azure-pipelines[bot] avatar Nov 29 '23 15:11 azure-pipelines[bot]

the L40G test failures are things that need to be fixed in the ucx code/gtest

yosefe avatar Nov 30 '23 08:11 yosefe

> the L40G test failures are things need to fix in ucx code/gtest

Do we have a bug or other reference to track this issue?

Alexey-Rivkin avatar Nov 30 '23 12:11 Alexey-Rivkin

/azp run

Alexey-Rivkin avatar Dec 04 '23 09:12 Alexey-Rivkin

Azure Pipelines successfully started running 3 pipeline(s), but failed to run 1 pipeline(s).

azure-pipelines[bot] avatar Dec 04 '23 09:12 azure-pipelines[bot]

For reference: https://github.com/openucx/ucx/issues/9531

artemry-nv avatar Dec 13 '23 04:12 artemry-nv

/azp run

rakhmets avatar Dec 13 '23 08:12 rakhmets

Azure Pipelines successfully started running 4 pipeline(s).

azure-pipelines[bot] avatar Dec 13 '23 08:12 azure-pipelines[bot]