opencost Fix gpuCost computation error

What does this PR change?

Solve gpuCost computation error

Does this PR address any GitHub or Zendesk issues?

Closes https://github.com/opencost/opencost/issues/2657

Does this PR require changes to documentation?

No

Mar 22 '24 17:03 ElieLiabeuf

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
opencost	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Mar 25, 2024 10:11am

Mar 22 '24 17:03 vercel[bot]

@ElieLiabeuf thanks for the contribution. Could you sign your commit and fix the DCO warning? This looks relatively straightforward

Mar 22 '24 19:03 mattray

@ElieLiabeuf thanks for the contribution. Could you sign your commit and fix the DCO warning? This looks relatively straightforward

It's fixed, thanks.

Mar 25 '24 10:03 ElieLiabeuf

@AjayTripathy you asked him to contribute this patch in https://github.com/opencost/opencost/issues/2657, could you review?

Apr 10 '24 07:04 mattray

@ElieLiabeuf this generally LGTM but I'm struggling to test this on my end without shared GPUs. Could you share how you tested and verified this fix?

In particular, could you share the results of /costDataModel after the fix along with node labels? I think that should be sufficient to get this merged.

Apr 11 '24 22:04 AjayTripathy

This pull request has been marked as stale because it has been open for 90 days with no activity. Please remove the stale label or comment or this pull request will be closed in 5 days.

Jul 11 '24 01:07 github-actions[bot]

This pull request was closed because it has been inactive for 95 days with no activity.

Jul 16 '24 01:07 github-actions[bot]

@thomasvn could you possibly pick up testing this as part of your GPU sprint?

Jul 22 '24 20:07 AjayTripathy

Sure! @ElieLiabeuf I'll take a look at this soon.

Jul 22 '24 22:07 thomasvn

Let's please get this in ASAP.

Jul 28 '24 11:07 chipzoller

@ElieLiabeuf Could we try to merge this change into the develop branch instead of directly into the v1.109 branch? A new PR may need to be created.

Jul 29 '24 15:07 thomasvn

@thomasvn if this is meaningful I may just ask you to take over the fix?

Jul 29 '24 18:07 AjayTripathy

@AjayTripathy Yes I can do that soon!

Jul 31 '24 16:07 thomasvn

Following up to see if we can get this in this week and start testing.

Aug 05 '24 12:08 chipzoller

After @thomasvn and I spent a few minutes looking at the code, it seems apparent that this logic will be reached if the node label nvidia.com/gpu.replicas is found in addition to there being any allocatable capacity for nvidia.com/gpu (the standard GPU resource). However, because this label is provided by GFD irrespective of whether any sharing strategy is in use (exclude one specific MIG configuration), getting this fix in is important since the calculations will be off any time GFD is deployed in the cluster. In cases where GFD and, therefore, this label are absent, the logic being patched here appears to be unaffected.

Aug 05 '24 16:08 chipzoller

In that case what do we need to merge @thomasvn ?

Aug 21 '24 16:08 AjayTripathy

We can close this PR now, as the fix will be available in https://github.com/opencost/opencost/pull/2862

Aug 23 '24 16:08 thomasvn

opencost opencost copied to clipboard

Fix gpuCost computation error

What does this PR change?

Does this PR address any GitHub or Zendesk issues?

Does this PR require changes to documentation?

opencost
opencost copied to clipboard