Remove sess from selector while removing it from the pool
In the context of Realtime Video it's a refactor. In general I think we should always remove the session from the selector when we remove it from the pool.
@ad-astra-video what do you think?
Codecov Report
:x: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 31.67732%. Comparing base (8baf330) to head (829ca0d).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| server/ai_session.go | 0.00000% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #3581 +/- ##
===================================================
+ Coverage 31.67570% 31.67732% +0.00162%
===================================================
Files 159 159
Lines 38951 38949 -2
===================================================
Hits 12338 12338
+ Misses 25721 25719 -2
Partials 892 892
| Files with missing lines | Coverage Δ | |
|---|---|---|
| server/ai_live_video.go | 0.00000% <ø> (ø) |
|
| server/ai_session.go | 7.95107% <0.00000%> (-0.02439%) |
:arrow_down: |
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 8baf330...829ca0d. Read the comment docs.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| server/ai_live_video.go | 0.00000% <ø> (ø) |
|
| server/ai_session.go | 7.95107% <0.00000%> (-0.02439%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
This just streamlines the selection logic to avoid the for loop in AISessionPool.Select right?
The next .Select on the selector removes the session on next pass but its not entirely clear that is what is happening. When a session is selected from unknownSessions it is removed from the Selector.sessions list and never returned because Complete is never called for the session. Seems kind of silly now that I am looking at it to remove it that way :).
Curious if selecting sessions is failing because of this? A max try of 20 was added recently that could be impacting this.
If we are doing this then some other updates would make sense to include in my opinion:
I think we do not need the selection loop here if the session is removed from the selector https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L49-L56 This is the only thing that could require a re-selection. Which I think we do not need after this update. https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L67-L72 We would just need these parts with no for loop wrapping it right? https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L55-L66 https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L73-L76 I think would make sense to run this for all selectors I think to enable the Orchestrator to get back in the pool before it empties or is less than half the starting size https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L260-L262 If have periodic refresh for all selectors, probably do not need this check anymore which can remove the floating lock here too https://github.com/livepeer/go-livepeer/blob/218fd2a0b3cdae110d5ff650f0df0b0c3faef14f/server/ai_session.go#L341-L348
This just streamlines the selection logic to avoid the for loop in
AISessionPool.Selectright?
Hmm, I think it works differently for the Realtime Video AI and for AI Jobs. For Realtime Video we do have periodic session refresh, for AI Jobs we don't.
Anyway, I think it's not related to this PR. This PR just makes sure we also remove the session from the pool while removing it from the selector. @ad-astra-video Do you think we should ever have a session in the pool while not having it in the selector?
I don't see a reason to have the session in the selector but not in the pool. It is removed from the selector on the next round of selection anyways if it is selected.
My only conern was the for loop in Select is not needed anymore with this change because the continue can never be hit since the session is removed from the pool sessMap and the selector at the same time.
for try := 1;
Ahh, I see it now. I think you're right. I removed it. Do you think it's ok now?
Holding off with merging this PR, because I still see try=2 for our Realtime Video pipeline, which should not happen according tho the discussion above.
Are you still seeing try=2? I think its a race condition because the current sel.warmPool.selector.Remove(params.liveParams.sess.BroadcastSession) is not under the pool.mu.Lock(). WDYT?
Moving the session removal to the Remove function puts it under a lock.
sel.warmPool.selector.Remove(params.l
Good point. I've fired: https://github.com/livepeer/go-livepeer/pull/3640
Then, we can check if we still see try=2 in the logs. Thanks @ad-astra-video
sel.warmPool.selector.Remove(params.l
Good point. I've fired: #3640
Then, we can check if we still see
try=2in the logs. Thanks @ad-astra-video
For some reason, I still see it happening. When I have some time, I'll dig deeper into this.
Maybe we can just keep the loop and use a lower count. 5 maybe? Will allow for more time to see it work in the wild and what causes the 2nd try on selection.
Maybe we can just keep the loop and use a lower count. 5 maybe? Will allow for more time to see it work in the wild and what causes the 2nd try on selection.
Ok, changed to 5. PTAL.
@leszko I lost track of this. Sorry about that.