go-livepeer
Segments fail to upload on commit 73aaca8c3da1f5269fe059dd03b0cbf4d6796a1c
All Orchestrator nodes were operating fine on 0.5.34, except for Boston, which was running on a special version of 0.5.34
Almost all segments fail to upload, causing test stream scores to plummet. Weirdly enough, it did not seem to affect my actual transcoding work. One other Orchestrator also confirmed this issue while on the above version, and also rolled back to 0.5.34 to fix it
I don't have much info to give you, as the code does not print the actual error message when it fails...
E0907 04:44:32.830831 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3934 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:31.397356 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1327 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:30.675425 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3933 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:29.525661 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1326 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:28.652747 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3932 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:27.490905 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1325 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:26.739795 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3931 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:25.228388 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1324 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:24.520305 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3930 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:23.276689 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1323 orchSessionID=8210a3e9 clientIP=212.102.58.242 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:22.670958 2603746 segment_rpc.go:199] manifestID=1a8d689d-9357-4d12-9aa8-0286f8c42876 seqNo=3929 orchSessionID=5084a2e3 clientIP=195.181.169.69 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment
E0907 04:44:21.369977 2603746 segment_rpc.go:199] manifestID=12fdaa0c-ea69-44cf-8fe3-dbbe427a66d2 seqNo=1322 orchSessionID=8210a3e9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=212.102.58.242 Could not upload segment
It looks like the branch has been removed, but it contained all of the commits that were added to the master branch after v0.5.34. It seems that one of the commits added after this release is causing this issue
Two days ago I deployed the Linux 0.5.34-3ff79bd7 version because it fixes the gas estimation error. Since deploying it, I noticed some difficulty keeping streams, and also some very bad stream tests. So I rolled our nodes back to 0.5.34, and all has been fine since. I presume there is something wrong in one of the merges between 0.5.34 and 0.5.34-3ff79bd7. I can't be sure the problem is linked to this version and can't share logs about it, but I saw the same error as stronk with the previous fix, and had this kind of incident with 3ff79bd7.
@oscar-davids @cyberj0g The only commits that came in since 0.5.34 are #2381 and #2568. I know there's not a lot to go on here, but I would appreciate any thoughts on what might be the cause, or what info we could ask for to help us debug
The difficulty keeping streams has been fixed by #2586. Checked manually in our https://livepeer.studio/dashboard.
The segment upload failure has been fixed by #2591. Checked in the Grafana dashboard. Here is a comparison link: 09-17 vs 09-19
Nice! I'll run the latest master build on my orch nodes and will report if anything breaks
Immediately got 30 Could not upload segment errors on the latest commit in master, so there is still an issue somewhere
@stronk-dev thank you for testing immediately.
I added error logs for when segment uploading fails in the oc/adduploadfaillog branch.
Could you test it again with the new branch on your side? I would like to see the exact reason why the upload failed.
Will take a while since i can't build go-livepeer from source on Arch, will probably give it a go later today
Just switched all of my nodes to commit 6c49c04c732cea919a5ccbf3303ce812fa05de92 (retrieved the binary from discord builds channel). Results:
- Seeing a whole bunch of EndTranscodingSession called, even if it is not transcoding at that moment. Probably unrelated to this issue, but maybe we can turn the verbosity down on this?
- Could not upload segment errors seem to be much less frequent! Got only a single one in the past few minutes:
E0928 12:44:26.819619 1040347 segment_rpc.go:199] manifestID=4f51i9fb8iytrkqs seqNo=3 orchSessionID=825416ab clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
But I'll keep an eye out and update here if anything changes
Past hour: 10 unable to upload segment errors. This might be as expected, but we'll get questions about it in the orchestrator-support channel for sure:
| | E0928 14:06:51.202184 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=16 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 14:06:50.658063 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=14 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 14:06:50.087492 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=12 orchSessionID=a2290bf9 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.188.237 Could not upload segment err="Session ended"
| | E0928 14:06:49.187477 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=10 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 14:06:48.354880 1040347 segment_rpc.go:199] manifestID=4ac57w2hkypbukfj seqNo=8 orchSessionID=a2290bf9 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:59:37.219060 1040347 segment_rpc.go:199] manifestID=894flijoblp8zoxj seqNo=4 orchSessionID=86effee4 clientIP=84.17.50.98 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:49:54.948044 1040347 segment_rpc.go:199] manifestID=b033vps9nhhj1l1e seqNo=135 orchSessionID=0250fc20 clientIP=84.17.50.99 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:19:09.182213 1040347 segment_rpc.go:199] manifestID=6851wpfm71k2cuii seqNo=37 orchSessionID=7648e99a clientIP=195.181.174.39 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:19:07.552985 1040347 segment_rpc.go:199] manifestID=6851wpfm71k2cuii seqNo=31 orchSessionID=7648e99a sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=195.181.174.39 Could not upload segment err="Session ended"
| | E0928 13:05:23.174115 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=107 orchSessionID=be5f7e80 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E clientIP=89.187.188.237 Could not upload segment err="Session ended"
| | E0928 13:05:20.916489 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=102 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:05:20.203736 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=100 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:05:19.545365 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=98 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 13:05:18.774337 1040347 segment_rpc.go:199] manifestID=e111hrtvdqgrvicj seqNo=96 orchSessionID=be5f7e80 clientIP=89.187.188.237 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
| | E0928 12:44:26.819619 1040347 segment_rpc.go:199] manifestID=4f51i9fb8iytrkqs seqNo=3 orchSessionID=825416ab clientIP=195.181.174.186 sender=0xc3c7c4C8f7061B7d6A72766Eee5359fE4F36e61E Could not upload segment err="Session ended"
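Log lines like the ones above can be tallied per hour and per failure reason with a small script. The following is a minimal sketch, assuming the glog-style layout shown in this thread (E, then MMDD, then a timestamp) and an optional err="..." suffix. The names tally, key, and uploadErr are illustrative and not part of go-livepeer.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// uploadErr matches error lines of the shape quoted in this thread.
// Capture groups: 1 = MMDD, 2 = hour, 3 = optional err="..." reason.
var uploadErr = regexp.MustCompile(
	`^E(\d{4}) (\d{2}):\d{2}:\d{2}\.\d+ .*Could not upload segment(?: err="([^"]*)")?`)

// key groups occurrences by day, hour, and failure reason.
type key struct {
	Day, Hour, Reason string
}

// tally counts "Could not upload segment" lines per (day, hour, reason).
func tally(lines []string) map[key]int {
	counts := make(map[key]int)
	for _, line := range lines {
		// Drop leading "| | " residue from pasted dashboard output.
		line = strings.TrimLeft(line, "| ")
		m := uploadErr.FindStringSubmatch(line)
		if m == nil {
			continue // not an upload failure line
		}
		reason := m[3]
		if reason == "" {
			reason = "(not logged)" // older builds did not print the reason
		}
		counts[key{Day: m[1], Hour: m[2], Reason: reason}]++
	}
	return counts
}

func main() {
	// Read log lines from stdin, e.g. a saved log file piped into the program.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// Map iteration order is random, so rows print in no particular order.
	for k, n := range tally(lines) {
		fmt.Printf("%s %s:00 %q %d\n", k.Day, k.Hour, k.Reason, n)
	}
}
```

Grouping by reason makes it easy to see whether every failure in a given hour shares one cause, such as Session ended, or whether several causes are mixed together.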
Now that it has been running for a while:
Looks like the error does happen a bit excessively in US-East, with 39 occurrences of could not upload segment per hour. Every single one of them ended with reason session ended.
Looking at my transcode history, it does look like that specific node has way more trouble getting streams to stick:
@stronk-dev can you give me the dashboard link?
Yea, my dashboard is publicly available at: https://grafana.stronk.tech/d/71b6OZ0Gz/orchestrator-overview
It has all info and error logs pulled from Loki and a counter for specific errors which you can unfold: