orcanode icon indicating copy to clipboard operation
orcanode copied to clipboard

s3_upload branch (bug?): Occasionally Unix new datetime fails to upload

Open scottveirs opened this issue 5 years ago • 7 comments

This has only happened ~twice during experimental deployment of boto-based Python s3_upload branch, but player fails to play (no response on pressing play button at live.orcasound.net/ ) because Unix datetime stamped directory is created on Raspberry Pi4 but not transferred to S3 streaming bucket. Thus player finds the latest.txt file, but throws and error when trying to access the latest .m3u8 file.

Possible bug may be in the inotify logic and/or rare issues with boto failing?

Screen Shot 2020-10-08 at 9 15 45 AM Screen Shot 2020-10-08 at 9 18 39 AM Screen Shot 2020-10-08 at 9 26 17 AM Screen Shot 2020-10-08 at 9 32 01 AM Screen Shot 2020-10-08 at 9 35 56 AM

scottveirs avatar Oct 08 '20 17:10 scottveirs

Happened again tonight (Fri Jan 29, 2021 upon the 18:30 local container restart; Orcasound Lab Rpi4). Logged in via Dataplicity and confirmed with htop that stream.sh, ffmpeg, and s3_upload.py all running (but low CPU usage?). The datetime directory exists in /tmp but is empty.

Screen Shot 2021-01-29 at 11 09 10 PM

Checked S3 and don't see new datetime "folder" object in streaming bucket. This suggests inotify didn't catch the directory creation event on Rpi. Why would that be?

Screen Shot 2021-01-29 at 11 11 01 PM

I suspected the hw:pisound might be mapped to wrong device, but I think 1,0 is expected... If ffmpeg can't find the audio device, perhaps it never creates an m3u8 or ts segments in the tmp/.../datetime directory?

scottveirs avatar Jan 30 '21 07:01 scottveirs

Noting another occurrence today (Jan 10, 2022 for bucket object 1641825019) on the orcasound-lab node with the same behavior as on 1/29/2021 above.

Screen Shot 2022-01-10 at 11 02 27 AM

scottveirs avatar Jan 10 '22 19:01 scottveirs

Val and I just observed another instance of this bug for the orcasound-lab node. The missing data directory in this case was 1703601018

scottveirs avatar Dec 26 '23 19:12 scottveirs

And a (related, I think?) event for orcasound-lab node between the last two of these entries:

Screenshot 2024-05-05 at 2 19 35 PM

The missing datetime object means there was a a loss of 6 hours of data last night (5/4/24 18:30 - 5/5/24 00:30)

scottveirs avatar May 05 '24 21:05 scottveirs

If you look in base/upload_s3.py there is a call that uploads the file. We don't actually get the return value from that call.

resource.meta.client.upload_file(uploadfile, BUCKET, uploadkey, ExtraArgs={'ACL': 'public-read'})

I just checked the API documentation and there is a return value True for success and False for failure to upload.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html

I'm not sure what the state of the logging, is there was some problem w/ the logging tool and raspbian. But assuming that the logging was working it would seem adding a check to this call would seem to me to be the next step.

Also even though the timestamp name looks like a folder, it's not really. There is no "create directory" function. So if we are missing a timestamp folder completely that's more like there was some error when stream.sh was called, but the container is supposed to restart if there is an error. So it's a bit of a puzzle.

mcshicks avatar May 05 '24 21:05 mcshicks

May have happened again at Orcasound Lab in last couple days since Val and the A-team got new hydrophones deployed after the November windstorm damage.

The .env file was showing audio device = 1,0 rather than pisound so I changed that. Then it seemed to take a couple container re-starts and one dis/re-connect via Dataplicity (from Seattle) to get both ffmpeg and the upload script running as normally.

For the record -- Here are Mezmo logs for a failed attempt (which uploaded latest.txt file, but not the corresponding date-time object holding the first version of the .m3u8 file) --

Screenshot 2024-12-23 at 12 25 40 AM

and the final attempt (which worked) -- Screenshot 2024-12-23 at 12 24 10 AM

Note in the failed attempt there are no log entries after this upload script line --

Dec 21 13:36:09 rpi_orcasound_lab orcanode_streaming_1 info upload_s3.s3_copy_file: upload key: rpi_orcasound_lab/latest.txt

whereas in the successful one there is output from ffmpeg that indicate ADC progress --

Dec 23 00:08:33 rpi_orcasound_lab orcanode_streaming_1 err size=N/A time=00:00:00.51 bitrate=N/A speed=   1x    

scottveirs avatar Dec 23 '24 08:12 scottveirs

Interesting logs! The line that sticks out to me is this one:

subgraph starting at ffjack timed out (subgraph_wait_fd=11, status = 0, state = Triggered, pollret = 0 revents = 0x0)

Seems like that might have to do with not finding the pisound board properly. Not sure why it happened at such a random time though…

Also just to note, Orcasound Lab is still running some older code (rpi-prod-deprecated branch)

paulcretu avatar Dec 24 '24 23:12 paulcretu