Screencasts with multiple workers eventually fail
I noticed that screencasts eventually fail when running with this configuration, which archives a set of page URLs using four workers with screencasting enabled on port 9037:
docker run -p 9037:9037 -it --rm -v $PWD:/crawls/ webrecorder/browsertrix-crawler:latest crawl --config /crawls/crawl.yaml
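For reference, the crawl.yaml here is essentially just the seed list plus the worker and screencast settings; a minimal sketch might look like this (the seed URLs are placeholders, and the option names are assumed to match the crawler's CLI flags):

seeds:
  - https://example.com/page-1
  - https://example.com/page-2
workers: 4
screencastPort: 9037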
Things start out well, with four active screencasts. But over time the screencasts start to disappear from the page, until only one is left, and it appears stuck. I can see from the console that the four workers are still actively crawling. When I reload the screencast page at http://localhost:9037, the terminal shows the workers stopping and restarting their screencasts, but the page itself doesn't reflect any change:
{"logLevel":"info","timestamp":"2023-08-11T13:46:43.985Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":3}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:43.985Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":0}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:43.986Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":2}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:44.078Z","context":"screencast","message":"Started Screencast","details":{"workerid":1}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:44.078Z","context":"screencast","message":"Started Screencast","details":{"workerid":3}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:44.079Z","context":"screencast","message":"Started Screencast","details":{"workerid":0}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:44.079Z","context":"screencast","message":"Started Screencast","details":{"workerid":2}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:58.940Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":1}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:58.941Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":3}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:58.941Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":0}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:58.941Z","context":"screencast","message":"Stopping Screencast","details":{"workerid":2}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:59.020Z","context":"screencast","message":"Started Screencast","details":{"workerid":1}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:59.021Z","context":"screencast","message":"Started Screencast","details":{"workerid":3}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:59.021Z","context":"screencast","message":"Started Screencast","details":{"workerid":0}}
{"logLevel":"info","timestamp":"2023-08-11T13:46:59.022Z","context":"screencast","message":"Started Screencast","details":{"workerid":2}
@edsu to confirm, you're still seeing other messages indicating that the crawler is running, just no screencasts here? E.g. the console shows the crawl is progressing? I wonder if it's a memory issue -- some of those pages seem to be fairly CPU/memory intensive. By default here (unlike in Browsertrix Cloud), there are no memory constraints. Have you tried running with fewer workers?
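If it helps to experiment, something like the following should cap the container's memory and drop to two workers (this assumes Docker's --memory flag and that CLI options override values from the YAML config):

docker run -p 9037:9037 -it --rm --memory=4g -v $PWD:/crawls/ webrecorder/browsertrix-crawler:latest crawl --config /crawls/crawl.yaml --workers 2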
Fixed a few issues in 0.11.1 that could have caused this, including the screencast-close call never returning, page crashes, and browser crashes. Hopefully it won't get stuck anymore; if you have a chance to retry and it happens again, let us know here.
The screencasts are much more reliable now, thanks!
I spoke too soon. After a few hours they all disappeared :-( I can share the log if it's helpful?
This is with 0.11.1? Yes, that would be helpful! I assume reloading the page didn't help, right?
Yes, I did a docker pull browsertrix-crawler:latest today. Here's the log!
crawl-20230919145843766.log.gz
You can see near the end of the log that I tried to reload the page, which seemed to trigger messages like:
...
{"timestamp":"2023-09-19T18:31:57.155Z","logLevel":"info","context":"screencast","message":"Stopping Screencast","details":{"workerid":2}}
{"timestamp":"2023-09-19T18:31:57.155Z","logLevel":"info","context":"screencast","message":"Stopping Screencast","details":{"workerid":4}}
{"timestamp":"2023-09-19T18:31:57.155Z","logLevel":"info","context":"screencast","message":"Stopping Screencast","details":{"workerid":1}}
{"timestamp":"2023-09-19T18:31:57.155Z","logLevel":"info","context":"screencast","message":"Stopping Screencast","details":{"workerid":5}}
{"timestamp":"2023-09-19T18:31:57.155Z","logLevel":"info","context":"screencast","message":"Stopping Screencast","details":{"workerid":0}}
{"timestamp":"2023-09-19T18:31:57.233Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":3}}
{"timestamp":"2023-09-19T18:31:57.233Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":2}}
{"timestamp":"2023-09-19T18:31:57.233Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":4}}
{"timestamp":"2023-09-19T18:31:57.233Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":1}}
{"timestamp":"2023-09-19T18:31:57.234Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":5}}
{"timestamp":"2023-09-19T18:31:57.234Z","logLevel":"info","context":"screencast","message":"Started Screencast","details":{"workerid":0}}
...
It ran for at least an hour without a problem, which was an improvement on the prior behavior. I noticed that CPU usage tapered off in htop, but I'm not sure what caused that.
This should be fixed in the 1.x releases; we haven't seen this issue in a while.
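For anyone still seeing this on an older image, pulling a current image and re-running the same crawl should pick up the fix, e.g.:

docker pull webrecorder/browsertrix-crawler:latest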