Job Fails Even with Small Batches
I've been running an algorithm for the last couple of weeks and keep getting a `RemoteDisconnected` error. The linked issue claims that the main cause is a large batch size, but through the logger I've confirmed that my posts contain only one problem at a time, with a total size under 1.05 KB (~1 KiB). After reading through the other linked issues, I'm unsure whether they would help, or exactly how to implement them.
Is the connection error simply something that must be solved with a fast, stable internet connection?
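(For reference, one way to confirm per-post payload sizes is to raise the client's log level; this is a minimal sketch that assumes the client logs its HTTP activity under the standard `dwave.cloud` logger namespace.)

```python
import logging

# Turn on verbose logging so each HTTP request the cloud client makes,
# including problem uploads, appears in the log; from there the number of
# problems per POST and the payload size can be checked.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("dwave.cloud").setLevel(logging.DEBUG)
```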
Original Issue:
Verified fix:
- [ ] decrease default batch size to 10 (~5 MiB for 10 full Advantage-size problems)
- [ ] ideally, limit batch size by payload bytes (default to 5 MiB)
- [ ] decouple connect from read timeout (#440), and increase the default read timeout to 600 s
- [x] implement retry strategy from #414
Originally posted by @randomir in https://github.com/dwavesystems/dwave-cloud-client/issues/439#issuecomment-720752001
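For context, the retry item above (#414) adds HTTP-level retries inside the client itself. The same idea is sketched below with `requests` and urllib3's `Retry` (this is an illustration, not the client's actual implementation); the `timeout=(connect, read)` tuple in the usage comment also illustrates the connect/read split that #440 is meant to expose.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient connection failures (such as RemoteDisconnected) with
# exponential backoff instead of failing the whole submission at once.
retry = Retry(
    total=5,                 # up to 5 retries per request
    backoff_factor=0.5,      # sleep 0.5 s, 1 s, 2 s, ... between retries
    status_forcelist=(500, 502, 503, 504),
    allowed_methods=None,    # retry all HTTP methods, including POST
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

# Requests made through `session` are now retried on connection resets,
# e.g. session.post(url, json=payload, timeout=(10, 600))
# where timeout=(connect, read) uses a short connect and a long read timeout.
```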
Extending the read timeout after we implement #440 would be worth trying in cases of low bandwidth and/or high latency. We'll have that available soon for you to try (if you don't mind installing from source on master).
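In the meantime, here is a minimal sketch of raising the timeout through client configuration. The `request_timeout` option name (in seconds) is an assumption based on the current config options and may change once #440 splits connect from read timeouts, so check it against the installed client version.

```python
from dwave.cloud import Client

# Sketch: assumes a `request_timeout` config option (seconds); once #440
# lands, connect and read timeouts are expected to be configurable separately.
client = Client.from_config(request_timeout=600)
try:
    solver = client.get_solver()
    # submit problems as usual; HTTP requests now have a longer timeout
finally:
    client.close()
```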