REnforce
Example fails when run with --release
The cartpole example works fine when I run the debug build, but if I run with --release, there seems to be a communication problem with the gym server:
$ cargo run cartpole --release
Compiling renforce v0.1.0 (file:///tmp/REnforce)
Finished release [optimized] target(s) in 3.54 secs
Running `target/release/cartpole cartpole`
Training...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Error { repr: Os { code: 99, message: "Cannot assign requested address" } })', /checkout/src/libcore/result.rs:860
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
at /checkout/src/libstd/panicking.rs:511
6: std::panicking::begin_panic_fmt
at /checkout/src/libstd/panicking.rs:495
7: rust_begin_unwind
at /checkout/src/libstd/panicking.rs:471
8: core::panicking::panic_fmt
at /checkout/src/libcore/panicking.rs:69
9: core::result::unwrap_failed
10: <cartpole::CartPole as renforce::environment::Environment>::step
11: <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next
12: cartpole::main
13: __rust_maybe_catch_panic
at /checkout/src/libpanic_unwind/lib.rs:98
14: std::rt::lang_start
at /checkout/src/libstd/panicking.rs:433
at /checkout/src/libstd/panic.rs:361
at /checkout/src/libstd/rt.rs:59
15: __libc_start_main
16: _start
I looked into the issue. I believe the cause is the code running too quickly in release mode, so the server receives more requests than it can keep up with (the Rust bindings for the gym server are not the best), but I could be wrong. I tried seeing if adding something like
while let Err(..) = obs { /* blah */ }
and making multiple attempts until one worked would fix the issue, but that solved nothing. For the moment, the best I can think of is to run the gym examples only in debug.
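A retry loop like the one above may only help if it pauses between attempts; retrying immediately keeps up the same request rate that seems to trigger the error. Below is a minimal sketch of a retry helper that sleeps and doubles the delay after each failure. The name `retry_with_backoff` and its parameters are hypothetical, not part of REnforce or the gym bindings, and whether waiting actually clears this particular error is an assumption that has not been tested here.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_attempts` calls have been made.
/// Sleeps between attempts, doubling the delay after every failure.
/// `op` is always called at least once.
/// (Hypothetical helper for illustration; not part of REnforce.)
pub fn retry_with_backoff<T, E, F>(
    mut op: F,
    max_attempts: u32,
    initial_delay: Duration,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Give up and hand the last error back to the caller.
                if attempt >= max_attempts {
                    return Err(err);
                }
                // Back off before trying again so the server can catch up.
                thread::sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}
```

In the cartpole example, the closure passed as `op` would wrap the request made inside `step`, for instance with five attempts and a starting delay of a few milliseconds.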
That makes sense. The HTTP client is creating a new connection for every request, which is certainly suboptimal. Is there an easy way to make hyper reuse connections? It seems Keep-Alive is enabled by default: https://docs.rs/hyper/0.11.1/hyper/client/struct.Config.html#method.keep_alive
I don't know Hyper too well, but that's on the latest version. The bindings for the server use an earlier version where it seems you have to set this manually. I tried editing a local copy of the bindings code to include a call to headers.set(Connection::keep_alive()), but that didn't seem to fix things either.
I took another look at this issue to see if I could figure anything out, but no luck. I'm not sure there's a way to avoid this error without editing the server code (I don't know a lot about Flask or servers in general, so I'm not 100% sure what the options are here), but we might not have to avoid the error.
The error causing this to abort doesn't crash the server, so we should be able to wait a little after receiving it and then continue with business as usual. It seems to me that a good long-term solution (which would take a while to implement) is to add better error checking to the library: define some REnforceError enum, return Results everywhere, and have a variant of the enum specific to this error so the client knows it's not urgent or fatal and just needs to wait a bit.
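A rough sketch of what such an enum could look like follows. Only the name REnforceError comes from the proposal above; the variant names, the `is_transient` method, and the assumption that the underlying failure is reachable as a `std::io::Error` are all illustrative. OS error 99 on Linux is EADDRNOTAVAIL, which the standard library reports as `io::ErrorKind::AddrNotAvailable`, so the sketch uses that kind to pick out the "wait a bit" case.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical error type for the library (variant names are assumptions).
#[derive(Debug)]
pub enum REnforceError {
    /// The gym server could not be reached right now; waiting and
    /// retrying is expected to work.
    ServerBusy(io::Error),
    /// Any other I/O failure, treated as fatal.
    Io(io::Error),
}

impl REnforceError {
    /// True if the caller can reasonably wait and try again.
    pub fn is_transient(&self) -> bool {
        match self {
            REnforceError::ServerBusy(_) => true,
            REnforceError::Io(_) => false,
        }
    }
}

impl From<io::Error> for REnforceError {
    fn from(err: io::Error) -> Self {
        // "Cannot assign requested address" maps to AddrNotAvailable.
        if err.kind() == io::ErrorKind::AddrNotAvailable {
            REnforceError::ServerBusy(err)
        } else {
            REnforceError::Io(err)
        }
    }
}

impl fmt::Display for REnforceError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            REnforceError::ServerBusy(e) => {
                write!(f, "gym server temporarily unavailable: {}", e)
            }
            REnforceError::Io(e) => write!(f, "I/O error: {}", e),
        }
    }
}

impl Error for REnforceError {}
```

With something like this in place, `Environment::step` could return a `Result` instead of calling `unwrap()`, and the training loop could sleep and retry whenever `is_transient()` returns true.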