Breaking failure: <class 'ValueError'> filedescriptor out of range in select()
Describe the bug
I hit an issue that put IBeam into a failure state, and I didn't realize it kept retrying. I don't have much information about it, and I can't reproduce it (maybe it just takes time?).
The fix was simply docker compose down and up, but I would like to understand what happened here.
2025-05-13 02:02:09,968|I| Cleaning up the resources. Display: None | Driver: None
2025-05-13 02:02:09,968|I| Logging in failed
2025-05-13 02:02:39,662|I| Maintenance
2025-05-13 02:02:54,947|I| Attempt number 2
2025-05-13 02:02:54,947|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:03:09,968|I| Max request retries reached after 2 attempts. Consider increasing the retries by setting IBEAM_REQUEST_RETRIES environment variable
2025-05-13 02:03:09,968|I| NO SESSION Status(running=True, session=False, connected=False, authenticated=False, competing=False, collision=False, session_id=None, server_name=None, server_version=None, expires=None)
2025-05-13 02:03:09,968|I| Authentication strategy: "B"
2025-05-13 02:03:09,968|I| No active sessions, logging in...
2025-05-13 02:03:09,968|I| Loading auth webpage at https://localhost:5000/sso/Login?forwardTo=22&RL=1&ip2loc=on
2025-05-13 02:03:09,968|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:03:09,991|I| Cleaning up the resources. Display: None | Driver: None
2025-05-13 02:03:09,991|I| Logging in failed
2025-05-13 02:03:09,991|E| Error encountered during authentication
Exception:
File "/usr/local/lib/python3.11/threading.py", line 995, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
work_item.run()
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/venv/lib/python3.11/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/srv/ibeam/src/gateway_client.py", line 115, in _maintenance
success, shutdown, status = self.start_and_authenticate(request_retries=self.request_retries)
File "/srv/ibeam/src/gateway_client.py", line 62, in start_and_authenticate
success, shutdown, status = self.strategy_handler.try_authenticating(request_retries=request_retries)
File "/srv/ibeam/src/handlers/strategy_handler.py", line 85, in try_authenticating
return self._authentication_strategy_B(status, request_retries)
File "/srv/ibeam/src/handlers/strategy_handler.py", line 140, in _authentication_strategy_B
return self._log_in(status)
File "/srv/ibeam/src/handlers/strategy_handler.py", line 151, in _log_in
success, shutdown = self.login_handler.login()
File "/srv/ibeam/src/handlers/login_handler.py", line 468, in login
driver, display = start_up_browser(self.driver_factory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/ibeam/src/login/driver.py", line 150, in start_up_browser
display.start()
File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/display.py", line 72, in start
self._obj.start()
File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 149, in start
self._start1_has_displayfd()
File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 197, in _start1_has_displayfd
self.display = int(self._wait_for_pipe_text(rfd))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 295, in _wait_for_pipe_text
(rfd_changed_ls, _, _) = select.select([rfd], [], [], 0.1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<class 'ValueError'> filedescriptor out of range in select()
2025-05-13 02:03:39,662|I| Maintenance
2025-05-13 02:03:54,919|I| Attempt number 2
2025-05-13 02:03:54,919|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:04:09,946|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:04:09,947|I| Max request retries reached after 2 attempts. Consider increasing the retries by setting IBEAM_REQUEST_RETRIES environment variable
2025-05-13 02:04:09,947|I| NO SESSION Status(running=True, session=False, connected=False, authenticated=False, competing=False, collision=False, session_id=None, server_name=None, server_version=None, expires=None)
2025-05-13 02:04:09,947|I| Authentication strategy: "B"
2025-05-13 02:04:09,947|I| No active sessions, logging in...
2025-05-13 02:04:09,947|I| Loading auth webpage at https://localhost:5000/sso/Login?forwardTo=22&RL=1&ip2loc=on
2025-05-13 02:04:09,970|E| Error encountered during authentication
config
ip2loc: "US"
proxyRemoteSsl: true
proxyRemoteHost: "https://api.ibkr.com"
listenPort: 5000
listenSsl: true
ccp: false
svcEnvironment: "v1"
sslCert: "vertx.jks"
sslPwd: "mywebapi"
authDelay: 3000
portalBaseURL: ""
serverOptions:
  blockedThreadCheckInterval: 1000000
  eventLoopPoolSize: 20
  workerPoolSize: 20
  maxWorkerExecuteTime: 100
  internalBlockingPoolSize: 20
cors:
  origin.allowed: "*"
  allowCredentials: false
webApps:
  - name: "demo"
    index: "index.html"
ips:
  allow:
    - 10.*
    - 192.*
    - 131.216.*
    - 172.17.0.* # docker internal
    - 172.18.0.* # bridge docker network
    - 127.0.0.1 # localhost
    - x.x.x.x # IP address of your machine used to call the API
  deny:
    - 212.90.324.10
    - 0.0.0.0/0 # all other addresses
docker-compose
ibeam:
  image: voyz/ibeam
  container_name: ibeam
  env_file:
    - env.list
  ports:
    - "6000:5000"
    - "6001:5001"
  restart: 'no'
Environment
IBeam version: 'latest' as of May 12th (I should probably lock a version)
Happy to provide any other detail!
Thanks for reporting, @weklund 🙌 Very strange error, it seems:
https://github.com/websocket-client/websocket-client/issues/607#issuecomment-811524973
if you run the command ulimit -n on your *nix system, you will see a limit of 1024. I don't think there is anything in this project's code that sets a limit of 1024, so this issue is something related to the operating system, websocket server, proxy, or other limiting factors.
https://stackoverflow.com/questions/7695701/filedescriptor-out-of-range-in-select-when-using-pythons-subprocess-with-rs
Prior to Python 2.7, programs that used ulimit -n to enable communication with large numbers of subprocesses could still monitor only 1024 file descriptors at a time, which caused an exception:
ValueError: filedescriptor out of range in select()
This was due to the subprocess module using the select system call. The module now uses the poll system call, removing this limitation.
I cannot find any further info on how to fix it or prevent it from happening. From these posts, I'm guessing that a container restart could be the right solution here, but I don't think we can initiate one from within IBeam. We could stop IBeam with a critical error to indicate that one is needed.
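For context on the limit itself: CPython's select.select() rejects any descriptor number at or above FD_SETSIZE (1024 on typical Linux builds) before it makes the system call, and that is the check failing at the bottom of the traceback. A minimal sketch, assuming a Linux or macOS build of CPython; is_selectable is a hypothetical helper, and the number 2000 is arbitrary and does not need to be open:

```python
import select


def is_selectable(fd: int) -> bool:
    """Return True if select.select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised by CPython's range check against FD_SETSIZE.
        return False
    except OSError:
        # The number is in range, but no such descriptor is open (EBADF).
        return True
    return True


print(is_selectable(0))     # a low descriptor number
print(is_selectable(2000))  # above the usual FD_SETSIZE of 1024
```

Since the kernel hands out the lowest free descriptor number, the pipe opened by pyvirtualdisplay can only land at 1024 or above if the lower numbers were already taken. That would be consistent with descriptors leaking across repeated login attempts, although nothing in the logs confirms it.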
My proposal:
- catch this error and restart the Gateway (this we can do), hoping this will solve it. Bugfix code should include a #todo indicating what to do next in case this issue reappears, which is:
- shut down with a critical error and display a message that a container restart is needed
Thoughts?
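A rough sketch of how the two steps of the proposal could fit together, not actual IBeam code. FdGuard, ContainerRestartRequired, and the restart_gateway callback are hypothetical names introduced for illustration only; the only thing taken from the logs is the error text:

```python
import logging
from typing import Callable

_LOGGER = logging.getLogger("ibeam.sketch")
FD_ERROR_TEXT = "filedescriptor out of range in select()"


class ContainerRestartRequired(RuntimeError):
    """Signals that only a container restart is expected to help."""


class FdGuard:
    """Restart the Gateway on the select() error, then escalate if it recurs."""

    def __init__(self, restart_gateway: Callable[[], None], max_restarts: int = 1):
        self.restart_gateway = restart_gateway  # hypothetical callback
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self, login: Callable[[], bool]) -> bool:
        try:
            return login()
        except ValueError as e:
            if FD_ERROR_TEXT not in str(e):
                raise  # unrelated ValueError, leave it to existing handling
            if self.restarts >= self.max_restarts:
                _LOGGER.critical("select() limit hit again; container restart needed")
                raise ContainerRestartRequired(str(e)) from e
            self.restarts += 1
            _LOGGER.error("select() limit hit; restarting the Gateway")
            self.restart_gateway()
            return False  # report the login as failed so the caller retries
```

The caller would wrap the existing login call in guard.run(...) and treat ContainerRestartRequired as the signal to shut down with a critical error.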
I did notice anecdotally that there might have been some runaway process on my compute instance, but I don't have the observability tools set up yet to pinpoint it.
I think I would want a way to automatically do a container restart. That way I can have monitoring tools alarm on it. Would there be a way we can have a specific healthcheck that would fail if this happened?
I do currently have this enabled in my docker-compose.yml:
healthcheck:
  test: curl -fk https://localhost:5000/v1/api/one/user > /dev/null || exit 1
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 30s
I guess this didn't resolve it... 🤔
I'm going to set up some better tooling here and see if I can reproduce it, then look at more detailed information about the container and the entire instance.
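One cheap thing to log while trying to reproduce this is the number of open file descriptors in the process. A minimal sketch, assuming Linux (it reads /proc) and assuming the usual FD_SETSIZE of 1024; open_fds and fd_report are hypothetical helper names:

```python
import os

FD_SETSIZE = 1024  # assumption: the usual compile-time value on Linux


def open_fds(pid: str = "self") -> list[int]:
    """List descriptor numbers open in a process, read from /proc (Linux only)."""
    # Note: the listing itself briefly holds one descriptor for the directory.
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/fd"))


def fd_report(pid: str = "self") -> dict:
    """Summarize descriptor usage relative to the select() limit."""
    fds = open_fds(pid)
    highest = max(fds, default=-1)
    return {
        "count": len(fds),
        "highest": highest,
        "beyond_select_limit": highest >= FD_SETSIZE,
    }


print(fd_report())
```

Passing a numeric pid as a string instead of "self" inspects another process, given sufficient permissions. A count that climbs steadily after each failed login attempt would point towards a descriptor leak.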
@weklund Gotcha! Thanks for the update. IBeam runs a health server on port 5001 by default. If /readyz doesn't return 200, it means something is wrong. Would that be of any use?
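As a sketch of how such a check could be scripted: the URL below assumes plain HTTP on localhost and the default port 5001 mentioned above, and is_ready and exit_code are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Skip any proxy settings from the environment; the check is local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_ready(url: str = "http://localhost:5001/readyz", timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses, refused connections, and timeouts.
        return False


def exit_code(url: str = "http://localhost:5001/readyz") -> int:
    """Map readiness to a process exit code: 0 is healthy, 1 is unhealthy."""
    return 0 if is_ready(url) else 1
```

A healthcheck script could end with sys.exit(exit_code()), so that the container is marked unhealthy whenever /readyz stops returning 200, and monitoring can alarm on that.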