Breaking failure: <class 'ValueError'> filedescriptor out of range in select()

Open weklund opened this issue 7 months ago • 3 comments

Describe the bug

I hit an issue that put IBeam into a failure state, and I didn't realize it kept retrying. I don't have much information about it, nor can I reproduce it (maybe it just takes time?).

The fix was simply running docker compose down and then up, but I would like to understand what happened here.

2025-05-13 02:02:09,968|I| Cleaning up the resources. Display: None | Driver: None
2025-05-13 02:02:09,968|I| Logging in failed
2025-05-13 02:02:39,662|I| Maintenance
2025-05-13 02:02:54,947|I| Attempt number 2
2025-05-13 02:02:54,947|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:03:09,968|I| Max request retries reached after 2 attempts. Consider increasing the retries by setting IBEAM_REQUEST_RETRIES environment variable
2025-05-13 02:03:09,968|I| NO SESSION Status(running=True, session=False, connected=False, authenticated=False, competing=False, collision=False, session_id=None, server_name=None, server_version=None, expires=None)
2025-05-13 02:03:09,968|I| Authentication strategy: "B"
2025-05-13 02:03:09,968|I| No active sessions, logging in...
2025-05-13 02:03:09,968|I| Loading auth webpage at https://localhost:5000/sso/Login?forwardTo=22&RL=1&ip2loc=on
2025-05-13 02:03:09,968|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:03:09,991|I| Cleaning up the resources. Display: None | Driver: None
2025-05-13 02:03:09,991|I| Logging in failed
2025-05-13 02:03:09,991|E| Error encountered during authentication 
Exception:
  File "/usr/local/lib/python3.11/threading.py", line 995, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.11/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/srv/ibeam/src/gateway_client.py", line 115, in _maintenance
    success, shutdown, status = self.start_and_authenticate(request_retries=self.request_retries)
  File "/srv/ibeam/src/gateway_client.py", line 62, in start_and_authenticate
    success, shutdown, status = self.strategy_handler.try_authenticating(request_retries=request_retries)
  File "/srv/ibeam/src/handlers/strategy_handler.py", line 85, in try_authenticating
    return self._authentication_strategy_B(status, request_retries)
  File "/srv/ibeam/src/handlers/strategy_handler.py", line 140, in _authentication_strategy_B
    return self._log_in(status)
  File "/srv/ibeam/src/handlers/strategy_handler.py", line 151, in _log_in
    success, shutdown = self.login_handler.login()
  File "/srv/ibeam/src/handlers/login_handler.py", line 468, in login
    driver, display = start_up_browser(self.driver_factory)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/ibeam/src/login/driver.py", line 150, in start_up_browser
    display.start()
  File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/display.py", line 72, in start
    self._obj.start()
  File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 149, in start
    self._start1_has_displayfd()
  File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 197, in _start1_has_displayfd
    self.display = int(self._wait_for_pipe_text(rfd))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/pyvirtualdisplay/abstractdisplay.py", line 295, in _wait_for_pipe_text
    (rfd_changed_ls, _, _) = select.select([rfd], [], [], 0.1)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  <class 'ValueError'> filedescriptor out of range in select()
2025-05-13 02:03:39,662|I| Maintenance
2025-05-13 02:03:54,919|I| Attempt number 2
2025-05-13 02:03:54,919|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:04:09,946|E| Connection timeout after 15 seconds. Consider increasing timeout by setting IBEAM_REQUEST_TIMEOUT environment variable. Error: The read operation timed out
2025-05-13 02:04:09,947|I| Max request retries reached after 2 attempts. Consider increasing the retries by setting IBEAM_REQUEST_RETRIES environment variable
2025-05-13 02:04:09,947|I| NO SESSION Status(running=True, session=False, connected=False, authenticated=False, competing=False, collision=False, session_id=None, server_name=None, server_version=None, expires=None)
2025-05-13 02:04:09,947|I| Authentication strategy: "B"
2025-05-13 02:04:09,947|I| No active sessions, logging in...
2025-05-13 02:04:09,947|I| Loading auth webpage at https://localhost:5000/sso/Login?forwardTo=22&RL=1&ip2loc=on
2025-05-13 02:04:09,970|E| Error encountered during authentication 

config

ip2loc: "US"
proxyRemoteSsl: true
proxyRemoteHost: "https://api.ibkr.com"
listenPort: 5000
listenSsl: true
ccp: false
svcEnvironment: "v1"
sslCert: "vertx.jks"
sslPwd: "mywebapi"
authDelay: 3000
portalBaseURL: ""
serverOptions:
  blockedThreadCheckInterval: 1000000
  eventLoopPoolSize: 20
  workerPoolSize: 20
  maxWorkerExecuteTime: 100
  internalBlockingPoolSize: 20
cors:
  origin.allowed: "*"
  allowCredentials: false
webApps:
  - name: "demo"
    index: "index.html"
ips:
  allow:
    - 10.*
    - 192.*
    - 131.216.*
    - 172.17.0.* # docker internal
    - 172.18.0.* # bridge docker network
    - 127.0.0.1 # localhost
    - x.x.x.x # IP address of your machine used to call the API
  deny:
    - 212.90.324.10
    - 0.0.0.0/0 # all other addresses

docker-compose

  ibeam:
    image: voyz/ibeam
    container_name: ibeam
    env_file:
      - env.list
    ports:
      - "6000:5000"
      - "6001:5001"
    restart: 'no'

Environment

IBeam version: 'latest' as of May 12th (I should probably lock a version)

Happy to provide any other detail!

weklund avatar May 13 '25 02:05 weklund

Thanks for reporting, @weklund 🙌 This seems like a very strange error:

https://github.com/websocket-client/websocket-client/issues/607#issuecomment-811524973

if you run the command ulimit -n on your *nix system, you will see a limit of 1024. I don't think there is anything in this project's code that sets a limit of 1024, so this issue is something related to the operating system, websocket server, proxy, or other limiting factors.

https://stackoverflow.com/questions/7695701/filedescriptor-out-of-range-in-select-when-using-pythons-subprocess-with-rs

Prior to Python 2.7, programs that used ulimit -n to enable communication with large numbers of subprocesses could still monitor only 1024 file descriptors at a time, which caused an exception: ValueError: filedescriptor out of range in select() This was due to the subprocess module using the select system call. The module now uses the poll system call, removing this limitation.
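
For context on the quotes above: the traceback ends in pyvirtualdisplay calling select.select() on a pipe descriptor. select() can only watch descriptors numbered below FD_SETSIZE (1024 on typical Linux builds), whereas poll() has no such cap. Below is a minimal sketch of a poll()-based wait, assuming a Unix-like system where select.poll is available. The name wait_for_pipe_text is a hypothetical stand-in here, not pyvirtualdisplay's actual code.

```python
import os
import select


def wait_for_pipe_text(rfd, timeout_s=1.0):
    """Hypothetical helper: wait for text on a pipe using poll(), not select().

    poll() takes descriptors as plain integers, so it is not bound by the
    FD_SETSIZE limit that makes select() raise
    "filedescriptor out of range in select()".
    """
    poller = select.poll()
    poller.register(rfd, select.POLLIN)
    # poll() expects its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    if not events:
        raise TimeoutError(f"no data on fd {rfd} after {timeout_s}s")
    return os.read(rfd, 64).decode().strip()
```

This only shows why poll() sidesteps the error; it does not explain why the process ended up with such a high descriptor number in the first place.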

I cannot find any further info on how to fix it or prevent it from happening. From these posts I'm guessing that a container restart could be the right solution here, but I don't think we can initiate one from within IBeam. We could stop IBeam with a critical error to indicate that a restart is needed.

My proposal:

  • Catch this error and restart the Gateway (this we can do), hoping that solves it. The bugfix code should include a #todo indicating what to do next if the issue reappears, which is:
  • Shut down with a critical error and display a message that a container restart is needed.
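
The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: login and restart_gateway are hypothetical zero-argument callables standing in for IBeam's real routines, and ContainerRestartRequired is an invented exception name, not something that exists in IBeam.

```python
FD_RANGE_MESSAGE = "filedescriptor out of range in select()"


class ContainerRestartRequired(RuntimeError):
    """Hypothetical error: a Gateway restart did not clear the problem."""


def login_with_fd_guard(login, restart_gateway):
    """Run login(); on the select() fd error, restart the Gateway and retry once.

    Both arguments are hypothetical callables, not IBeam's actual API.
    """
    try:
        return login()
    except ValueError as exc:
        # Only handle the specific error from this issue; re-raise anything else.
        if FD_RANGE_MESSAGE not in str(exc):
            raise

    # Step 1 of the proposal: restart the Gateway and hope that is enough.
    restart_gateway()

    try:
        return login()
    except ValueError as exc:
        if FD_RANGE_MESSAGE not in str(exc):
            raise
        # Step 2 (the #todo): give up loudly so the operator knows
        # a container restart is needed.
        raise ContainerRestartRequired(
            "File descriptor error persists after Gateway restart; "
            "a container restart is needed."
        ) from exc
```

Whether restarting the Gateway actually clears the error is untested; the escalation path exists for the case where it does not.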

Thoughts?

Voyz avatar May 13 '25 10:05 Voyz

I did notice anecdotally that there might have been some runaway process on my compute instance, but I don't have the observability tools set up yet to pinpoint it.

I think I would want a way to automatically do a container restart. That way I can have monitoring tools alarm on it. Would there be a way we can have a specific healthcheck that would fail if this happened?

I do currently have this enabled in my docker-compose.yml:

    healthcheck:
      test: curl -fk https://localhost:5000/v1/api/one/user > /dev/null || exit 1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

I guess this didn't resolve it... 🤔

I'm going to set up some better tooling here and see if I can reproduce it, then look at more detailed information about the container and the entire instance.
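
One thing that tooling could track is the number of open file descriptors in the process. New descriptors get the lowest free number, so a pipe landing at 1024 or above suggests the process already had roughly that many open, which would be consistent with a leak. A minimal sketch, assuming a Linux container where /proc/self/fd is available (with /dev/fd as a fallback); the 1024 limit and the helper names are assumptions for illustration:

```python
import os

# Assumed value of FD_SETSIZE on typical Linux builds.
SELECT_FD_LIMIT = 1024


def count_open_fds():
    """Count this process's open file descriptors.

    The count includes the descriptor briefly opened to list the
    directory itself, so it is consistently one higher than the
    "true" number.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            return len(os.listdir(fd_dir))
        except OSError:
            continue
    raise RuntimeError("no fd directory available on this platform")


def is_near_select_limit(limit=SELECT_FD_LIMIT, margin=100):
    """Return True when the open-fd count is within `margin` of `limit`."""
    return count_open_fds() >= limit - margin
```

Logging count_open_fds() on each maintenance run would show whether the number climbs steadily before the failure.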

weklund avatar May 13 '25 15:05 weklund

@weklund Gotcha! Thanks for the update. IBeam runs a health server on port 5001 by default. If /readyz doesn't return 200, it means something is wrong. Would that be of any use?
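
A rough sketch of what such a check could look like as a small script. It assumes the health server answers plain HTTP on localhost:5001 inside the container, which is not confirmed here, and is_ready is a hypothetical helper name. It has also not been verified that /readyz fails while the select() error is occurring.

```python
import http.client
import urllib.request


def is_ready(url="http://localhost:5001/readyz", timeout_s=5.0):
    """Return True only if the health endpoint answers with HTTP 200.

    The default URL is an assumption (plain HTTP, port 5001, /readyz).
    """
    # Bypass any proxy environment variables: this is a local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts, and non-2xx responses
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False


if __name__ == "__main__":
    # Exit code 0 means healthy, 1 means unhealthy, matching what a
    # Docker healthcheck command expects.
    raise SystemExit(0 if is_ready() else 1)
```

Used as the healthcheck test command, a non-zero exit would mark the container unhealthy, which monitoring could then alarm on.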

Voyz avatar May 16 '25 10:05 Voyz