Tickler error sometimes

Open oleg-d opened this issue 6 months ago • 14 comments

Describe Enhancement

Sometimes I see a Tickler error in the logs; it looks like a timeout is happening. Is there something I need to be calling on a regular basis?

For my health check I already call self.client.initialize_brokerage_session(compete=True) every 60 minutes, but on average about once a night I see this Tickler error when I check the logs in the morning, as the app is left running on a VPS.

Should I be calling self.client.tickle()?

2025-06-12 00:06:21,238|E| Tickler error:
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1032, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\client\ibkr_utils.py", line 568, in _worker
    _LOGGER.error(f'Tickler error: {exception_to_string(e)}')
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\client\ibkr_utils.py", line 563, in _worker
    self._client.tickle()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\client\ibkr_client_mixins\session_mixin.py", line 49, in tickle
    return self.post('tickle', log=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\base\rest_client.py", line 156, in post
    return self.request(method='POST', endpoint=path, base_url=base_url, extra_headers=extra_headers, json=params, log=log)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\base\rest_client.py", line 200, in request
    return self._request(method, endpoint, base_url, extra_headers, log, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\client\ibkr_client.py", line 123, in _request
    return super()._request(method, endpoint, base_url, extra_headers, log, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\base\rest_client.py", line 241, in _request
    raise TimeoutError(f'{self}: Reached max retries ({self._max_retries}) for {method} {url} {kwargs}') from e

  <class 'TimeoutError'> IbkrClient: Reached max retries (3) for POST https://api.ibkr.com/v1/api/tickle {}

The below exception was the direct cause of the above exception:

  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\ibind\base\rest_client.py", line 235, in _request
    response = request_function(method, url, verify=self.cacert, headers=headers, timeout=self._timeout, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)

  <class 'requests.exceptions.ReadTimeout'> HTTPSConnectionPool(host='api.ibkr.com', port=443): Read timed out. (read timeout=10)

Context

It doesn't really seem to affect the IBKR connection: even after the Tickler error, the connection is either restored or remains valid, as I can still check PnL through my app, for example.

Possible Implementation

Your Environment

  • IBind version: 0.1.14
  • Python version (e.g. 3.10): 3.12
  • Authentication method (Gateway, IBeam or OAuth): OAuth
  • Operating System and version: Windows Server 2022
  • Link to your project/code (if public)

oleg-d avatar Jun 12 '25 05:06 oleg-d

hey @oleg-d thanks for the detailed description of this issue 👍

It seems that this is expected behaviour. IBKR says they are (or may be) restarting their servers once a day at night. When that happens, requests time out. Tickler attempts to connect a few times, fails and produces this exception. I've added a special except clause to treat this as a warning rather than an exception, as it is probably expected and non-critical. Does it happen to you frequently, or only once a night?

Is there something I need to be calling on a regular basis? Should I be calling self.client.tickle()?

No, in theory tickler should handle it.

Voyz avatar Jun 13 '25 18:06 Voyz

I've seen the same thing. I think it's because the thread is always running after a client is initialized. The client was never sent to GC after exiting the scope in the main thread, because the tickler thread is running.

I haven't figured out a better way to handle this though, because I call the client throughout my code and there is no clear place where I'm done with the client and can call client.close() or client.stop_tickler().

I've added a special except clause to treat this as a warning rather than an exception

This sounds reasonable to me.

https://github.com/Voyz/ibind/blob/master/ibind/client/ibkr_utils.py#L560

        while not self._stop_event.wait(self._interval):
            try:
                self._client.tickle()
            except KeyboardInterrupt:
                _LOGGER.info('Tickler interrupted')
                break
            except Exception as e:
                _LOGGER.error(f'Tickler error: {exception_to_string(e)}')

We might want to consider handling the exception better here and breaking the loop if it's an unrecoverable exception.
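A minimal sketch of the handling discussed here, in which a TimeoutError is logged as a warning and the loop keeps running. SketchTickler and FlakyClient are hypothetical stand-ins written for illustration; they are not ibind's classes, and the real implementation may differ.

```python
import logging
import threading

_LOGGER = logging.getLogger('tickler_sketch')


class SketchTickler:
    """Hypothetical stand-in for the Tickler worker loop; not ibind's actual class."""

    def __init__(self, client, interval: float):
        self._client = client
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = None

    def _worker(self):
        # Event.wait returns False on timeout and True once stop() sets the event
        while not self._stop_event.wait(self._interval):
            try:
                self._client.tickle()
            except TimeoutError as e:
                # Expected during a server restart: warn and keep looping
                _LOGGER.warning(f'Tickler timed out: {e}')
            except Exception as e:
                # Anything else is logged as an error; the loop still continues
                _LOGGER.error(f'Tickler error: {e}')

    def start(self):
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()


class FlakyClient:
    """Hypothetical client whose first tickle times out, then succeeds."""

    def __init__(self):
        self.calls = 0

    def tickle(self):
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError('Reached max retries')
        return True
```

Breaking out of the loop on a specific exception type would only need a `break` in the matching except branch; whether any exception from tickle deserves that is the open question here.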

hazelement avatar Jun 17 '25 01:06 hazelement

For me this error used to happen randomly overnight, probably once a night.

What I decided to try was adding an explicit client.tickle() call to my health check, which runs every 50 minutes; so far this week I have observed 0 errors in 2 days.

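from time import sleep

def _check_ibkr_health(self):
    try:
        # Explicit tickle before re-initializing the brokerage session
        self.client.tickle()
        sleep(0.1)
        self.client.initialize_brokerage_session(compete=True)
        sleep(0.1)
    except Exception:
        raise  # error handling was cut off in the original post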

oleg-d avatar Jun 17 '25 06:06 oleg-d

@hazelement thanks for your comments and suggestions.

The client was never sent to GC after exiting the scope in the main thread, because the tickler thread is running.

Sorry, GC of what? Of tickler? The tickler _worker method has no variables that are stored or accumulated in its while loop.

Or did you mean the main thread? In this case yes, well, there indeed is a child thread running that needs to be stopped.

I haven't figured out a better way to handle this though, because I call the client throughout my code and there is no clear place where I'm done with the client and can call client.close() or client.stop_tickler().

This should be done automatically if use_oauth=True and oauth_config.shutdown_oauth=True (the latter is True by default). Does it not happen automatically for you? If so, are either of these flags False in your setup?

We might want to consider handling the exception better here and breaking the loop if it's an unrecoverable exception.

Calling tickle is unlikely to cause an unrecoverable exception. Can you think of any that we should handle in such a critical fashion?

Voyz avatar Jun 17 '25 09:06 Voyz

@oleg-d

For me this error used to happen randomly overnight, probably once a night.

That sounds exactly like IBKR restarting their servers. This should be normal behaviour, so the new exception handling should make this more graceful.

What I decided to try was adding an explicit client.tickle() call to my health check, which runs every 50 minutes; so far this week I have observed 0 errors in 2 days.

So now that you do this you don't get these overnight exceptions? Did you also disable the Tickler?

That would make sense: the Tickler calls the tickle endpoint once a minute, as IBKR suggests, which makes it likely that one such request lands during their server restart. Your 50-minute interval most likely misses the restart. If a 50-minute interval is sufficient for you, then I'd suggest you simply use Tickler with interval=60*50. I just realised there is currently no way to set it up like this, as the interval parameter is not exposed in the API. I'll add this in.

Voyz avatar Jun 17 '25 09:06 Voyz

Released all fixes with 0.1.15 branch. TimeoutError is handled and interval can be specified through the IBIND_TICKLER_INTERVAL env var
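A minimal sketch of setting this variable from Python rather than from the shell. It assumes the value is in seconds (consistent with the interval=60*50 suggestion above) and that ibind reads the variable at import or client-creation time, so it is set before either happens. Neither detail is confirmed in this thread.

```python
import os

# Assumption: the value is interpreted as seconds; 60 * 50 = 50 minutes
os.environ['IBIND_TICKLER_INTERVAL'] = str(60 * 50)

# Import ibind and create the IbkrClient only after this point,
# so that the setting is already present in the environment.
```

Setting the variable in the service definition or shell profile of the VPS would achieve the same thing without touching the code.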

Voyz avatar Jun 17 '25 09:06 Voyz

@Voyz I was referring to the main thread GC, and your comment is accurate. As for an unrecoverable exception, I think the TimeoutError example above is a good one, assuming the existing session is no longer valid after IBKR restarts their server.

This should be done automatically if use_oauth=True and oauth_config.shutdown_oauth=True (the latter is True by default). Does it not happen automatically for you? If so, are either of these flags False in your setup?

Here is my client initialization code and I think it matches your recommendation. Does use_session=False affect anything?

client = IbkrClient(
    use_oauth=True,
    use_session=False,
    oauth_config=OAuth1aConfig(
        access_token=IBKR_ACCESS_TOKEN,
        access_token_secret=IBKR_ACCESS_TOKEN_SECRET,
        consumer_key=IBKR_CONSUMER_KEY,
        dh_prime=IBKR_DHPRIME,
        encryption_key_fp=f'{ROOT_FOLDER}/secrets/private_encryption.pem',
        signature_key_fp=f'{ROOT_FOLDER}/secrets/private_signature.pem',
    )
)

Does it not happen automatically for you?

How does the system stop the Tickler automatically? Currently, I call client.close() on invalid sessions and instantiate a new client.

hazelement avatar Jun 18 '25 02:06 hazelement

@hazelement

As for an unrecoverable exception, I think the TimeoutError example above is a good one, assuming the existing session is no longer valid after IBKR restarts their server.

I understand your point. However, I think Tickler can continue to operate normally on TimeoutError, rather than break its loop. Let me explain:

(Keep in mind we're talking about OAuth here; with Gateway this would look different.)

  • Tickler is a class that calls tickle endpoint in a loop on a separate thread.
  • It uses an instance of IbkrClient class to make that tickle call, relying on it having a valid session.
  • Once session becomes invalid, a new session needs to be created in IbkrClient. Calling client.close() and client.oauth_init() is enough to do this - there shouldn't be a need to instantiate a new client class. I understand that this is not documented anywhere, I've added documenting the session restart to the backlog.
  • Once a new session is created, Tickler should be able to continue its operation without any issues. It does not need to be restarted for this to work - only needs the IbkrClient to hold a valid session.

Hence:

  • Stopping Tickler on TimeoutError would be redundant, as it isn't related to the cause of the issue, nor can Tickler fix it. It needs to wait for a valid session, and whether we stop and restart it or just let it keep running makes no difference.
  • Recreating a valid session in the existing IbkrClient is the correct way this should be handled.
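A minimal sketch of the restart described above: close the session and re-initialize OAuth on the same client object instead of constructing a new one. The helper name restart_oauth_session is hypothetical; the oauth_init arguments are the ones used in the health-check snippet later in this thread, and the client is duck-typed so the sketch does not depend on a particular ibind version.

```python
import logging

_LOGGER = logging.getLogger('session_restart_sketch')


def restart_oauth_session(client) -> bool:
    """Close the current session and start a new one on the same client object.

    `client` is expected to expose close() and oauth_init(), as described above.
    Returns True if re-initialization did not raise.
    """
    try:
        client.close()
    except Exception as e:
        # A failure to close the stale session should not block the restart
        _LOGGER.warning(f'Error while closing the old session: {e}')

    try:
        client.oauth_init(maintain_oauth=True, init_brokerage_session=True)
    except Exception as e:
        _LOGGER.error(f'Error re-initializing OAuth: {e}')
        return False
    return True
```

A caller could run this when it decides the session is invalid and retry later if it returns False.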

Nevertheless:

How does the system stop the Tickler automatically? Currently, I call client.close() on invalid sessions and instantiate a new client.

  • From what you're writing I assume that client.close() does not stop Tickler - is that correct? Indeed, Tickler should be stopped once we call that method. Do you see any of these logs printed:

    Shutting down OAuth
    Tickler gracefully stopped

I did just find one bug related to this restart process, however it shouldn't affect your case much. Anyway, I've fixed it on ibind==0.1.16rc1, please give it a shot, calling client.close() and client.oauth_init() and let me know if it resumes correctly without needing to reinstantiate the class.

Voyz avatar Jun 23 '25 10:06 Voyz

@Voyz thanks for explaining how the client works internally. I think it makes sense that we should keep the tickler running.

From a user's perspective, once I have initialized a client, I would expect the framework to keep the session working so that I don't have to worry about it. Assuming that's the goal of the framework, I feel that it's less intuitive to require the user to call client.close() when some internal client code (the Tickler) runs into issues. Especially since there is no explicit way for user code to detect such an issue. Ideally, the client should handle this internally.

I'm merely trying to think of a way to make it more user-friendly. Do you think it's possible to have the Tickler automatically call client.close() and client.oauth_init() when tickle runs into session-related issues?

hazelement avatar Jun 26 '25 03:06 hazelement

I did just find one bug related to this restart process, however it shouldn't affect your case much. Anyway, I've fixed it on ibind==0.1.16rc1, please give it a shot, calling client.close() and client.oauth_init() and let me know if it resumes correctly without needing to reinstantiate the class.

I will keep an eye on this. So far, I haven't seen any issues for quite a few days. Thank you!

hazelement avatar Jun 26 '25 03:06 hazelement

@hazelement

From a user's perspective, once I have initialized a client, I would expect the framework to keep the session working so that I don't have to worry about it. Assuming that's the goal of the framework, I feel that it's less intuitive to require the user to call client.close() when some internal client code (the Tickler) runs into issues.

That's perfectly understandable. I can see the benefit of handling this automatically for the user, but I also feel there may be some scenarios where we'd do something against their will. E.g. if the authentication is dropped due to them logging in elsewhere, e.g. into TWS, then such functionality, especially if it's on by default, could be troublesome.

That being said...

Especially since there is no explicit way for user code to detect such an issue.

There is one! There's the check_health method, you can read more about it here: https://github.com/Voyz/ibind/wiki/Advanced-REST#-check_health

That's the interaction I'd expect from users:

  • Use IBind in whichever way you need to - with IBeam, OAuth, raw Gateway, etc.
  • Call check_health in a loop
  • When it returns False, implement logic suited to your deployment that would attempt to re-establish the connection.

Something along the lines of:

import logging

from ibind import IbkrClient

_LOGGER = logging.getLogger(__name__)

# run this in a loop
def check_health(client: IbkrClient, use_oauth: bool = False) -> bool:
    healthy = client.check_health()
    if healthy:
        _LOGGER.info(f'IBKR connection in good health: {healthy}')
        return True

    _LOGGER.info(f'IBKR connection in bad health: {healthy}')
    if use_oauth:
        try:
            # Stop the old Tickler before re-authenticating
            client.stop_tickler()
        except Exception as e:
            _LOGGER.error(f'Error stopping tickler: {e}')

        try:
            # Re-create the OAuth session on the same client instance
            client.oauth_init(maintain_oauth=True, init_brokerage_session=True)
        except Exception as e:
            _LOGGER.error(f'Error reauthenticating with OAuth: {e}')
    return False

Case in point: if using IBeam instead of the other options, the right course of action after getting a False from check_health would be to... do nothing. IBeam will handle the reconnection; our script should just wait for it to succeed.

Hence I didn't opt for handling this automatically.
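The "call check_health in a loop" step can be driven by a small polling helper. This is a sketch only: watch_health and its parameters are hypothetical names, and the recovery hook is left to the caller precisely because the right reaction depends on the deployment.

```python
import threading
from typing import Callable


def watch_health(check: Callable[[], bool],
                 on_unhealthy: Callable[[], None],
                 stop_event: threading.Event,
                 interval: float = 60.0) -> None:
    """Call `check` every `interval` seconds until `stop_event` is set.

    `on_unhealthy` runs each time `check` returns False.
    """
    # Event.wait doubles as an interruptible sleep: it returns True once the
    # event is set, which ends the loop without waiting out the full interval.
    while not stop_event.wait(interval):
        if not check():
            on_unhealthy()
```

With OAuth, `check` could be `client.check_health` and `on_unhealthy` a re-authentication routine; with IBeam, `on_unhealthy` could simply log and return, since IBeam handles the reconnection itself.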

All that being said, it's not a bad idea to have a 'Maintainer' that re-authenticates on error, alongside the already implemented Tickler that keeps the session alive. Would you find it useful?

Voyz avatar Jul 03 '25 15:07 Voyz

if the authentication is dropped due to them logging in elsewhere, e.g. into TWS, then such functionality, especially if it's on by default, could be troublesome.

I think you have a great point here, and I agree. It might not be the best idea for the framework to maintain it for the user. In fact, I would find it annoying as well, because I sometimes log in to TWS to check order status as well. This would kick me out of my TWS session.

There is one! There's the check_health method

I did not know this existed! I should use it in my loop instead of doing an account retrieval to detect errors.

The current state seems to be the best setup for the moment. Thanks for the good conversation. ;)

hazelement avatar Jul 04 '25 03:07 hazelement

[...] because I sometimes log in to TWS to check order status as well. This would kick me out of my TWS session.

Just a comment, not disputing the validity of the point made: I have set up a second user for my account for this reason. Both users have OAuth access, but only one is used by ibind. The other user can happily log into TWS and do stuff.

salsasepp avatar Jul 04 '25 15:07 salsasepp

hey @hazelement I've updated the wiki to clarify this pattern: https://github.com/Voyz/ibind/wiki/Advanced-REST#-check_health

Also, I ended up adding a method that can be called repeatedly to handle the health status: https://github.com/Voyz/ibind/blob/ed7a95779d4235da996216c47bfcc1bbf45ac459/ibind/client/ibkr_client.py#L266-L298. It will be out with the next release.

Voyz avatar Jul 07 '25 04:07 Voyz