
Incremental aggregation of intraday data.

preritdas opened this issue 2 years ago · 1 comment

Unfortunately, the official Finnhub API can only return one month of intraday data at a time, regardless of the user's plan. There is no way around this. Finnhub support suggests gathering intraday data from each month in separate requests, then combining them.

The updates I propose are non-breaking and will have no impact on existing user code depending on this package. They simply provide two new methods in the Client class: Client.stock_candles_intraday and its internal dependency, Client.stock_candles_df.

Intraday Stock Candles

When the _from and to parameters span more than one month, the intraday stock candles method steps through the requested window and gathers the data in separate requests, each separated by a 0.4-second delay to stay within the rate limit. That delay can easily be changed, or computed per plan; a user with a higher rate limit could use a smaller delay and speed up the process (I expand on this at the end).

The data is complete; there is no missing data between incremented windows. The effective behavior is as if the 1-month intraday API limitation didn't exist, at the expense of a slower response time (sleeping between window aggregations to stay within the API rate limit; I expand on this at the end).
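For illustration, here is a minimal sketch of the windowing idea (the actual proposed code is linked below; the helper name `_split_windows` is hypothetical, and the 29-day window size is just one safe choice under the 1-month cap):

```python
WINDOW_SECONDS = 29 * 24 * 60 * 60  # 29 days per request, safely under the 1-month cap

def _split_windows(_from: int, to: int) -> list[tuple[int, int]]:
    """Split [_from, to] (unix timestamps) into contiguous sub-windows
    that each fit within the API's one-month intraday limit."""
    windows = []
    start = _from
    while start <= to:
        end = min(start + WINDOW_SECONDS, to)
        windows.append((start, end))
        start = end + 1  # next window starts one second later: no gaps, no overlap
    return windows
```

Each sub-window is then fetched with an ordinary candles request and the results are concatenated, sleeping between calls to respect the rate limit.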

The data is returned in pd.DataFrame format, processed in the following ways (a sketch follows the list).

  1. A Date column is created in datetime format and set as the index. This allows for windowed lookups, etc.
  2. Single-character keys from the original JSON response (c, l, o, h, etc.) are turned into proper column names ("Open", "High", "Low", "Close", "Volume"), recognizable by most financial data libraries, including TA-Lib and Pandas TA.
  3. A new, optional filter_eod parameter filters the DataFrame down to rows from within market hours. This is made possible by the fact that the new Date column is parsed and indexed in datetime format.
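To make those steps concrete, a minimal sketch under my own assumptions (the helper name `_candles_to_df` and the 09:30-16:00 US/Eastern definition of market hours are illustrative; the proposed implementation is linked below):

```python
import pandas as pd

def _candles_to_df(res: dict, filter_eod: bool = False) -> pd.DataFrame:
    """Turn a /stock/candle JSON response into a labeled, datetime-indexed DataFrame."""
    df = pd.DataFrame(
        {
            "Open": res["o"],   # step 2: single-character keys become proper column names
            "High": res["h"],
            "Low": res["l"],
            "Close": res["c"],
            "Volume": res["v"],
        }
    )
    # Step 1: build a datetime Date index from the unix timestamps in `t`.
    df["Date"] = pd.to_datetime(res["t"], unit="s")
    df = df.set_index("Date")
    # Step 3: optionally keep only in-market rows (the hours here are an assumption).
    if filter_eod:
        df = df.tz_localize("UTC").tz_convert("US/Eastern")
        df = df.between_time("09:30", "16:00")
    return df
```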

https://github.com/preritdas/finnhub-python/blob/b3b72157b07fa4de4593e006623b599b85b362c6/finnhub/client.py#L229-L257
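A hypothetical call, assuming the signature mirrors Client.stock_candles (the parameter names besides _from, to, and filter_eod are my guess):

```python
import finnhub

client = finnhub.Client(api_key="YOUR_API_KEY")

# Two years of 5-minute candles in one call, despite the 1-month API limit.
df = client.stock_candles_intraday(
    "AAPL", "5", _from=1596240000, to=1659312000, filter_eod=True
)
print(df.head())
```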

Thoughts and Ideas

  • I see this new functionality as more of a convenience than anything else. It doesn't touch any existing code or functionality, but it "solves" one of the biggest limitations of the Finnhub API: the 1-month maximum response for intraday queries.
  • The only unfortunate truth about this implementation is the delay. Because a separate API call is made for each incremental data window, Client.stock_candles_intraday takes ~0.43 seconds per month of data, so querying 10 years of intraday data may take ~50 seconds. As I mention below, tuning the delay to a user's plan can cut that to about 8 seconds for a professional plan with a 900/min rate limit. The bottleneck here is the rate limit, not the implementation.
  • That said, this implementation still seems worth it to me because there's zero change in functionality or timing when querying as one might today. The delay during incrementation is a small price to pay for the convenience of automatic incremental aggregation, given the Finnhub API's 1-month response limitation.
  • If this feature is something we want to at least look into, I'd like to find a smooth way to replace the static 0.4-second delay between internal API calls. That number comes from the market data "basic" plan's 150/min rate limit, which works out to 0.4 seconds per call. As mentioned above, a user with a better plan can wait less between internal API calls without busting their rate limit; a "professional" plan holder only needs to wait ~0.067 seconds between calls, making their multi-window aggregation about 6x faster than the basic plan holder's.
  • If we could somehow determine a user's plan (and therefore their rate limit) from their API key, within the Client class, we could make the internal delay plan-dependent, speeding up aggregation by multiples for users on higher-limit plans (see the sketch after this list).
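The arithmetic is simple; `rate_limit_per_min` is a hypothetical input, however the plan or limit ends up being determined:

```python
def call_delay(rate_limit_per_min: int) -> float:
    """Seconds to sleep between internal API calls to stay within a per-minute limit."""
    return 60 / rate_limit_per_min

call_delay(150)  # "basic" plan        -> 0.4 s per call
call_delay(900)  # "professional" plan -> ~0.067 s per call, ~6x faster aggregation
```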

https://github.com/preritdas/finnhub-python/blob/b3b72157b07fa4de4593e006623b599b85b362c6/finnhub/client.py#L287

preritdas · Aug 30 '22 03:08

Rate Limit Handled

I created a decorator, handle_rate_limit, which wraps a function in an exception handler. No changes need to be made to any existing implementations; this is completely internal and only affects the two new functions I created (mentioned in my first comment).

https://github.com/preritdas/finnhub-python/blob/3cc74716db9b37d55497c2c7fee81fe659d2b8ec/finnhub/client.py#L11-L25

I then wrap my stock_candles_df function, which is the backend for incremental aggregation, with this decorator.

https://github.com/preritdas/finnhub-python/blob/3cc74716db9b37d55497c2c7fee81fe659d2b8ec/finnhub/client.py#L246-L250

The effect is that the function is called as normal, but if the user's rate limit is exceeded, we sleep for a second and retry the call, recursively, until the limit clears.
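In sketch form (the real decorator is at the permalink above; I'm assuming here that the library's FinnhubAPIException exposes a status_code and that 429 marks a rate-limit rejection):

```python
import functools
import time

from finnhub.exceptions import FinnhubAPIException

def handle_rate_limit(func):
    """Call `func` normally; on a rate-limit error, sleep and retry recursively."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except FinnhubAPIException as e:
            if e.status_code != 429:  # only swallow "too many requests"
                raise
            time.sleep(1)  # wait out the limit, then try the call again
            return wrapper(*args, **kwargs)
    return wrapper
```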

Result

This rate limit handling behavior will only occur when aggregating incremental data with the new Client.stock_candles_intraday method. As a result of this implementation, I was able to completely remove the time.sleep(0.4) line I expressed concern over in my original comment. The incremental aggregation function runs several times faster than before, and only slows in the rare case that a user exceeds their rate limit by having too many data windows (very unlikely, see below).

The reason this is unlikely is as follows.

The basic plan rate limit is 150/min.

$$ 365.25 \ \text{days/year} \times 10 \ \text{years} = 3652.5 \ \text{days} $$

$$ 3652.5 \ \text{days} \div 29 \ \text{days/window} \approx 126 \ \text{windows} $$

So even a basic plan holder can increment through 10 years of intraday data (their maximum lookback period anyway) without hitting the rate limit: 126 windows means 126 requests, comfortably under 150/min. But with the decorator in place, we have a consistently performant failsafe regardless.

preritdas · Aug 30 '22 19:08