alpaca-py icon indicating copy to clipboard operation
alpaca-py copied to clipboard

Ignored "raw" flag for StockDataStream

Open nv-pipo opened this issue 2 years ago • 11 comments
trafficstars

Issue

Initializing StockDataStream with parameter raw_data=True creates a websocket that returns timestamp as 'Timestamp', instead of the RAW string with the datetime.

Replicate

from alpaca.data.live.stock import StockDataStream

wss_client = StockDataStream('api-key', 'secret-key', raw_data=True)

# async handler
async def quote_data_handler(data: Any):
    # quote data will arrive here
    print(data)
    print("===")

wss_client.subscribe_quotes(quote_data_handler, "AAPL")

wss_client.run()

output: Screenshot 2023-02-08 at 1 30 14 PM

Expected output (from https://alpaca.markets/docs/api-references/market-data-api/stock-pricing-data/realtime/): Screenshot 2023-02-08 at 1 29 17 PM

nv-pipo avatar Feb 08 '23 12:02 nv-pipo

By the way, I am seeing two timestamps in the raw quote, both as msgpack.ext.Timestamp objects, with keys t and r. Only t is documented whereas r isn't.

Examples:

<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'Q', 'ap': 313.74, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=506885515), 'r': Timestamp(seconds=1682466638, nanoseconds=507788703)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.74, 'as': 18, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507231768), 'r': Timestamp(seconds=1682466638, nanoseconds=508096712)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.73, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507273406), 'r': Timestamp(seconds=1682466638, nanoseconds=508120891)}

I see that t < r.

@nv-pipo Just so I understand, what's currently the harm in leaving it as a msgpack.ext.Timestamp object? Timestamps are of course essential for trading. Do you have a much faster way of parsing the timestamp?

Here are the available methods on the timestamp object:

dir(msgpack.ext.Timestamp)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'from_bytes', 'from_datetime', 'from_unix', 'from_unix_nano', 'nanoseconds', 'seconds', 'to_bytes', 'to_datetime', 'to_unix', 'to_unix_nano']

impredicative avatar Apr 26 '23 00:04 impredicative

@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

nv-pipo avatar May 01 '23 08:05 nv-pipo

@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

I believe it's natively coming in through the socket with msgpack encoding. As such, msgpack seems to be its raw format. As such, the expectation of a string is probably invalid.

impredicative avatar Jun 19 '23 16:06 impredicative

@impredicative Well, it seems that msgpack supports strings: https://en.wikipedia.org/wiki/MessagePack. I'm no expert thought.

Note that the RAW flag for the timestamp field is also ignored in the historical API calls (let me know if I should open a separate issue):

>>> from alpaca.data.timeframe import TimeFrame
>>> from alpaca.data.requests import StockBarsRequest
>>> from alpaca.data.historical import StockHistoricalDataClient
>>>
>>> # keys required
>>> stock_client = StockHistoricalDataClient(key, secret)
>>> request_params = StockBarsRequest(
...                         symbol_or_symbols=["AAPL"],
...                         timeframe=TimeFrame.Day,
...                         start="2022-10-04 00:00:00",
...                         end="2022-10-05 00:00:00",
...                         raw_format=True,
...                  )
>>>
>>> bars = stock_client.get_stock_bars(request_params)
>>> print(bars)
data={'AAPL': [{   'close': 146.1,
    'high': 146.22,
    'low': 144.26,
    'open': 145.03,
    'symbol': 'AAPL',
    'timestamp': datetime.datetime(2022, 10, 4, 4, 0, tzinfo=datetime.timezone.utc),
    'trade_count': 687505.0,
    'volume': 87880164.0,
    'vwap': 145.490979}]}
>>>

nv-pipo avatar Jun 28 '23 08:06 nv-pipo

msgpack is ostensibly specified as the desired encoding by the client, and so in my understanding it makes sense for the response to use a msgpack object where it helps. If you don't want it, try using your own custom client without requesting the msgpack encoding, although the results may then be inefficiently delivered. I am not convinced that this issue should stay open.

impredicative avatar Jun 28 '23 23:06 impredicative

Yeah, I had come to the same conclusion some time ago: I've been using my own client for months.

However, I believe it is important to fulfill the flag expectations! and this alpaca-py's stream and historical endpoints do not truly respect the RAW flag. This should be fixed if you want people to use the SDK, instead of driving them away.

nv-pipo avatar Jun 29 '23 07:06 nv-pipo

Well, it is important for the user to have both the integer seconds value and the integer nanoseconds value. If a msgpack object was not used, the two values would then have to be represented in a list, but they would then be unclear to a reader as to what the list means.

An RFC-3339 timestamp, regardless of its precision, is less useful here because it is a slow and complicated object to work with when speed matters. That's even more true when it's a string that requires parsing. Integers are simplest and fastest to work with.

impredicative avatar Jul 02 '23 22:07 impredicative

@impredicative I am not arguing that a string representation of a date is better than a "date data type"!

I am simply saying that functions should behave as expected, given the parameters passed: if the parameter RAW is passed, then the RAW response has to be returned. This is not the case today, for neither Websockets nor historical API calls, and thus needs to be fixed. (RAW is described/defined by the RESTful/Websocket endpoints.)

Alternatively, if these functions cannot return the expected value, the RAW parameter should be dropped from functions calls...or rename it to match the return value, eg: "RAW_EXCEPT_FOR_TIMESTAMP".

nv-pipo avatar Jul 03 '23 09:07 nv-pipo

I see your point. Perhaps the arg name raw_data can be changed to native.

impredicative avatar Jul 03 '23 15:07 impredicative

That's fair.

Should I open a separate ticket for the "Historical Stock Pricing Data" endpoint?

nv-pipo avatar Jul 04 '23 06:07 nv-pipo

@nv-pipo I would just add it to the original comment of this issue itself, also including code to replicate it.

impredicative avatar Jul 04 '23 19:07 impredicative