Calling get_series with realtime_start or realtime_end fails

Open elmotec opened this issue 8 years ago • 3 comments

@mortada, one more thing I'd like to fix. When one calls get_series with realtime_start or realtime_end, the argument is passed verbatim, and if the value is not a valid YYYY-MM-DD date, the URL is invalid and FRED rejects the request.

I'd like to handle realtime_start and realtime_end the same way as observation_start and observation_end. Then, if either of the two realtime arguments is specified, get_series would return a pandas.DataFrame instead of a pandas.Series, with a pandas.MultiIndex composed of the observation date as well as the realtime start and realtime end.
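A minimal sketch of the kind of date normalization meant here, using only the standard library. The helper name to_fred_date and the list of accepted formats are assumptions made for illustration; this is not how fredapi itself parses observation_start.

```python
from datetime import date, datetime

# Assumption: besides ISO dates, US-style month/day/year strings such as
# '1/1/2015' should be accepted, since that is what the examples in this
# thread pass in.
_ACCEPTED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_fred_date(value):
    """Hypothetical helper: return value as the YYYY-MM-DD string FRED expects."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.date().isoformat()
    if isinstance(value, date):
        return value.isoformat()
    for fmt in _ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % (value,))
```

With something like this applied to all four date arguments before the URL is built, a call with realtime_start='1/1/2015' would send 2015-01-01 to FRED instead of the raw string.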

I recognize that it's a little bit redundant with the other functions of the Fred class (e.g. get_series_all_releases), but it provides more flexibility, like asking for revisions during a specific time span (say, three months at the beginning of the year). I also think that having both realtime_start and realtime_end in the index makes it easier to manipulate the data to find the value of the series at a particular point in time.

I've uploaded my branch to https://github.com/elmotec/fredapi/tree/get_series_with_realtime (see TestFred.test_get_series_with_realtime for an example). Let me know what you think.

I also think it would make sense to even be able to request multiple series and leverage pandas to align the dates. That would be my next enhancement.

elmotec avatar Jul 26 '15 03:07 elmotec

@elmotec, first of all, I'm not quite sure what you mean when you say get_series() fails with realtime_start or realtime_end; the method get_series() does not have those parameters.

Also, to get data as of a particular point in time, there is already a function for that: get_series_as_of_date().

mortada avatar Jul 26 '15 06:07 mortada

As for including realtime_end in the output, I actually had a placeholder for that before but decided not to include it. You can see it here: https://github.com/mortada/fredapi/blob/master/fredapi/fred.py#L247

I didn't include it because technically it is redundant info (you can figure out realtime_end by looking at the realtime_start for the next row). And it seems like the filtering you'd care about would be on realtime_start anyway. But if you have a use case for it we could easily include it either by default or as an option.
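For illustration, the inference described above (realtime_end taken from the realtime_start of the next row) might be sketched like this with pandas. The column names obs_date, rt_start and rt_end are assumptions borrowed from the session output further down in this thread, not fredapi's actual output columns, and the three GDP vintages are the ones quoted there.

```python
import pandas as pd

# Vintages of the 2015-01-01 GDP observation, copied from the session
# output quoted later in this thread.
releases = pd.DataFrame({
    "obs_date": pd.to_datetime(["2015-01-01"] * 3),
    "rt_start": pd.to_datetime(["2015-04-29", "2015-05-29", "2015-06-24"]),
    "GDP": [17710.0, 17665.0, 17693.3],
})

# Within each observation date, a vintage ends the day before the next one starts.
releases = releases.sort_values(["obs_date", "rt_start"])
next_start = releases.groupby("obs_date")["rt_start"].shift(-1)
releases["rt_end"] = next_start - pd.Timedelta(days=1)

# The latest vintage has no successor, so its rt_end is NaT (still current).
print(releases)
```

Grouping by obs_date matters once more than one observation is present; otherwise the last vintage of one observation would pick up the first rt_start of the next.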

mortada avatar Jul 26 '15 06:07 mortada

Sorry @mortada, I should have given an example. Here it is:

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Module readline not available.
Traceback (most recent call last):
  File "C:\Users\jlecomte\scripts\startup.py", line 28, in <module>
    readline.parse_and_bind("tab: complete")
NameError: name 'readline' is not defined
>>> import fredapi
>>> fred = fredapi.Fred()
>>> fred.get_series('GDP', observation_start='1/1/2015', observation_end='7/1/2015', realtime_start='1/1/2015')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fredapi\fred.py", line 133, in get_series
    root = self.__fetch_data(url)
  File "fredapi\fred.py", line 70, in __fetch_data
    raise ValueError(root.get('message'))
ValueError: Bad Request.  Variable realtime_start is not a YYYY-MM-DD formatted date (e.g. 2000-02-24).

Even if I format realtime_start as 2015-01-01 I am not getting all the updates from 1/1/2015 onward.

>>> fred.get_series('GDP', observation_start='1/1/2015', observation_end='7/1/2015', realtime_start='2015-01-01')
2015-01-01    17693.3
dtype: float64

Now, where I think adding rt_start to the DataFrame is beneficial is to find out the value of an index as of a particular point in time, like this:

>>> import fredapi
>>> fred = fredapi.Fred()
>>> df = fred.get_series('GDP', observation_start='1/1/2015', observation_end='7/1/2015', realtime_start='1/1/2015')
>>> df
                                               GDP
obs_date   rt_start   rt_end
2015-01-01 2015-04-29 2015-05-28 00:00:00  17710.0
           2015-05-29 2015-06-23 00:00:00  17665.0
           2015-06-24 9999-12-31           17693.3
>>> import pandas as pd
>>> as_of_date = pd.to_datetime('2015-06-01')
>>> df[df.index.map(lambda x: x[1] < as_of_date < x[2])]
                                             GDP
obs_date   rt_start   rt_end
2015-01-01 2015-05-29 2015-06-23 00:00:00  17665
>>>
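The same as-of lookup can be reproduced on a hand-built frame, without the proposed get_series change. This is only a sketch: the as_of helper is hypothetical, the bounds are treated as inclusive (an assumption, so that a query on the first or last day of a vintage still matches), and the open-ended 9999-12-31 is replaced by a stand-in date because it falls outside the range that pandas.Timestamp can represent.

```python
import pandas as pd

# Stand-in for FRED's open-ended 9999-12-31, which pandas.Timestamp cannot hold.
OPEN_END = pd.Timestamp("2262-04-11")

index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2015-01-01"), pd.Timestamp("2015-04-29"), pd.Timestamp("2015-05-28")),
        (pd.Timestamp("2015-01-01"), pd.Timestamp("2015-05-29"), pd.Timestamp("2015-06-23")),
        (pd.Timestamp("2015-01-01"), pd.Timestamp("2015-06-24"), OPEN_END),
    ],
    names=["obs_date", "rt_start", "rt_end"],
)
df = pd.DataFrame({"GDP": [17710.0, 17665.0, 17693.3]}, index=index)


def as_of(frame, when):
    """Hypothetical helper: rows whose [rt_start, rt_end] interval contains `when`."""
    when = pd.Timestamp(when)
    start = frame.index.get_level_values("rt_start")
    end = frame.index.get_level_values("rt_end")
    return frame[(start <= when) & (when <= end)]


print(as_of(df, "2015-06-01"))
```

Comparing whole index levels avoids the per-row lambda and keeps the filter vectorized.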

Sure, you can do it with just rt_start, but you kind of need to infer rt_end from the next row, which I suspect will end up a little awkward (I have not even tried). This is especially true if one merges multiple series in one DataFrame:

>>> gdp = fred.get_series('GDP', observation_start='1/1/2015', observation_end='1/1/2015', realtime_start='1/1/2015')
>>> cp = fred.get_series('CP', observation_start='1/1/2015', observation_end='1/1/2015', realtime_start='1/1/2015')
>>> usa = pd.concat([cp, gdp], axis=1)
>>> usa
                                               CP      GDP
obs_date   rt_start   rt_end
2015-01-01 2015-04-29 2015-05-28 00:00:00     NaN  17710.0
           2015-05-29 2015-06-23 00:00:00  1893.8  17665.0
           2015-06-24 9999-12-31           1891.2  17693.3
>>> as_of_date = pd.to_datetime('2015-06-01')
>>> usa[usa.index.map(lambda x: x[1] < as_of_date < x[2])]
                                               CP    GDP
obs_date   rt_start   rt_end
2015-01-01 2015-05-29 2015-06-23 00:00:00  1893.8  17665

BTW, I think it would be really awesome to query the usa DataFrame (both GDP and CP) in one fredapi call like get_series(['CP', 'GDP'], ...). I've started to work on this.
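As a rough sketch of the alignment step only (no FRED call involved), per-series data indexed by (obs_date, rt_start, rt_end) can be combined by passing a dict to pandas.concat. The combine_series name is hypothetical, and the values are the ones shown in the output above; the open-ended 9999-12-31 is again replaced by a stand-in date that pandas.Timestamp can hold.

```python
import pandas as pd

OPEN_END = pd.Timestamp("2262-04-11")  # stand-in for the open-ended 9999-12-31
NAMES = ["obs_date", "rt_start", "rt_end"]
OBS = pd.Timestamp("2015-01-01")


def vintage_index(spans):
    """Build the (obs_date, rt_start, rt_end) index from (start, end) pairs."""
    rows = [(OBS, pd.Timestamp(start), pd.Timestamp(end)) for start, end in spans]
    return pd.MultiIndex.from_tuples(rows, names=NAMES)


gdp = pd.Series(
    [17710.0, 17665.0, 17693.3],
    index=vintage_index([("2015-04-29", "2015-05-28"),
                         ("2015-05-29", "2015-06-23"),
                         ("2015-06-24", OPEN_END)]),
)
cp = pd.Series(
    [1893.8, 1891.2],
    index=vintage_index([("2015-05-29", "2015-06-23"),
                         ("2015-06-24", OPEN_END)]),
)


def combine_series(series_by_id):
    """Hypothetical helper: outer-join the series on their shared index levels."""
    return pd.concat(series_by_id, axis=1).sort_index()


usa = combine_series({"CP": cp, "GDP": gdp})
print(usa)
```

The dict keys become the column names, and rows that exist for only one series (here the first GDP vintage) are filled with NaN in the other column.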

I agree that making get_series more powerful will bring an element of redundancy to fredapi. For instance, get_series_as_of_date() serves a good purpose if you already know what as-of date you want. But when tinkering with data, or graphing a function of multiple series over a range of as-of dates, having everything in one big DataFrame is much more efficient: one can harness the power of pandas more easily and does not need to make repeated calls to FRED.

Note that the current behaviour of get_series will stay the same if no realtime argument is passed.

elmotec avatar Jul 26 '15 15:07 elmotec