
Start Date not Respected

Open djleblan1 opened this issue 2 years ago • 2 comments

Hello.

I discovered unexpected behaviour with this package. If I specify start and end dates covering the last 10 years for vector ID "v1230996350", I end up with a DataFrame populated with data points from 1956 until now. I am running Python 3.9.13 and stats_can 2.5.1. The following code block reproduces the issue for me in Jupyter Notebook.

from datetime import date, timedelta
from stats_can import StatsCan
end_date = date.today()
start_date = end_date - timedelta(days=365 * 10)
sc = StatsCan()
df = sc.vectors_to_df_remote(["v1230996350"], start_release_date=start_date, end_release_date=end_date)
df = df.reset_index(drop=False)
df

Thanks!

djleblan1 avatar Dec 17 '22 15:12 djleblan1

Hi @djleblan1, the start_release_date and end_release_date parameters refer to the date the data was released, not the reference period, which I think is what you're expecting. At the time I developed this, the API for retrieving individual vectors only allowed retrieval by release date. Based on this, it looks like there's a method that would allow retrieval by reference date. I can't promise I'll get around to adding that soon, but I'd look at a PR if you're interested in adding it.
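To illustrate the difference between the two kinds of date, here is a minimal sketch using made-up data points. It assumes each point carries a `refPer` (reference period) and a `releaseTime` field, as WDS data points appear to; the values and both helper functions are hypothetical and not part of stats_can. If a whole series is released on the same day, filtering by release date keeps every reference period, while filtering by reference period keeps only the requested window.

```python
from datetime import date, datetime

# Made-up data points, shaped like WDS "vectorDataPoint" entries (assumed layout).
points = [
    {"refPer": "1956-01-01", "value": 1.0, "releaseTime": "2022-12-01T08:30"},
    {"refPer": "2015-06-01", "value": 2.0, "releaseTime": "2022-12-01T08:30"},
    {"refPer": "2022-11-01", "value": 3.0, "releaseTime": "2022-12-01T08:30"},
]


def filter_by_release_date(points, start, end):
    """Keep points that were released between start and end (inclusive)."""
    return [
        p for p in points
        if start <= datetime.fromisoformat(p["releaseTime"]).date() <= end
    ]


def filter_by_reference_period(points, start, end):
    """Keep points whose reference period lies between start and end (inclusive)."""
    return [
        p for p in points
        if start <= date.fromisoformat(p["refPer"]) <= end
    ]


start, end = date(2012, 12, 17), date(2022, 12, 17)
by_release = filter_by_release_date(points, start, end)
by_reference = filter_by_reference_period(points, start, end)

print([p["refPer"] for p in by_release])    # ['1956-01-01', '2015-06-01', '2022-11-01']
print([p["refPer"] for p in by_reference])  # ['2015-06-01', '2022-11-01']
```

The same client-side filter on the reference period could serve as a stopgap on data that has already been downloaded.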

ianepreston avatar Dec 20 '22 14:12 ianepreston

I was interested in implementing 'getDataFromVectorByReferencePeriodRange' and came up with the following, which seems to work. Please edit and incorporate it into stats_can if it's useful. Thanks, Grenny

import requests

def get_bulk_vector_data_by_period_range(vectors, start_date, end_date):
    # https://www.statcan.gc.ca/en/developers/wds/user-guide#a12-5a
    #
    # Parameters
    # ----------
    # vectors: str or list of str
    #     vector numbers to get data for
    # start_date: datetime.date
    #     start of the reference period range
    # end_date: datetime.date
    #     end of the reference period range
    #
    # Returns
    # -------
    # List of dicts containing data for each vector
    #
    # SC_URL is the WDS base URL, assumed to be defined in the surrounding stats_can module
    url = SC_URL + 'getDataFromVectorByReferencePeriodRange?vectorIds='
    # accept a single vector as well as a list of vectors
    if isinstance(vectors, str):
        vectors = [vectors]
    # remove the initial 'v' if present and join the ids with commas
    vector_string = ','.join(v[1:] if v.startswith('v') else v for v in vectors)
    # Create the period dates (str() of a datetime.date gives YYYY-MM-DD)
    start_param = '&startRefPeriod=' + str(start_date)
    end_param = '&endReferencePeriod=' + str(end_date)
    # Create the full url string
    url_string = url + vector_string + start_param + end_param
    # https://requests.readthedocs.io/en/latest/
    result = requests.get(url_string)
    # raise an exception on HTTP-level errors (4xx/5xx responses)
    result.raise_for_status()
    # NOTE: Since I did not know how stats_can's check_status should be applied here,
    # I check responseStatusCode and assume a value of 0 means no error
    return [r['object'] for r in result.json() if r['object']['responseStatusCode'] == 0]
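The URL-building step of the function above can be separated out so it can be checked without calling the API. The following is only a sketch: `build_ref_period_url` is a hypothetical helper, and `BASE_URL` is an assumed value for the WDS base URL (inside stats_can the existing `SC_URL` constant would be used instead). The query parameter names are the ones used in the function above.

```python
from datetime import date

# Assumed WDS base URL; in stats_can the existing SC_URL constant would be used.
BASE_URL = "https://www150.statcan.gc.ca/t1/wds/rest/"


def build_ref_period_url(vectors, start_date, end_date, base_url=BASE_URL):
    """Build a getDataFromVectorByReferencePeriodRange URL (hypothetical helper)."""
    # Accept a single vector id as well as a list of ids.
    if isinstance(vectors, str):
        vectors = [vectors]
    # Strip a leading 'v' or 'V' from each id, then join the ids with commas.
    ids = ",".join(
        str(v)[1:] if str(v).lower().startswith("v") else str(v)
        for v in vectors
    )
    # Formatting a datetime.date in an f-string yields YYYY-MM-DD.
    return (
        f"{base_url}getDataFromVectorByReferencePeriodRange"
        f"?vectorIds={ids}"
        f"&startRefPeriod={start_date}"
        f"&endReferencePeriod={end_date}"
    )


url = build_ref_period_url("v1230996350", date(2012, 12, 17), date(2022, 12, 17))
print(url)
```

Passing a bare string such as "v1230996350" is handled explicitly here because iterating over a string would otherwise yield one character at a time.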

Grenny1 avatar Aug 04 '24 01:08 Grenny1