stats_can
Start Date not Respected
Hello.
I discovered an unexpected behaviour with this package. If I specify start and end dates consisting of the last 10 years for vector id "v1230996350" I end up with a dataframe populated with data points from 1956 until now. I am running Python 3.9.13 and stats_can 2.5.1. The following code block reproduces this issue for me in Jupyter Notebook.
from datetime import date, timedelta
from stats_can import StatsCan
end_date = date.today()
start_date = end_date - timedelta(days=365 * 10)
sc = StatsCan()
df = sc.vectors_to_df_remote(["v1230996350"], start_release_date=start_date, end_release_date=end_date)
df = df.reset_index(drop=False)
df
Thanks!
Hi @djleblan1, the start_release_date and end_release_date parameters refer to the date the data was released, not the reference period, which I think is what you're expecting. At the time I developed this the API for retrieving individual vectors only allowed reference by release date. Based on this it looks like there's a method that would allow retrieval by reference date. I can't promise I'll get around to adding that soon, but I'd look at a PR if you're interested in adding it.
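Until retrieval by reference date is supported, one possible workaround is to trim the returned dataframe by reference period after downloading it. The following is only a minimal sketch: it assumes the dataframe is indexed by reference period as a DatetimeIndex, the sample data is made up, and filter_by_reference_period is a hypothetical helper, not part of stats_can.

```python
import pandas as pd

# Made-up monthly series standing in for the dataframe returned by the package.
# The DatetimeIndex-by-reference-period layout is an assumption.
idx = pd.date_range("1956-01-01", "2023-12-01", freq="MS")
df = pd.DataFrame({"v1230996350": range(len(idx))}, index=idx)


def filter_by_reference_period(frame, start, end):
    """Keep rows whose index (the reference period) lies in [start, end]."""
    mask = (frame.index >= pd.Timestamp(start)) & (frame.index <= pd.Timestamp(end))
    return frame.loc[mask]


# Keep only the reference periods from January 2014 through December 2023.
recent = filter_by_reference_period(df, "2014-01-01", "2023-12-01")
```

This still downloads the full series, so it only addresses the shape of the result, not the size of the request.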
I was interested in implementing 'getDataFromVectorByReferencePeriodRange' and came up with the following, which seems to work. Please edit and incorporate this into stats_can, if it's useful. Thanks...........Grenny
import requests

def get_bulk_vector_data_by_period_range(vectors, start_date, end_date):
    # https://www.statcan.gc.ca/en/developers/wds/user-guide#a12-5a
    #
    # Parameters
    # ----------
    # vectors: str or list of str
    #     vector numbers to get info for
    # start_date: datetime.date
    #     start of the reference period for the data
    # end_date: datetime.date
    #     end of the reference period for the data
    #
    # Returns
    # -------
    # List of dicts containing data for each vector
    #
    # NOTE: SC_URL is expected to be provided by the surrounding stats_can module
    # accept a single vector passed as a plain string
    if isinstance(vectors, str):
        vectors = [vectors]
    url = SC_URL + 'getDataFromVectorByReferencePeriodRange?vectorIds='
    # create a string containing all vectors for retrieval
    vector_string = ''
    final_list = []
    for v in vectors:
        # remove initial 'v' string if present
        if v[0] == 'v':
            v = v[1:]
        vector_string += v + ','
    # Remove the final comma and add an &
    vector_string = vector_string[:-1] + '&'
    # Create the period dates
    start_param = 'startRefPeriod=' + str(start_date) + '&'
    end_param = 'endReferencePeriod=' + str(end_date)
    # Create the full url string
    url_string = url + vector_string + start_param + end_param
    # https://requests.readthedocs.io/en/latest/
    result = requests.get(url_string)
    # raise an exception on HTTP errors (4xx/5xx) before parsing
    result.raise_for_status()
    response = result.json()
    # add the list of json 'object's to return
    # NOTE: a responseStatusCode of 0 is assumed to mean no error for that vector
    final_list += [r["object"] for r in response if r['object']['responseStatusCode'] == 0]
    return final_list
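The URL construction in the function above can be checked without touching the network. Here is a hedged sketch of the same query string built with str.join; build_reference_period_url and ASSUMED_SC_URL are hypothetical names, and the base URL is an assumption standing in for whatever SC_URL is set to in stats_can.

```python
from datetime import date

# Assumed base URL of the Statistics Canada web data service; inside stats_can
# the module's own SC_URL constant would be used instead.
ASSUMED_SC_URL = "https://www150.statcan.gc.ca/t1/wds/rest/"


def build_reference_period_url(vectors, start_date, end_date, base_url=ASSUMED_SC_URL):
    """Build the getDataFromVectorByReferencePeriodRange URL for the given vectors."""
    # Accept a single vector passed as a plain string.
    if isinstance(vectors, str):
        vectors = [vectors]
    # Drop a leading 'v' (or 'V') from each id and join them with commas.
    ids = ",".join(str(v).lstrip("vV") for v in vectors)
    return (
        f"{base_url}getDataFromVectorByReferencePeriodRange"
        f"?vectorIds={ids}"
        f"&startRefPeriod={start_date.isoformat()}"
        f"&endReferencePeriod={end_date.isoformat()}"
    )


url = build_reference_period_url(
    ["v1230996350", "32164132"], date(2013, 1, 1), date(2023, 1, 1)
)
print(url)
```

Keeping the URL building separate from the request makes it easy to inspect the exact query before sending it.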