data-engineering-zoomcamp
Data in week 1 is not available (yellow_tripdata_2021-01.csv)
I tried requesting data from https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv, but it no longer seems to be available. Please update the S3 data link. Thanks!
I also ran into this issue because the NYC TLC website keeps changing its links. Thankfully @alexeygrigorev recently backed up the data: https://github.com/DataTalksClub/nyc-tlc-data
The full path for the file you're asking about is https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz
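If you need other months too, here is a minimal sketch that builds backup URLs. It assumes the naming pattern in the single link above (`<service>/<service>_tripdata_<YYYY>-<MM>.csv.gz`) holds for other months and services. That pattern is inferred, not confirmed, and `backup_url` is a hypothetical helper, not part of the course code.

```python
# Hypothetical helper: assumes every file in the DataTalksClub backup follows
# the same naming pattern as the yellow 2021-01 link (inferred, not verified).
BASE_URL = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download"


def backup_url(service: str, year: int, month: int) -> str:
    """Return the assumed backup URL for one month of trip data."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    # Zero-pad the month so 1 becomes "01", matching the example link
    return f"{BASE_URL}/{service}/{service}_tripdata_{year}-{month:02d}.csv.gz"
```

With pandas installed, something like `pd.read_csv(backup_url("yellow", 2021, 1), compression="gzip", nrows=100)` should read the first rows straight from the URL without unzipping the file by hand.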
Thanks Mike!
Should we use the new parquet file? https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
There is a note in the readme, and I think the Python code has been updated to use parquet. If so, I can open a PR with the readme changes.
Actually, I think it might be better to roll back the change (so it's consistent with the video) and use the backup CSV files.