
Data in week 1 is not available (yellow_tripdata_2021-01.csv)

Open toandaominh1997 opened this issue 1 year ago • 4 comments

I tried requesting the data from https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv but it no longer seems to be available. Please update the S3 data link. Thanks.

toandaominh1997, Jul 17 '22 08:07

I also ran into this issue because the NYC TLC website keeps changing its links. Thankfully @alexeygrigorev recently backed up the data: https://github.com/DataTalksClub/nyc-tlc-data

The full path for the file you're asking about is https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yellow_tripdata_2021-01.csv.gz
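For anyone scripting the download, here is a minimal sketch that builds the backup URL and peeks at the first rows using only the Python standard library. The helper names (`backup_url`, `read_head`) are made up for illustration, and the URL pattern is inferred from the single link above; that other months or taxi types follow the same pattern is an assumption, not something confirmed in this thread.

```python
import csv
import gzip
import itertools
import urllib.request

# Base of the release downloads in the DataTalksClub backup repo (taken from the link above).
BASE = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download"


def backup_url(taxi: str, year: int, month: int) -> str:
    """Build a backup URL. The pattern is inferred from one known link (assumption)."""
    return f"{BASE}/{taxi}/{taxi}_tripdata_{year}-{month:02d}.csv.gz"


def read_head(url: str, n: int = 5) -> list:
    """Stream the gzipped CSV and return its first n rows (header included)."""
    with urllib.request.urlopen(url) as resp:
        # gzip.open accepts a file-like object, so the file is decompressed on the fly
        with gzip.open(resp, mode="rt", newline="") as text:
            return list(itertools.islice(csv.reader(text), n))
```

Calling `read_head(backup_url("yellow", 2021, 1))` should return the header row plus the first few trips, provided the release asset is still hosted at that address.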

joeeoj, Jul 18 '22 01:07

Thanks Mike!

alexeygrigorev, Jul 18 '22 06:07

Should we use the new parquet file? https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet

There is a note in the README, and I think the Python code has been updated to use parquet. I can open a PR with the README changes if so.

TheHollidayInn, Aug 02 '22 09:08

Actually, I think it might be better to roll back the change (so it's consistent with the video) and use the backup CSV files.

alexeygrigorev, Aug 02 '22 10:08