data-engineering-zoomcamp
URL in README for week_1_basics_n_setup/2_docker_sql needs to be changed to .parquet
In README.md, under Data ingestion -> Running locally, the URL is currently: URL="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv"
It should be URL="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet"
PRs are welcome
I submitted #198 to hopefully address it
@alexeygrigorev please take a look 👀
The README and ingest_data.py script were already fixed with #192. My PR is just to update the Jupyter notebook. But now that I look at it, I should probably change my code to be closer to #192 so the notebook and ingest script are more consistent with each other.
Yes, if possible. I'm also having second thoughts about the ingest_data.py code change. Right now it is quite different from what was presented in the video and might cause some confusion...
I also found out that the csv files are still there:
$ aws s3 ls s3://nyc-tlc
PRE csv_backup/
PRE misc/
PRE trip data/
So it should be possible to use them
Yup I see it here -- https://s3.amazonaws.com/nyc-tlc/csv_backup/yellow_tripdata_2021-01.csv
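For anyone who wants to keep following the video with CSV files, here is a minimal sketch of mapping a trip-data URL to its csv_backup counterpart. The helper name `csv_backup_url` is hypothetical, and the sketch assumes every file in csv_backup/ keeps the same base name as the original; this thread only confirms that for yellow_tripdata_2021-01.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def csv_backup_url(url: str) -> str:
    """Map a nyc-tlc trip-data URL to its presumed csv_backup URL.

    Assumption: the backup keeps the same base file name with a .csv
    extension. Only the 2021-01 yellow taxi file is confirmed above.
    """
    parts = urlsplit(url)
    # Take the last path segment and drop its extension (.parquet or .csv).
    stem, _ext = posixpath.splitext(posixpath.basename(parts.path))
    # Rebuild the path under the csv_backup/ prefix seen in the bucket listing.
    new_path = f"/nyc-tlc/csv_backup/{stem}.csv"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


if __name__ == "__main__":
    print(csv_backup_url(
        "https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet"
    ))
```

The resulting URL can be passed to the ingest script in place of the .parquet one, provided the script still reads CSV.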
Since #192 already added that link to the README, I think this can safely be closed. Following the new link should let you proceed with the tutorial as normal.