
URL in README on week_1_basics_n_setup/2_docker_sql needs to be changed to .parquet

Open kyleaddis opened this issue 2 years ago • 6 comments

In README.md, under Data ingestion -> Running locally, the URL is: URL="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.csv"

It should be URL="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet"

kyleaddis avatar May 28 '22 22:05 kyleaddis

PRs are welcome

alexeygrigorev avatar May 30 '22 16:05 alexeygrigorev

I submitted #198 to hopefully address it

joeeoj avatar Jun 15 '22 01:06 joeeoj

@alexeygrigorev please take a look 👀

iamtodor avatar Jun 21 '22 20:06 iamtodor

The README and ingest_data.py script were already fixed with #192. My PR is just to update the Jupyter notebook. But now that I look at it, I should probably change my code to be closer to #192 so the notebook and ingest script are more consistent with each other.
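One way the notebook and the ingest script could stay consistent is to pick the reader from the URL's file extension. This is only a rough sketch, not the code from #192: `detect_format` and `load_trips` are hypothetical names, and `load_trips` assumes pandas (plus a parquet engine such as pyarrow) is installed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def detect_format(url: str) -> str:
    """Return 'csv' or 'parquet' based on the URL's file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix == ".parquet":
        return "parquet"
    raise ValueError(f"unsupported file type: {url!r}")


def load_trips(url: str):
    """Load the file at `url` into a DataFrame (hypothetical helper)."""
    # Imported lazily so detect_format can be used without pandas installed.
    import pandas as pd

    if detect_format(url) == "parquet":
        return pd.read_parquet(url)
    return pd.read_csv(url)
```

With something like this, the same call would work for either the .parquet link or the .csv link, so the notebook and ingest_data.py would not need separate code paths.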

joeeoj avatar Jun 21 '22 22:06 joeeoj

Yes, if possible. I'm also having second thoughts about the ingest_data.py code change. Right now it is quite different from what was presented in the video and might cause some confusion...

I also found out that the CSV files are still there:

$ aws s3 ls s3://nyc-tlc
                           PRE csv_backup/
                           PRE misc/
                           PRE trip data/

So it should be possible to use them

alexeygrigorev avatar Jun 22 '22 04:06 alexeygrigorev

Yup I see it here -- https://s3.amazonaws.com/nyc-tlc/csv_backup/yellow_tripdata_2021-01.csv

Since #192 already added that link to the README, I think this can safely be closed. Following that new link should allow you to proceed with the tutorial as normal.
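For anyone still holding the old CSV link, the change amounts to swapping the `trip+data` prefix for `csv_backup`. A small sketch of that, assuming the backup folder keeps the same file names (this thread only confirms it for yellow_tripdata_2021-01.csv); `to_csv_backup_url` is a made-up helper name, not something in the repo.

```python
OLD_PREFIX = "https://s3.amazonaws.com/nyc-tlc/trip+data/"
BACKUP_PREFIX = "https://s3.amazonaws.com/nyc-tlc/csv_backup/"


def to_csv_backup_url(url: str) -> str:
    """Point an old trip+data CSV link at the csv_backup folder.

    Assumes file names are unchanged in the backup; any other URL
    (including .parquet links) is returned as-is.
    """
    if url.startswith(OLD_PREFIX) and url.endswith(".csv"):
        return BACKUP_PREFIX + url[len(OLD_PREFIX):]
    return url


print(to_csv_backup_url(OLD_PREFIX + "yellow_tripdata_2021-01.csv"))
```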

joeeoj avatar Jun 22 '22 16:06 joeeoj