data-science-on-gcp
README improvement for chapters 2 and 3 regarding upload to BQ
Hello,
Excellent book so far, but a problem I've been having is uploading the 2015 CSVs from my cloud storage bucket to BigQuery.
Both the ch2 and ch3 READMEs just tell you to run:
cd data-science-on-gcp/02_ingest
./ingest_from_crsbucket.sh bucketname
But this only copies the CSVs from the book's bucket to the user's. It doesn't cover the next stage, i.e. uploading to BQ.
The alternative route of ingesting from the original source of data also doesn't work: I found that my Google Cloud Shell kept disconnecting halfway through the upload process.
Therefore I'd recommend adding the following instruction to both READMEs, showing you explicitly how to do the upload to BQ:
bash bqload.sh bucketname 2015
Thanks, I've put in a pull request to make the change. Instead of ./ingest_from_crsbucket.sh, simply running ./ingest.sh will do the trick, as it also uploads to BigQuery.
That approach didn't work for me either - my Cloud Shell would disconnect halfway through the upload to BQ, so I would end up with an incomplete table. The solution was simply to run bash bqload.sh bucketname 2015.
Other people may not be so unfortunate though!
Struggling for almost a day now trying to load to BigQuery without luck... used the bqload.sh with the correct params but getting the "Not found: URI gs://srini-laks-gcp1-dsongcp" error.
Enjoyed reading the two chapters but surprised to see the "user-unfriendliness" of this GCP platform. It shouldn't have to take all this time, given the data available through a Google search, but it does! Frustrating, to say the least.
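For anyone hitting the same "Not found: URI" error, it may help to confirm that the bucket is actually reachable before running bqload.sh. The following is only a sketch, not part of the book's repository: bucket_name and bucket_exists are hypothetical helpers, and it assumes (based on the bucketname argument shown above) that bqload.sh expects a bare bucket name rather than a gs:// URI. The cause of the error in this thread is not known, so treat this as one thing to rule out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reduce a bucket argument to a bare bucket name.
# Accepts "my-bucket", "gs://my-bucket" or "gs://my-bucket/some/path".
bucket_name() {
  local arg="$1"
  arg="${arg#gs://}"   # drop a leading gs:// scheme, if present
  arg="${arg%%/*}"     # drop everything from the first slash onwards
  printf '%s\n' "$arg"
}

# Hypothetical helper: succeed only if the bucket itself can be listed.
# Requires gsutil on the PATH; "gsutil ls -b" exits non-zero when the
# bucket does not exist or the current account cannot access it.
bucket_exists() {
  gsutil ls -b "gs://$(bucket_name "$1")" > /dev/null 2>&1
}
```

With these defined, something like bucket_exists my-bucket && bash bqload.sh "$(bucket_name my-bucket)" 2015 would skip the load when the bucket cannot be found (my-bucket is a placeholder name).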
Got it to work finally... Page 49 changes:
- Navigate into the flights 02_ingest folder
- cd data-science-on-gcp/02_ingest
- Run the code to download the files:
- for MONTH....
- bash ../download.sh 2015 $MONTH