sql-runner
Update docs
Currently using v0.5.2, I have a playbook with 2 steps. I can see the unloads finish on Redshift, but the last step never executes and I get a timeout error on the first step. This only recently started happening, and there is nothing running on the cluster that would cause it to block. If I run a 2nd or 3rd time it works just fine.
- Is there a more verbose output?
- What can I do differently?
I've even split the queries up into multiple steps and still no go. Thanks, Danny
Error
‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/web-incremental.yml @ 2017-07-07-04-02
‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml @ 2017-07-07-04-02
2017/07/07 04:02:19 EXECUTING unload-kafka-to-s3 (in step unload-kafka-to-s3 @ dwh): /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql
2017/07/07 06:02:47 FAILURE: unload-kafka-to-s3 (step unload-kafka-to-s3 @ target dwh), ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
2017/07/07 06:02:47
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query unload-kafka-to-s3 /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql (in step unload-kafka-to-s3 @ target dwh), ERROR:
- read tcp 172.35.5.39:50733->22.11.33.98:5439: read: connection reset by peer
Playbook
:targets:
- :name: __BI_DB_NAME__
:type: redshift
:host: __BI_DB_HOST__
:database: __BI_DB_NAME__
:port: __BI_DB_PORT__
:username: __BI_DB_USERNAME__
:password: __BI_DB_PASSWORD__
:steps:
- :name: unload-kafka-to-s3
:queries:
- :name: unload-kafka-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-to-s3.sql
:template: true
- :name: ad-bi-time-track
:queries:
- :name: ad-bi-time-track
:file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql
First step
UNLOAD ('
with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
SELECT listing_id, listing_key, event_type, event_unixtimestamp,
event_date, event_datetime, listing_unixtimestamp,
listing_date, listing_datetime, country, state, city,
neighbourhood, category_l1, category_l2, category_l3,
category_l4, language, user_id, user_name, geo_country,
email, phone_number, live, seller_type, condition_id,
price_type_id, price, currency_code, location_zip, ip,
channel, images, title, featured_listing, first_ad, event_uuid,
source, full_neighbourhood, listing_platform
FROM kafka.listings, last_time
WHERE event_datetime > last_time
AND event_type = \'approved\'
ORDER BY event_datetime
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/listings_'
{{awsEnvCredentials}}
ESCAPE;
UNLOAD ('
with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
SELECT listing_id, event_unixtimestamp, event_uuid, partial, data_bag
FROM kafka.listings_databag, last_time
WHERE (TIMESTAMP \'epoch\' + event_unixtimestamp * INTERVAL \'1 Second \') > last_time
AND (event_uuid, listing_id) in (SELECT DISTINCT event_uuid, listing_id from kafka.listings where event_type = \'uae_listing_approved\')
ORDER BY event_unixtimestamp
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/data_bag_'
{{awsEnvCredentials}}
ESCAPE;
Second step
INSERT INTO snowplow_runner.time_track (
SELECT MAX(event_datetime) as last_time from kafka.listings
);
ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
I think you might want to dig into this; I'm assuming this is Redshift's IP?
@BenFradet it looks like Redshift does reset the connection, but only because the script never executes the second step: it keeps waiting for a response from the DB for the unload step, even though that step has already finished, as indicated on Redshift's dashboard.
@BenFradet Digging further, I have updated the playbook to run everything serially, each query in its own step, but the second unload (databag) times out. However, I can see the data in the S3 bucket, which corroborates the Redshift dashboard showing that the unload finishes correctly. I just never seem to get a success back from the second unload query; it hangs until the timeout occurs and the job fails.
Here's the updated playbook.
:targets:
- :name: __BI_DB_NAME__
:type: redshift
:host: __BI_DB_HOST__
:database: __BI_DB_NAME__
:port: __BI_DB_PORT__
:username: __BI_DB_USERNAME__
:password: __BI_DB_PASSWORD__
:steps:
- :name: unload-kafka-to-s3-listings
:queries:
- :name: unload-kafka-listings-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-listings-to-s3.sql
:template: true
- :name: unload-kafka-to-s3-databag
:queries:
- :name: unload-kafka-databag-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-databag-to-s3.sql
:template: true
- :name: ad-bi-time-track
:queries:
- :name: ad-bi-time-track
:file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql
Hey @dcrasto - I'm struggling to see how this could be a bug in SQL Runner... It's much more likely to be an issue either in Redshift, or in the Golang Postgres driver, no?
Seems to be the case, @alexanderdean. I will check with the AWS guys to see what's going on.
@dcrasto Any update on that? I'm running into the same issue where a SQL query finishes correctly but then sql-runner hangs until the timeout occurs and the job fails.
@abrenaut I had to update the network settings on the machine running the job:
/sbin/sysctl -w net.ipv4.tcp_keepalive_time=200 net.ipv4.tcp_keepalive_intvl=200 net.ipv4.tcp_keepalive_probes=5
Also make sure there is no timeout set for the user connecting to the DB.
Works like a charm now. Best
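Note that `sysctl -w` only changes the running kernel; to make those keepalive settings survive a reboot you can persist them in a sysctl config file (a sketch; the file name is an arbitrary choice):

```
# /etc/sysctl.d/99-tcp-keepalive.conf  (hypothetical file name)
# Send the first keepalive probe after 200s of idle, then every 200s,
# giving up after 5 unanswered probes.
net.ipv4.tcp_keepalive_time = 200
net.ipv4.tcp_keepalive_intvl = 200
net.ipv4.tcp_keepalive_probes = 5
```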
P.S. @alexanderdean, you might want to update the docs.
@mhadam do you mind taking care of updating the docs?