sql-runner
Update docs
Currently using v0.5.2, I have a playbook with 2 steps. I can see the unloads finish on Redshift, but the last step never executes and I get a timeout error on the first step. This only recently started happening, and there is nothing running on the cluster that would cause it to block. If I run a 2nd or 3rd time it works just fine.
- Is there a more verbose output?
- What can I do differently?
I've even split the queries up into multiple steps and still no go. Thanks, Danny
Error
‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/web-incremental.yml @ 2017-07-07-04-02
‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml @ 2017-07-07-04-02
2017/07/07 04:02:19 EXECUTING unload-kafka-to-s3 (in step unload-kafka-to-s3 @ dwh): /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql
2017/07/07 06:02:47 FAILURE: unload-kafka-to-s3 (step unload-kafka-to-s3 @ target dwh), ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
2017/07/07 06:02:47
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query unload-kafka-to-s3 /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql (in step unload-kafka-to-s3 @ target dwh), ERROR:
- read tcp 172.35.5.39:50733->22.11.33.98:5439: read: connection reset by peer
Playbook
:targets:
- :name: __BI_DB_NAME__
:type: redshift
:host: __BI_DB_HOST__
:database: __BI_DB_NAME__
:port: __BI_DB_PORT__
:username: __BI_DB_USERNAME__
:password: __BI_DB_PASSWORD__
:steps:
- :name: unload-kafka-to-s3
:queries:
- :name: unload-kafka-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-to-s3.sql
:template: true
- :name: ad-bi-time-track
:queries:
- :name: ad-bi-time-track
:file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql
First step
UNLOAD ('
with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
SELECT listing_id, listing_key, event_type, event_unixtimestamp,
event_date, event_datetime, listing_unixtimestamp,
listing_date, listing_datetime, country, state, city,
neighbourhood, category_l1, category_l2, category_l3,
category_l4, language, user_id, user_name, geo_country,
email, phone_number, live, seller_type, condition_id,
price_type_id, price, currency_code, location_zip, ip,
channel, images, title, featured_listing, first_ad, event_uuid,
source, full_neighbourhood, listing_platform
FROM kafka.listings, last_time
WHERE event_datetime > last_time
AND event_type = \'approved\'
ORDER BY event_datetime
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/listings_'
{{awsEnvCredentials}}
ESCAPE;
UNLOAD ('
with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
SELECT listing_id, event_unixtimestamp, event_uuid, partial, data_bag
FROM kafka.listings_databag, last_time
WHERE (TIMESTAMP \'epoch\' + event_unixtimestamp * INTERVAL \'1 Second \') > last_time
AND (event_uuid, listing_id) in (SELECT DISTINCT event_uuid, listing_id from kafka.listings where event_type = \'uae_listing_approved\')
ORDER BY event_unixtimestamp
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/data_bag_'
{{awsEnvCredentials}}
ESCAPE;
Second step
INSERT INTO snowplow_runner.time_track (
SELECT MAX(event_datetime) as last_time from kafka.listings
);
ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
I think you might want to dig into this; I'm assuming this is Redshift's IP?
@BenFradet it looks like Redshift does reset the connection, but only because the script never executes the second step: it keeps waiting for a response from the DB for the unload step, even though that step has already finished, as indicated on Redshift's dashboard.
@BenFradet Digging further, I have updated the playbook to run everything serially, each query in its own step, but the second unload (databag) times out. However, I can see the data in the S3 bucket, which corroborates the Redshift dashboard showing that the unload finishes correctly. I just never seem to get a success back from the second unload query; it hangs until the timeout occurs and the job fails.
Here's the updated playbook.
:targets:
- :name: __BI_DB_NAME__
:type: redshift
:host: __BI_DB_HOST__
:database: __BI_DB_NAME__
:port: __BI_DB_PORT__
:username: __BI_DB_USERNAME__
:password: __BI_DB_PASSWORD__
:steps:
- :name: unload-kafka-to-s3-listings
:queries:
- :name: unload-kafka-listings-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-listings-to-s3.sql
:template: true
- :name: unload-kafka-to-s3-databag
:queries:
- :name: unload-kafka-databag-to-s3
:file: sql/bi-kafka/incremental/unload-kafka-databag-to-s3.sql
:template: true
- :name: ad-bi-time-track
:queries:
- :name: ad-bi-time-track
:file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql
Hey @dcrasto - I'm struggling to see how this could be a bug in SQL Runner... It's much more likely to be an issue either in Redshift, or in the Golang Postgres driver, no?
Seems to be the case, @alexanderdean. I will check with the AWS guys to see what's going on.
@dcrasto Any update on that? I'm running into the same issue where a SQL query finishes correctly but then sql-runner hangs until the timeout occurs and the job fails.
@abrenaut I had to update the network settings on the machine running the job:
/sbin/sysctl -w net.ipv4.tcp_keepalive_time=200 net.ipv4.tcp_keepalive_intvl=200 net.ipv4.tcp_keepalive_probes=5
Also make sure there is no timeout set for the user connecting to the DB.
Works like a charm now. Best
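Note that `sysctl -w` only changes the running kernel; to make those keepalive settings survive a reboot you can persist them in a sysctl config file (a sketch; the file name is an arbitrary choice):

```
# /etc/sysctl.d/99-tcp-keepalive.conf  (hypothetical file name)
# Send the first keepalive probe after 200s of idle, then every 200s,
# giving up after 5 unanswered probes.
net.ipv4.tcp_keepalive_time = 200
net.ipv4.tcp_keepalive_intvl = 200
net.ipv4.tcp_keepalive_probes = 5
```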
P.S. @alexanderdean, you might want to update the docs.
@mhadam do you mind taking care of updating the docs?