amazon-redshift-utils

ra3-migration-replay fails to set target endpoint and region

Open ctcudd opened this issue 3 years ago • 1 comments

Encountering multiple errors when running the ra3-migration-replay:

1. Replacement for target_cluster_endpoint is failing:

Error:

[INFO] 2022-02-16 14:50:25 Loading config file from ./2022-02-16-13-52-33/replay_target.yaml
[INFO] 2022-02-16 14:50:25 Saving SimpleReplay logs to simplereplay_logs
[INFO] 2022-02-16 14:50:25 Logging to simplereplay_logs/replay.log
[INFO] 2022-02-16 14:50:25 Version 2.2
Traceback (most recent call last):
  File "replay.py", line 1856, in <module>
    main()
  File "replay.py", line 1615, in main
    cluster = cluster_dict(g_config["target_cluster_endpoint"])
  File "/amazonutils/amazon-redshift-utils/src/SimpleReplay/util.py", line 175, in cluster_dict
    "region": url_split[2],
IndexError: list index out of range
failed to run commands: exit status 1

The problem is that this line does not actually perform a replacement:

sed -i "s#target_cluster_endpoint: \"\"#target_cluster_endpoint: \"$cluster_endpoint\"#g" ./$bucket_keyprefix/replay_target.yaml

This is because the default value for target_cluster_endpoint was recently changed from "" to "host:port/database", so the sed pattern no longer matches and the placeholder is left in place.

Fix:

sed -i "s#target_cluster_endpoint: \"host:port/database\"#target_cluster_endpoint: \"$cluster_endpoint\"#g" ./$bucket_keyprefix/replay_target.yaml
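To avoid breaking again the next time the default changes, the replacement could match whatever quoted value is currently present rather than one specific placeholder. The following is only a sketch of that idea in Python, not what the migration script does; the function name set_target_endpoint is hypothetical, and it assumes the key appears on its own line with a double-quoted value.

```python
import re


def set_target_endpoint(yaml_text: str, endpoint: str) -> str:
    """Replace the quoted value of target_cluster_endpoint, whatever it is.

    Hypothetical helper: assumes the key sits on its own line and its
    value is wrapped in double quotes, as in replay_target.yaml.
    """
    pattern = re.compile(
        r'^(\s*target_cluster_endpoint:\s*)"[^"]*"', re.MULTILINE
    )
    # A function replacement avoids backslash/group escapes in `endpoint`.
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{endpoint}"', yaml_text
    )
    if count == 0:
        # Fail loudly instead of silently leaving the placeholder behind.
        raise ValueError("target_cluster_endpoint key not found")
    return new_text
```

Raising when nothing matched is deliberate: the original failure was silent, and only surfaced later as the IndexError in util.py.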

2. target_cluster_region defaults to an empty string, causing: ValueError: Invalid endpoint: https://redshift..amazonaws.com

Why was this variable introduced? It would be better to parse the region from target_cluster_endpoint, using the same logic as in util.py.
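As a rough illustration of that suggestion, the region could be derived from the endpoint whenever target_cluster_region is empty. This is a sketch under assumptions, not the code in util.py: the function name region_from_endpoint is hypothetical, and it assumes the usual endpoint shape cluster-id.unique-id.region.redshift.amazonaws.com:port/database, which is consistent with the "region": url_split[2] line in the first traceback.

```python
def region_from_endpoint(endpoint: str) -> str:
    """Return the region embedded in a Redshift endpoint string.

    Hypothetical helper. Assumes the host looks like
    <cluster-id>.<unique-id>.<region>.redshift.amazonaws.com
    """
    host = endpoint.split(":", 1)[0]  # drop ":port/database"
    parts = host.split(".")
    if len(parts) < 3 or not parts[2]:
        # e.g. the unreplaced placeholder "host:port/database"
        raise ValueError(f"cannot parse region from endpoint: {endpoint!r}")
    return parts[2]
```

A caller could then fall back with something like g_config.get("target_cluster_region") or region_from_endpoint(g_config["target_cluster_endpoint"]). Note that the existing g_config.get("target_cluster_region", None) returns "" rather than None when the key is present but empty, which is how the empty region reaches boto3.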

Full error:

Traceback (most recent call last):
  File "replay.py", line 1856, in <module>
    main()
  File "replay.py", line 1745, in main
    get_connection_credentials(connection_logs[0].username, database=connection_logs[0].database_name, max_attempts=1)
  File "replay.py", line 1338, in get_connection_credentials
    rs_client = client("redshift", region_name=g_config.get("target_cluster_region", None), **additional_args)
  File "/usr/local/lib/python3.7/site-packages/boto3/__init__.py", line 93, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/boto3/session.py", line 275, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/local/lib/python3.7/site-packages/botocore/session.py", line 874, in create_client
    client_config=config, api_version=api_version)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 93, in create_client
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 362, in _get_client_args
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.7/site-packages/botocore/args.py", line 108, in get_client_args
    proxies_config=new_config.proxies_config)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 335, in create_endpoint
    raise ValueError("Invalid endpoint: %s" % endpoint_url)
ValueError: Invalid endpoint: https://redshift..amazonaws.com
failed to run commands: exit status 1

ctcudd, Feb 23 '22 15:02

I am facing the exact same issue. I made a few changes as a temporary workaround. The replay was supposed to finish in 5 hours, but after 8 hours it is still running in a loop. I checked the CPU utilization graphs for both the replica and target clusters, and the queries are executing.

tanveer941, Feb 24 '22 14:02