Move ARAX response databases and araxconfig.rtx.ai to us-east-1
Motivated by RTXteam/RTX issue 2093, and as suggested by @edeutsch, we ought to consider moving the ARAX response S3 bucket (arax-responses.rtx.ai) and araxconfig.rtx.ai to us-east-1. The rationale is that ITRB's EKS deployments are (I think) in us-east-1. We will need to plan this out carefully, and ideally confirm which of these services is really the bottleneck, before doing the work to transition to a new AWS region.
Update: the response database has been moved to us-east-1. See the lengthy thread in the #deployment channel in the ARAXTeam Slack workspace.
I've created a new S3 bucket in us-east-1 called arax-response-storage-2 and copied everything from arax-response-storage to it: 165k objects, totaling about half a terabyte. The copy ran overnight. Does this help us move closer to having the response bucket in us-east-1? @edeutsch Should we keep it, or delete it until we are ready to migrate "for real"?
To move araxconfig.rtx.ai to us-east-1, we may need to change the scp code in RTXConfiguration.py to use something like this:
```
scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
```
but maybe not, because @edeutsch reported that there is a way to do this by copying an sshd host key from the old server to the new server:
https://superuser.com/questions/532040/copy-ssh-keys-from-one-server-to-another-server
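For reference, here is a minimal sketch of what the scp call in RTXConfiguration.py might look like with those options; the function name and paths are hypothetical, not the actual code (the rtxconfig@araxconfig.rtx.ai account is the one mentioned later in this thread):

```python
import subprocess

def fetch_config_via_scp(remote_spec: str, local_path: str) -> None:
    """Copy the config file from the araxconfig server, skipping host-key
    verification so that a server swap does not trigger an interactive
    prompt. Note: disabling StrictHostKeyChecking trades away protection
    against man-in-the-middle attacks."""
    subprocess.run(
        ["scp",
         "-o", "UserKnownHostsFile=/dev/null",
         "-o", "StrictHostKeyChecking=no",
         remote_spec, local_path],
        check=True)

# Hypothetical usage; the actual remote file path is whatever
# RTXConfiguration.py pulls from the config server.
fetch_config_via_scp("[email protected]:config_secrets.json",
                     "/tmp/config_secrets.json")
```

The host-key-copying approach from the superuser.com link above avoids needing these options at all, which is why it was preferred.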
Thank you Eric! I like it.
OK, I have set up a new araxconfig server, araxconfig2.rtx.ai, in the us-east-1 AWS region, and installed the SSH host keys from araxconfig.rtx.ai on the new server. Next I need to switch DNS and test whether ssh works without prompting to accept a new host key.
OK, the DNS swap for araxconfig.rtx.ai and araxconfig2.rtx.ai is done. Now araxconfig.rtx.ai points to 18.234.146.76, which is in us-east-1. Passwordless SCP from host [email protected] is verified working from user rt inside the rtx1 container on arax.ncats.io and on other hosts.
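A quick way to confirm on any given host that the DNS swap has propagated (just a sketch; it resolves the name and compares against the us-east-1 address above):

```python
import socket

# 18.234.146.76 is the new us-east-1 address from the DNS swap above
expected_ip = "18.234.146.76"
resolved_ip = socket.gethostbyname("araxconfig.rtx.ai")
print(f"araxconfig.rtx.ai resolves to {resolved_ip}")
assert resolved_ip == expected_ip, "DNS swap has not propagated on this host yet"
```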
For moving the response S3 bucket from us-west-2 to us-east-1, the consensus from discussion with @edeutsch is that we will need to make some changes to response_cache.py. Here is the plan in pseudocode (obviously the transition point will not be Oct. 20; Eric has instead proposed Nov. 3). A sketch of the switchover logic follows the list:
I've stopped araxconfig2.rtx.ai in us-west-2, but will keep it around for a while in case we need it.
Both buckets exist:
- The west bucket has everything today: it is arax-response-storage.
- The east bucket exists today but is currently empty: it is arax-response-storage-2.
- What if we were to encode the cutover date in the MySQL database table instead? Yes, agreed.
- Set the code to perform the switchover on 2023-11-03 18:00 UTC (which is 11am PDT).
- Start the copy of the buckets overnight the night before (leaving some time to compensate for failure).
- The switchover happens.
- A few responses will still be stuck in the old bucket but not the new one (easy to handle, because they are sequential).
- After the cutover, we need a little script that will check for responses created after the initial copy and copy them over incrementally (Steve said this is easy).
- This can easily be tested on arax.ncats.io/test by fiddling with the MySQL date.
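A minimal sketch of what the date-based switchover in response_cache.py might look like, assuming the migration datetime is stored as a UTC string like "2023-11-03 18:00:00" in the MySQL config table (the function and constant names here are hypothetical, not the actual implementation):

```python
from datetime import datetime, timezone

# Bucket names are from the plan above; regions per the migration
OLD_BUCKET = "arax-response-storage"    # us-west-2
NEW_BUCKET = "arax-response-storage-2"  # us-east-1

def select_bucket(migration_datetime_utc: str) -> tuple[str, str]:
    """Return (bucket_name, region) based on whether the configured
    migration datetime (read from the MySQL config table) has passed."""
    cutover = datetime.strptime(
        migration_datetime_utc, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if datetime.now(timezone.utc) >= cutover:
        return NEW_BUCKET, "us-east-1"
    return OLD_BUCKET, "us-west-2"
```

Because every instance reads the cutover date from the shared database at request time, all instances flip to the new bucket at the same moment, with no coordinated redeploy needed.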
I have pushed some new code to master to address the S3 bucket migration: https://github.com/RTXteam/RTX/commit/7012efc0177efd98be645e10b117769dd5b07ea7
It seems to be working as best as I can test it. Comments and/or testing welcome.
The migration datetime can be changed with this:
```
python response_cache.py --set_config "S3BucketMigrationDatetime=2023-11-11 15:00:00"
```
Changing this value will affect all instances with the new code.
There is a little testing line that will force testing of just one instance here: https://github.com/RTXteam/RTX/blob/7012efc0177efd98be645e10b117769dd5b07ea7/code/ARAX/ResponseCache/response_cache.py#L842, which might be useful for testing.
One thing I haven't tested is writing to the new bucket. Maybe @saramsey and I can do that sometime when the script to copy individual files from bucket to bucket is ready, as this will be necessary to test nicely.
This script is on the docket for this week.
OK, I've added a script cp_trapi_resp_betw_s3_buckets.py, for the planned Feb. 2024 migration of the TRAPI response JSON files from us-west-2 to us-east-1.
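For context, the core of such a bucket-to-bucket copy can be quite small with boto3. Here is a minimal sketch; the "responses/" prefix is an assumption for illustration, and this is not necessarily how cp_trapi_resp_betw_s3_buckets.py actually works:

```python
import boto3

# Assumes credentials with read access to the old bucket and write access
# to the new one, e.g. the keypair from config_secrets.json
s3 = boto3.resource("s3")
old_bucket = s3.Bucket("arax-response-storage")      # us-west-2
new_bucket_name = "arax-response-storage-2"          # us-east-1

# Copy every object under the (assumed) responses/ prefix; S3 performs the
# copy server-side, so object bodies need not pass through this machine
for obj in old_bucket.objects.filter(Prefix="responses/"):
    s3.meta.client.copy(
        {"Bucket": old_bucket.name, "Key": obj.key},
        new_bucket_name,
        obj.key)
```

For the incremental "straggler" pass planned above, the same loop can simply skip keys that already exist in the destination, which works because the response IDs are sequential.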
Hi everyone, just knocking the rust off this issue. I have just merged our code in TEST into the production branch. If all goes according to prophecy, it will be deployed to production sometime mid next week.
Then, rapture is currently scheduled as follows:
```
$ python3 response_cache.py --show_config
{
    "S3BucketMigrationDatetime": "2024-02-09 15:00:00"
}
```
At this time (UTC?), it is hoped that all systems will begin using the us-east-1 S3 bucket.
Copying of all responses from the old bucket to the new bucket, and mop-up of stragglers, should be planned.
The appointed time came and went, but instead of rapture we got a failure writing to the new bucket.
@saramsey can you think of a reason that our code would not be able to write to the us-east-1 bucket?
Confirmed, this appears to be an S3 permissions issue with the AWS keypair that is specified in config_secrets.json.
Should be fixed now.... I've fixed the IAM permissions for the IAM user arax-response-storage. Having done that, I show below the result of a test using boto3 to write an object /responses/foo.json into the bucket s3://arax-response-storage-2 using the credentials from config_secrets.json (note the HTTP status code 200):
```python
>>> import boto3
>>> s3 = boto3.resource('s3', region_name='us-east-1',
...                     aws_access_key_id='REDACTED',
...                     aws_secret_access_key='REDACTED')
>>> s3.Object('arax-response-storage-2', '/responses/foo.json').put(Body='[]')
{'ResponseMetadata': {'RequestId': 'SVC2QDBV9Z03J3VY',
  'HostId': 'B1TknpI1oIdJImZSw9/ebmxfit9Xc23JvL32wZ5KsfP8sd2G1EC10wBCRxixcNM0DZC+a65DFQU=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'B1TknpI1oIdJImZSw9/ebmxfit9Xc23JvL32wZ5KsfP8sd2G1EC10wBCRxixcNM0DZC+a65DFQU=',
   'x-amz-request-id': 'SVC2QDBV9Z03J3VY',
   'date': 'Wed, 14 Feb 2024 19:37:13 GMT',
   'x-amz-server-side-encryption': 'AES256',
   'etag': '"d751713988987e9331980363e24189ce"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"d751713988987e9331980363e24189ce"',
 'ServerSideEncryption': 'AES256'}
```
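As a follow-up check (a sketch, assuming the same boto3 session as above), the test object can be read back and then deleted so it doesn't linger in the bucket:

```python
>>> obj = s3.Object('arax-response-storage-2', '/responses/foo.json')
>>> obj.get()['Body'].read()   # should return the body written above
b'[]'
>>> obj.delete()               # clean up the test object
```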
We have transitioned to using the new S3 bucket s3://arax-response-storage-2 as of about 2:55 PM PST today. Thanks @edeutsch for making the change. All files have been copied from the old bucket to the new bucket. If you notice anything missing (which, to be clear, would be unexpected at this point), please let me know.
I think it was all taken care of, closing. Please reopen if you think not.