
Move ARAX response databases and araxconfig.rtx.ai to us-east-1

Open saramsey opened this issue 2 years ago • 19 comments

Motivated by RTXteam/RTX issue 2093, and as suggested by @edeutsch, we ought to consider moving the ARAX response S3 bucket, arax-responses.rtx.ai, and araxconfig.rtx.ai to us-east-1. The rationale is that ITRB's EKS deployments are (I think) in us-east-1. We will need to plan this out carefully and ideally confirm which of these services is really the bottleneck before doing the work to transition to a new AWS region.

saramsey avatar Aug 15 '23 19:08 saramsey

Update: the response database has been moved to us-east-1. See the lengthy thread in the #deployment channel in the ARAXTeam Slack workspace.

saramsey avatar Aug 28 '23 17:08 saramsey

I've created a new S3 bucket in us-east-1 called arax-response-storage-2 and copied everything from arax-response-storage to it: 165k objects, about half a terabyte in total. The copy ran overnight. Does this help move us closer to having the response bucket in us-east-1? @edeutsch Should we keep it, or delete it until we are ready to migrate "for real"?
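For reference, a minimal boto3 sketch of this sort of bulk bucket-to-bucket copy (the bucket names are the ones above; the actual copy may well have been done with the AWS CLI or another tool rather than boto3):

import boto3

# Sketch only: iterate over every object in the us-west-2 source bucket and
# server-side copy it into the us-east-1 destination bucket.
s3 = boto3.resource("s3")
src = s3.Bucket("arax-response-storage")
dest = s3.Bucket("arax-response-storage-2")

for obj in src.objects.all():  # ~165k objects, ~0.5 TB
    dest.copy({"Bucket": src.name, "Key": obj.key}, obj.key)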

saramsey avatar Sep 06 '23 20:09 saramsey

To move araxconfig.rtx.ai to us-east-1, we may need to change the scp code in RTXConfiguration.py to use something like this:

scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no 

but maybe not, because @edeutsch reported that there is a way to do this by copying a sshd host key from the old server to the new server.
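For context, a sketch of what that invocation might look like if wrapped in Python (the function name and arguments here are hypothetical, not the actual RTXConfiguration.py code):

import subprocess

def fetch_config_via_scp(remote_path, local_path):
    # Hypothetical sketch only, not the actual RTXConfiguration.py code.
    # The two -o options suppress the interactive prompt to accept a new host
    # key, which would otherwise block a non-interactive fetch after the
    # araxconfig server (and hence its host key) changes.
    subprocess.run(
        ["scp",
         "-o", "UserKnownHostsFile=/dev/null",
         "-o", "StrictHostKeyChecking=no",
         remote_path, local_path],
        check=True)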

saramsey avatar Oct 13 '23 00:10 saramsey

https://superuser.com/questions/532040/copy-ssh-keys-from-one-server-to-another-server

edeutsch avatar Oct 13 '23 01:10 edeutsch

Thank you Eric! I like it.

saramsey avatar Oct 18 '23 16:10 saramsey

OK, I have set up a new araxconfig server, araxconfig2.rtx.ai, in the us-east-1 AWS region and installed the SSH host keys from araxconfig.rtx.ai on the new server. I still need to switch DNS and test that ssh works without a prompt to accept a new host key.

saramsey avatar Oct 20 '23 04:10 saramsey

OK, the DNS swap for araxconfig.rtx.ai and araxconfig2.rtx.ai is done. araxconfig.rtx.ai now points to 18.234.146.76, which is in us-east-1. Passwordless SCP from host [email protected] is verified working for user rt inside the rtx1 container on arax.ncats.io and on other hosts. The new DNS entries are:

[Screenshot: the new DNS entries]

saramsey avatar Oct 20 '23 18:10 saramsey

For moving the response S3 bucket from us-west-2 to us-east-1, the consensus from discussion with @edeutsch is that we will need to make some changes to response_cache.py. Here is the pseudocode (obviously the transition point will not be Oct. 20; Eric has proposed Nov. 3 instead):

[Screenshot: proposed switchover pseudocode]
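Since the screenshot is not legible here, a rough sketch of the kind of date-based bucket selection being proposed (the names and comparison below are assumptions, not the actual response_cache.py code):

from datetime import datetime, timezone

# Assumed cutover datetime; in the plan below this value lives in the MySQL
# config table so it can be changed without redeploying code.
S3_MIGRATION_DATETIME = datetime(2023, 11, 3, 18, 0, tzinfo=timezone.utc)

OLD_BUCKET = "arax-response-storage"      # us-west-2
NEW_BUCKET = "arax-response-storage-2"    # us-east-1

def bucket_for_response(response_datetime):
    # Responses stored before the cutover live in the old (us-west-2) bucket
    # until the bulk copy and straggler mop-up are done; responses stored
    # after the cutover are read from and written to the new bucket.
    if response_datetime < S3_MIGRATION_DATETIME:
        return OLD_BUCKET
    return NEW_BUCKET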

saramsey avatar Oct 20 '23 18:10 saramsey

I've stopped araxconfig2.rtx.ai in us-east-2, but will keep it around for a while in case we need it.

saramsey avatar Oct 20 '23 19:10 saramsey

Both buckets exist.

  • The west bucket has everything today: arax-response-storage

  • The east bucket exists today but is currently empty: arax-response-storage-2

  • What if we were to encode the cutover date in the MySQL database table instead? Yes, agreed.

  • Set the code to perform the switchover on 2023-11-03 18:00 UTC (which is 11 am PDT).

  • Start the bucket copy overnight the night before (leaving some time to compensate for failure).

  • The switchover happens.

  • A few responses will still be in the old bucket but not the new one (easy to find because they are sequential).

  • After the cutover, we need a little script that checks for responses created after the initial copy and copies them over incrementally (Steve said this is easy).

  • This can be easily tested on arax.ncats.io/test by fiddling with the MySQL date.

edeutsch avatar Oct 25 '23 18:10 edeutsch

I have pushed some new code to master to address the S3 bucket migration: https://github.com/RTXteam/RTX/commit/7012efc0177efd98be645e10b117769dd5b07ea7

It seems to be working as best as I can test it. Comments and/or testing welcome.

The migration datetime can be changed with this:

python response_cache.py --set_config "S3BucketMigrationDatetime=2023-11-11 15:00:00"

Changing this value will affect all instances with the new code.

There is a little testing line that forces the behavior for just one instance here: https://github.com/RTXteam/RTX/blob/7012efc0177efd98be645e10b117769dd5b07ea7/code/ARAX/ResponseCache/response_cache.py#L842, which might be useful for testing.
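As a rough illustration of how a shared key-value setting like S3BucketMigrationDatetime might be read from the MySQL database so that all instances see the same value (connection parameters, table, and column names here are placeholders, not the real response_cache.py schema):

import pymysql

def get_config_value(name):
    # Placeholder sketch; connection parameters and table/column names are
    # illustrative, not the actual response_cache.py implementation.
    conn = pymysql.connect(host="localhost", user="rt", password="REDACTED",
                           database="response_cache")
    try:
        with conn.cursor() as cursor:
            cursor.execute("SELECT value FROM config WHERE name = %s", (name,))
            row = cursor.fetchone()
            return row[0] if row else None
    finally:
        conn.close()

# e.g. get_config_value("S3BucketMigrationDatetime") -> "2023-11-11 15:00:00"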

edeutsch avatar Oct 30 '23 21:10 edeutsch

One thing I haven't tested is writing to the new bucket. Maybe @saramsey and I can do that sometime when the script to copy individual files from bucket to bucket is ready, as this will be necessary to test nicely.

edeutsch avatar Oct 30 '23 21:10 edeutsch

This script is on the docket for this week.

saramsey avatar Nov 14 '23 22:11 saramsey

OK, I've added a script cp_trapi_resp_betw_s3_buckets.py, for the planned Feb. 2024 migration of the TRAPI response JSON files from us-west-2 to us-east-1.
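For anyone curious, a sketch of the incremental flavor of such a copy, filtering on object modification time so only responses stored after the initial bulk copy get transferred (this is illustrative, not the actual cp_trapi_resp_betw_s3_buckets.py code):

import argparse
from datetime import datetime, timezone

import boto3

def main():
    # Illustrative sketch only, not the actual cp_trapi_resp_betw_s3_buckets.py.
    parser = argparse.ArgumentParser(
        description="Copy TRAPI response objects modified after a cutoff date "
                    "from the old (us-west-2) bucket to the new (us-east-1) bucket")
    parser.add_argument("--after", required=True,
                        help="only copy objects modified on/after this UTC date, e.g. 2024-02-01")
    args = parser.parse_args()
    cutoff = datetime.fromisoformat(args.after).replace(tzinfo=timezone.utc)

    s3 = boto3.resource("s3")
    src = s3.Bucket("arax-response-storage")
    dest = s3.Bucket("arax-response-storage-2")

    for obj in src.objects.all():
        if obj.last_modified >= cutoff:  # a straggler stored after the bulk copy
            dest.copy({"Bucket": src.name, "Key": obj.key}, obj.key)
            print(f"copied {obj.key}")

if __name__ == "__main__":
    main()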

saramsey avatar Nov 30 '23 17:11 saramsey

Hi everyone, just banging the rust off this issue... I have just merged our code in TEST into the production branch. If all goes according to prophecy, it will be deployed to production sometime mid next week.

Then, rapture is currently scheduled as follows:

$ python3 response_cache.py --show_config
{
  "S3BucketMigrationDatetime": "2024-02-09 15:00:00"
}

At this time (UTC?), it is hoped that all systems will begin using the us-east-1 S3 bucket.

Copying of all responses from the old bucket to the new bucket, plus mop-up of stragglers, still needs to be planned.

edeutsch avatar Feb 02 '24 18:02 edeutsch

The appointed time came and went, but instead of rapture we got: [screenshot of an error]

@saramsey can you think of a reason that our code would not be able to write to the us-east-1 bucket?

edeutsch avatar Feb 10 '24 06:02 edeutsch

Confirmed, this appears to be an S3 permissions issue with the AWS keypair that is specified in config_secrets.json.
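For reference, fixing this sort of problem amounts to granting the IAM user read/write access to the new bucket; a rough boto3 sketch of such a grant (the policy name and exact actions here are illustrative assumptions, not the precise policy that was applied):

import json

import boto3

# Illustrative sketch of an inline policy granting the arax-response-storage
# IAM user access to the new bucket; not the exact policy that was applied.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::arax-response-storage-2/*"},
        {"Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::arax-response-storage-2"},
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(UserName="arax-response-storage",
                    PolicyName="arax-response-storage-2-access",
                    PolicyDocument=json.dumps(policy))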

saramsey avatar Feb 14 '24 19:02 saramsey

Should be fixed now. I've fixed the IAM permissions for the IAM user arax-response-storage. Having done that, I show below the result of a test using boto3 to write an object /responses/foo.json into the bucket s3://arax-response-storage-2 using the credentials from config_secrets.json (note the HTTP status code 200):

>>> s3 = boto3.resource('s3', region_name='us-east-1', aws_access_key_id='REDACTED', aws_secret_access_key='REDACTED')
>>> s3.Object('arax-response-storage-2', '/responses/foo.json').put(Body='[]')
{'ResponseMetadata': {'RequestId': 'SVC2QDBV9Z03J3VY', 'HostId': 'B1TknpI1oIdJImZSw9/ebmxfit9Xc23JvL32wZ5KsfP8sd2G1EC10wBCRxixcNM0DZC+a65DFQU=', 
'HTTPStatusCode': 200, 
'HTTPHeaders': {'x-amz-id-2': 'B1TknpI1oIdJImZSw9/ebmxfit9Xc23JvL32wZ5KsfP8sd2G1EC10wBCRxixcNM0DZC+a65DFQU=', 'x-amz-request-id': 'SVC2QDBV9Z03J3VY', 'date': 'Wed, 14 Feb 2024 19:37:13 GMT', 'x-amz-server-side-encryption': 'AES256', 'etag': '"d751713988987e9331980363e24189ce"', 'server': 'AmazonS3', 'content-length': '0'}, 'RetryAttempts': 0}, 'ETag': '"d751713988987e9331980363e24189ce"', 'ServerSideEncryption': 'AES256'}

saramsey avatar Feb 14 '24 19:02 saramsey

We have transitioned to using the new S3 bucket s3://arax-response-storage-2 as of about 2:55 PM PST today. Thanks @edeutsch for making the change. All files have been copied from the old bucket to the new bucket. If you notice anything missing (which, to be clear, would be unexpected at this point), please let me know.

saramsey avatar Feb 16 '24 23:02 saramsey

I think it was all taken care of, so I'm closing this. Please reopen if you think not.

edeutsch avatar Jun 14 '24 03:06 edeutsch