fhir-works-on-aws-deployment

[Feature Request] Amazon OpenSearch Reindexing

Open Zambonilli opened this issue 4 years ago • 2 comments

Is your feature request related to a problem? Please describe.

While running multiple fwoa environments, we have had moments where our OpenSearch indexes became corrupt and we had to delete them to get search working again. We switched from the JSON schema validator to the HAPI validator when deploying our custom implementation guide. Fortunately, our misbehaving client then started getting 400 errors on requests with invalid data types in one of our resource's fields. Unfortunately, we had already indexed a number of documents in OpenSearch with the invalid data type for that field, and the resulting data type conflict in OpenSearch caused searches for that resource to fail.
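To illustrate the kind of conflict described above, here is a minimal sketch of a check that flags field paths whose values have more than one JSON type across a set of documents. The helper names (`findTypeConflicts`, `collectTypes`) are hypothetical and not part of fwoa, and the check is only a heuristic for finding candidate documents, not a reproduction of how OpenSearch resolves mappings.

```typescript
// Hypothetical helpers for illustration; not part of fhir-works-on-aws.
type Doc = { [key: string]: unknown };
type TypesByPath = { [path: string]: string[] };

function jsonTypeOf(value: unknown): string {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Walk a value and record the JSON type seen at each dotted field path.
function collectTypes(value: unknown, path: string, seen: TypesByPath): void {
  const kind = jsonTypeOf(value);
  // Null values are not indexed, so they cannot cause a conflict.
  if (kind === "null" || kind === "undefined") return;
  if (kind === "array") {
    // Arrays are multi-valued fields: record element types under the same path.
    (value as unknown[]).forEach((item) => collectTypes(item, path, seen));
    return;
  }
  if (path !== "") {
    const types = seen[path] || (seen[path] = []);
    if (types.indexOf(kind) === -1) types.push(kind);
  }
  if (kind === "object") {
    const obj = value as Doc;
    Object.keys(obj).forEach((key) =>
      collectTypes(obj[key], path === "" ? key : path + "." + key, seen)
    );
  }
}

// Return only the paths where more than one JSON type was observed.
function findTypeConflicts(docs: Doc[]): TypesByPath {
  const seen: TypesByPath = {};
  docs.forEach((doc) => collectTypes(doc, "", seen));
  const conflicts: TypesByPath = {};
  Object.keys(seen).forEach((path) => {
    if (seen[path].length > 1) conflicts[path] = seen[path].slice().sort();
  });
  return conflicts;
}
```

Run against documents read from DDB, a report like this could narrow down which fields and documents need a manual fix before reindexing.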

We've also had transient failures in the DynamoDB-to-OpenSearch Lambda that synchronizes writes from DDB. The failed records are successfully added to an SQS queue, error messages are logged, and CloudWatch alerts are raised, but there's no easy way to remediate these failures.

Describe the solution you'd like

I see the following work to help mitigate some of the mapping and type conflict failures: https://github.com/awslabs/fhir-works-on-aws-persistence-ddb/issues/18 and https://github.com/awslabs/fhir-works-on-aws-deployment/pull/474. However, I'm not seeing a way to reindex a single DDB resource document, all DDB documents for a resource type, or all DDB documents.

For Amazon OpenSearch index corruption due to type conflicts, there will have to be some manual intervention on the DDB side, which probably can't be automated. There might be an opportunity, though, for tooling that identifies which DDB documents cause the index conflict, to reduce the time it takes to resolve the corrupt documents. Someone could then manually fix the DDB documents and reindex either all documents in the index or only the identified corrupt documents. Would CodePipeline pipeline(s) for reindexing all documents, all documents for a resource type, or specific documents by id make sense here? Is there a way to reindex while minimizing any full search outage? Since all of the fwoa code uses index aliases, can the reindex process rebuild the index in the background and then reassign the alias to the rebuilt index once it completes?
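As a rough sketch of the alias reassignment asked about above: OpenSearch applies all actions in a single `POST /_aliases` request atomically, so searches against the alias move from the old index to the rebuilt one without a gap. The helper names and the index naming scheme below are assumptions for illustration only; the code just builds the request body and leaves sending it to whichever client is in use.

```typescript
// Hypothetical helpers for illustration; index and alias names are assumptions.
interface AliasTarget {
  index: string;
  alias: string;
}
type AliasAction = { remove: AliasTarget } | { add: AliasTarget };

// Name for the index that is rebuilt in the background, e.g. from DDB.
function rebuiltIndexName(alias: string, timestamp: number): string {
  return alias + "-rebuild-" + timestamp;
}

// Body for `POST /_aliases`: detach the alias from the old index and attach
// it to the rebuilt one in a single, atomic request.
function buildAliasSwap(
  alias: string,
  oldIndex: string,
  newIndex: string
): { actions: AliasAction[] } {
  return {
    actions: [
      { remove: { index: oldIndex, alias: alias } },
      { add: { index: newIndex, alias: alias } },
    ],
  };
}
```

One open design question with this approach is how to handle writes that arrive from the DDB stream while the new index is being built; they would need to be applied to both indexes or replayed before the swap.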

For transient failures in the DDB-to-OpenSearch Lambda, the solution should lend itself to self-healing in the long term. Ideally, it would retry the failed DDB id using some sort of exponential backoff scheme, so that transient failures heal themselves with no manual intervention. I'm not sure whether this dovetails with the solution to type corruption, and I'm also not sure whether there is an easy way to facilitate this with the DDB stream or the SQS failure queue.
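One possible shape for the backoff described above, assuming the retries are driven from the SQS failure queue: a consumer could derive the delay from the message's `ApproximateReceiveCount` attribute and apply it with `ChangeMessageVisibility`, whose timeout is capped at 12 hours (43200 seconds). The function name and the 30-second base delay are assumptions, and the sketch only covers the delay calculation.

```typescript
// Hypothetical helper for illustration; not part of fhir-works-on-aws.
// Returns how long (in seconds) a failed message should stay hidden before
// the next retry, doubling with each receive and capped at maxSeconds.
function backoffSeconds(
  receiveCount: number,
  baseSeconds: number = 30, // assumed starting delay
  maxSeconds: number = 43200 // SQS visibility timeout limit (12 hours)
): number {
  // Treat anything below the first receive as attempt 1.
  const attempt = Math.max(1, Math.floor(receiveCount));
  const delay = baseSeconds * Math.pow(2, attempt - 1);
  return Math.min(delay, maxSeconds);
}
```

With these defaults, the first retry waits 30 seconds, the second 60, the third 120, and so on until the cap is reached. Adding random jitter to the delay would help avoid synchronized retries when many records fail at once.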

Describe alternatives you've considered

I thought about leveraging the existing AWS Backup here. It addresses neither type conflicts nor transient failures, but it might help get search back online faster than a full reindex from DDB.

Additional context

We can help build the solution(s) and contribute PR(s). We're interested in whether the fwoa team already has solutions in mind and is just waiting on resources to free up. If so, let us know and we might be able to supply the elbow grease.

Zambonilli • Nov 04 '21 14:11

Hey @Zambonilli, thanks for bringing this up! We have recently implemented statically-typed mappings as you mentioned to help mitigate this issue, but there may still be cases where this doesn't work. We have added this request to our backlog!

ssvegaraju • Nov 15 '21 16:11

Thanks @ssvegaraju

Another use case we're hitting where this tooling would be useful is disaster recovery. Currently, AWS Backup does not support Amazon OpenSearch. There might be better, faster, and cheaper ways to do DR natively within Amazon OpenSearch, like using connections or multi-cluster, but another path would be to reindex from DynamoDB, which is supported by AWS Backup.

Zambonilli • Nov 18 '21 15:11

FHIR Works on AWS has been moved to maintenance mode. While in maintenance, we will not add any new features to this solution. All security issues should be reported directly to AWS Security at [[email protected]] (mailto:[email protected]). If you are new to this solution, we advise you to explore using [HealthLake] (https://aws.amazon.com/healthlake), which is our managed service for building FHIR based transactional and analytics applications. You can get started by contacting your AWS Account team. If you are an existing customer of FHIR Works on AWS, and have additional questions or need immediate help, please reach out to [email protected] or contact your AWS Account team.

nisankep • Apr 03 '23 22:04