HDDS-10702. Improve Recon startup failure handling and make it more resilient.

Open devmadhuu opened this issue 10 months ago • 4 comments

What changes were proposed in this pull request?

This PR addresses Recon initialisation failures caused by code inherited from SCM and by other Recon startup errors. It introduces a new context object, ReconContext, which holds Recon health information and startup errors from the various Recon modules. In this change, ReconContext is used in the ReconSCM initialisation flow; it can be injected into other modules later. The information held in ReconContext can later be used to show meaningful messages to the user in the Recon UI.
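As a rough illustration of the idea described above, the following is a minimal sketch of a context object that records startup errors instead of letting them abort startup. It is not the ReconContext class from this PR: the class name `StartupContextSketch`, the `ErrorCode` values, and the methods `recordError`, `isHealthy`, `getErrors`, and `initModule` are all hypothetical names chosen for the example, and a plain `RuntimeException` stands in for whatever exception the real initialisation flow catches.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch only; not the ReconContext class from this PR. */
final class StartupContextSketch {

  /** Illustrative error codes; the real set of codes is an assumption. */
  enum ErrorCode { INVALID_NETWORK_TOPOLOGY, OTHER_STARTUP_FAILURE }

  // Health flag and error set are thread-safe so several modules can report.
  private final AtomicBoolean healthy = new AtomicBoolean(true);
  private final Set<ErrorCode> errors =
      Collections.synchronizedSet(EnumSet.noneOf(ErrorCode.class));

  /** Stores the error and marks the service as unhealthy. */
  void recordError(ErrorCode code) {
    errors.add(code);
    healthy.set(false);
  }

  boolean isHealthy() {
    return healthy.get();
  }

  /** Returns a snapshot copy so callers cannot mutate internal state. */
  Set<ErrorCode> getErrors() {
    EnumSet<ErrorCode> copy = EnumSet.noneOf(ErrorCode.class);
    synchronized (errors) {
      copy.addAll(errors);
    }
    return copy;
  }

  /**
   * Runs one initialisation step. A failure is recorded in the context
   * rather than rethrown, so startup can continue in a degraded state.
   */
  void initModule(Runnable step, ErrorCode onFailure) {
    try {
      step.run();
    } catch (RuntimeException e) {
      recordError(onFailure);
    }
  }
}
```

A caller would wrap each startup step in `initModule` and later read `isHealthy()` and `getErrors()` to decide what to report to the user; how that information reaches the UI is outside the scope of this sketch.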

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10702

How was this patch tested?

Tested manually and by adding a JUnit test case.

devmadhuu avatar Apr 24 '24 12:04 devmadhuu

@dombizita @ArafatKhan2198 kindly review.

devmadhuu avatar Apr 25 '24 09:04 devmadhuu

@devmadhuu Thanks for working on this. I am unable to find any usages of ReconContext: here it is only populated with data and is not used anywhere except in test code. Furthermore, the error is just suppressed and stored in the context; it does not make Recon fail.

The objective of this PR is only to store data in ReconContext. That data will be used in a later PR, where an API will expose the meaningful information to the UI. We don't want Recon startup to fail; rather, we want to expose which information is unavailable in Recon and why.

devmadhuu avatar May 02 '24 06:05 devmadhuu

@devmadhuu I have left a few minor comments on this. This implementation just avoids the InvalidTopologyException. So the next PR will show this information in the UI, or raise an alert about the issue as a health report, right?

Yes, currently this PR just provides a way for Recon to show meaningful information in the UI for failures. This PR handles InvalidTopologyException, but later ReconContext can hold error or failure information for other types of failures as well, which can then be shown in the Recon UI.

devmadhuu avatar May 08 '24 18:05 devmadhuu

Thanks for updating the patch @devmadhuu, the changes look good! Could you please take a look at the failing tests in your fork: https://github.com/devmadhuu/ozone/actions/runs/9015109706/job/24769697868

ArafatKhan2198 avatar May 09 '24 14:05 ArafatKhan2198