HPCC-Platform
HPCC-28939 Increase timeout duration of component launch in hpcc-init
Signed-off-by: Dan S. Camper [email protected]
Type of change:
- [x] This change is a bug fix (non-breaking change which fixes an issue).
- [ ] This change is a new feature (non-breaking change which adds functionality).
- [ ] This change improves the code (refactor or other change that does not change the functionality)
- [ ] This change fixes warnings (the fix does not alter the functionality or the generated code)
- [ ] This change is a breaking change (fix or feature that will cause existing behavior to change).
- [ ] This change alters the query API (existing queries will have to be recompiled)
Checklist:
- [x] My code follows the code style of this project.
- [x] My code does not create any new warnings from compiler, build system, or lint.
- [x] The commit message is properly formatted and free of typos.
- [x] The commit message title makes sense in a changelog, by itself.
- [x] The commit is signed.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly, or...
- [ ] I have created a JIRA ticket to update the documentation.
- [ ] Any new interfaces or exported functions are appropriately commented.
- [ ] I have read the CONTRIBUTORS document.
- [ ] The change has been fully tested:
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
- [ ] I have checked that this change does not introduce memory leaks.
- [ ] I have used Valgrind or similar tools to check for potential issues.
- [ ] I have given due consideration to all of the following potential concerns:
- [ ] Scalability
- [ ] Performance
- [ ] Security
- [ ] Thread-safety
- [ ] Cloud-compatibility
- [ ] Premature optimization
- [ ] Existing deployed queries will not be broken
- [ ] This change fixes the problem, not just the symptom
- [ ] The target branch of this pull request is appropriate for such a change.
- [ ] There are no similar instances of the same problem that should be addressed
- [ ] I have addressed them here
- [ ] I have raised JIRA issues to address them separately
- [ ] This is a user interface / front-end modification
- [ ] I have tested my changes in multiple modern browsers
- [ ] The component(s) render as expected
Smoketest:
- [ ] Send notifications about my Pull Request position in Smoketest queue.
- [ ] Test my draft Pull Request.
Testing:
https://track.hpccsystems.com/browse/HPCC-28939 Jira updated
FYI @jakesmith
@Michael-Gardner @dcamper - 10 minutes might not be enough (e.g. if it is starting up cold with a huge transaction log). Perhaps we should ensure Dali has started before proceeding, and trace messages saying it is still starting or similar, as it's critical that it is working for almost all other components.
@jakesmith I agree with your comments, but I think it depends on what the purpose (or goal) of this change is. What I was aiming for was just a "mitigation" rather than an outright solution, reducing the number of times someone would see the TIMEOUT indicator and temporarily-alarming subsequent messages. This change does not outright remove that behavior, just reduces its occurrence.
If the goal should be to make a clean and accurate representation of what is going on during the launch cycle then we may need quite a bit more scaffolding to monitor Dali.
It should probably be both. This one is okay as a stop-gap to mitigate the issue for those with large stores (or cold starts with large transaction histories), but I think we also need a JIRA for a more formal improvement as described above.
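For reference, the "wait for Dali and trace that it is still starting" idea could look roughly like the bash sketch below. This is only an illustration under stated assumptions, not the hpcc-init implementation: `wait_for_ready` and its arguments are hypothetical names, and the readiness probe is whatever command the caller supplies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hpcc-init): poll a readiness probe until it
# succeeds or a deadline passes, tracing progress instead of a bare TIMEOUT.
# Usage: wait_for_ready NAME TIMEOUT_SECS INTERVAL_SECS PROBE_CMD [ARGS...]
wait_for_ready() {
    local name=$1 timeout=$2 interval=$3
    shift 3
    local waited=0
    # "$@" is the caller-supplied probe; exit status 0 means "component is up".
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$name: not ready after ${waited}s, giving up" >&2
            return 1
        fi
        echo "$name: still starting (${waited}s elapsed)" >&2
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "$name: ready after ${waited}s" >&2
    return 0
}
```

A caller would pass its own probe, for example `wait_for_ready dali 1800 10 my_dali_probe`, where `my_dali_probe` is a placeholder for whatever check the init script already uses and the 1800-second limit is an arbitrary example value. Other components would only be launched once the function returns 0.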
This is specific to bare-metal environments, correct? Just trying to determine priorities.
yep.
@Michael-Gardner Jake mentioned that this older PR may still be needed, but made Dali-specific. Please review. Thanks!
@Michael-Gardner I tested it ages ago and it worked, but I do not remember the details. It would be worthwhile for you to independently test, I think. Thanks!
@ghalliday It's working fine for me. Let's merge.
@dcamper please squash and I will merge.
@ghalliday Squashed and ready for merge. Thanks!