HPCC-Platform icon indicating copy to clipboard operation
HPCC-Platform copied to clipboard

HPCC-28757 New StSizePeakEphemeral and StSizePeakSubgraphTemp for hash dedup

Open shamser opened this issue 10 months ago • 5 comments

Type of change:

  • [ ] This change is a bug fix (non-breaking change which fixes an issue).
  • [x] This change is a new feature (non-breaking change which adds functionality).
  • [ ] This change improves the code (refactor or other change that does not change the functionality)
  • [ ] This change fixes warnings (the fix does not alter the functionality or the generated code)
  • [ ] This change is a breaking change (fix or feature that will cause existing behavior to change).
  • [ ] This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • [x] My code follows the code style of this project.
    • [x] My code does not create any new warnings from compiler, build system, or lint.
  • [x] The commit message is properly formatted and free of typos.
    • [ ] The commit message title makes sense in a changelog, by itself.
    • [ ] The commit is signed.
  • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly, or...
    • [ ] I have created a JIRA ticket to update the documentation.
    • [ ] Any new interfaces or exported functions are appropriately commented.
  • [x] I have read the CONTRIBUTORS document.
  • [x] The change has been fully tested:
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    • [ ] I have checked that this change does not introduce memory leaks.
    • [ ] I have used Valgrind or similar tools to check for potential issues.
  • [ ] I have given due consideration to all of the following potential concerns:
    • [ ] Scalability
    • [ ] Performance
    • [ ] Security
    • [ ] Thread-safety
    • [ ] Cloud-compatibility
    • [ ] Premature optimization
    • [ ] Existing deployed queries will not be broken
    • [ ] This change fixes the problem, not just the symptom
    • [ ] The target branch of this pull request is appropriate for such a change.
  • [ ] There are no similar instances of the same problem that should be addressed
    • [ ] I have addressed them here
    • [ ] I have raised JIRA issues to address them separately
  • [ ] This is a user interface / front-end modification
    • [ ] I have tested my changes in multiple modern browsers
    • [ ] The component(s) render as expected

Smoketest:

  • [x] Send notifications about my Pull Request position in Smoketest queue.
  • [ ] Test my draft Pull Request.

Testing:

shamser avatar Apr 03 '24 10:04 shamser

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-28757

Jirabot Action Result: Workflow Transition: Merge Pending Updated PR

github-actions[bot] avatar Apr 03 '24 10:04 github-actions[bot]

@shamser - looks fine, but see comment re. offset_t's in CFileSizeTracker

Sorry, forgot to push the last batch of changes.

shamser avatar Apr 25 '24 15:04 shamser

Re. naming, I'd go renaming the 3 stats to:

StSizePeakInterSubgraphDisk StSizePeakInternalTempDisk StSizePeakEphemeralDisk

@ghalliday ? And what do StSizeSpillFile and StSizeGraphSpill become? (I'm not quite sure what they currently represent).

You want them to be consistent with the other names. Some more suggestions

StSizeInterSubgraphSpill[Write?] StSizePeakInterSubgraphSpill StSizePeakInternalTemp - peak size StSizeInternalTemp - current size StSizeInternalTempWrite - total size StSizePeakEphemeralDiskUsage

It is picky, but I think it will help us to have some clear terminology for the different aspects.

ghalliday avatar May 01 '24 08:05 ghalliday

StSizeInternalTemp

@ghalliday If I have understood correctly, shouldn't StSizeInternalTemp be zero at the end of each activity?

shamser avatar May 01 '24 09:05 shamser

StSizeInternalTemp

@ghalliday If I have understood correctly, shouldn't StSizeInternalTemp be zero at the end of each activity?

Yes. I'm not sure it is useful - but as Jake says, it might be useful as the job is running. I suspect as long as you have the peak the active doesn't provide much value.

ghalliday avatar May 01 '24 11:05 ghalliday

@shamser - please squash.

jakesmith avatar May 09 '24 15:05 jakesmith