hipBLASLt icon indicating copy to clipboard operation
hipBLASLt copied to clipboard

streamk - adding memory fallbacks

Open mahmoodw opened this issue 10 months ago • 8 comments

This PR adds in the fallback cases when not enough or null ptr workspace is provided.

Notes

  • correctly initialises inputs workspacesize
  • updates requiredWorkspaceSize helper for data parallel and persistent kernel cases
  • logic in develop changes since my original implementation - to discuss validity of newer logic.
  • using .githooks/pre-commit has introduced some additional formatting independent of my PR

mahmoodw avatar Feb 19 '25 03:02 mahmoodw

User should be able to know required workspace size from getHerursitic api. If the user given workspace size is not enough for real stream-K function, should we just return no solution? This is how hipblaslt API designs.

jichangjichang avatar Feb 26 '25 03:02 jichangjichang

User should be able to know required workspace size from getHerursitic api. If the user given workspace size is not enough for real stream-K function, should we just return no solution? This is how hipblaslt API designs.

Thanks for the feedback Jimmy. The team discussed the change and have an idea on how handle the change:

Running stream-k kernels without workspace (or with reduced workspace) could be a valid use-case, but it shouldn't be the "default" since it will have reduced performance. For example, I'm thinking of an HPC type application that requires large matrices, leaving no room for workspace. In this case we could run stream-k kernel in data-parallel mode without workspace. But the default user should pass in workspace for better performance.

We propose making a change so that if not enough workspace is provided, we return no solution found like you suggested. This would keep the default behavior to optimize performance. Then we can add an environment variable that will allow users to force it to run stream-k kernels with reduced workspace or no workspace. This way, no log output is needed since the user has to specifically opt-in to running it this way, and should be aware that performance may be affected.

If this sounds good to you, we'll go ahead and update the PR.

AlexBrownAMD avatar Mar 03 '25 18:03 AlexBrownAMD

@AlexBrownAMD @bethune-bryant @jichangjichang As advised I have made change to maintain API expectations. Now we will return error: NO solution found! When there is insufficient workspace for streamk kernels. I have also added the env variable TENSILE_STREAMK_DATA_PARALLEL, when set will override the memory requirements and force data parallel mode for streamk runs (no workspace requirement). This involved moving the workspaceCheck problem predicate to a separate task predicate as done in classic Tensile. It also consolidates the logic for workspace to a single existing ContractionSolution::requiredWorkspaceSize.

All reviews will be greatly appreciated. The functionality has been tested locally but would like to CI results and hear back from others to make sure there are no breaking changes.

mahmoodw avatar Mar 26 '25 23:03 mahmoodw

yes I tested a use case locally, with and without the DATA PARALLEL env variable. When there is insufficient workspace size and the env variable is not set we see the output below

return error: NO solution found!

mahmoodw avatar Mar 28 '25 23:03 mahmoodw

/AzurePipelines run

mahmoodw avatar Apr 08 '25 17:04 mahmoodw

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Apr 08 '25 17:04 azure-pipelines[bot]

@mahmoodw May I know why did you add taskPredicate? Any reason that we can't handle this fallback design by using original problemPredicate?

jichangjichang avatar Apr 09 '25 06:04 jichangjichang

@mahmoodw May I know why did you add taskPredicate? Any reason that we can't handle this fallback design by using original problemPredicate?

The already implemented requiredWorkspaceSize is not accessible from problemPredicate. These changes match classic Tensile style and avoid size logic being stored in multiple places for maintainability and redundancy

mahmoodw avatar Apr 09 '25 15:04 mahmoodw