shunting314
> I'd suggest doing all codegen in the parent process, then just sending a filename+sizes/strides/offsets/dtypes to the subprocess. For extern kernels, you can replace the filename with the call_name. I...
I'm thinking it would be easier to let this PR handle TritonTemplateCaller only for now since 1. this makes it simpler and we can always do the same thing for...
One thing I realized but have not done in this PR: right now single-process autotuning uses lambdas to represent the tuning tasks, while multi-process autotuning leverages the...
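To illustrate the difference between the two task representations discussed above, here is a minimal sketch. It assumes the subprocess receives its work via `pickle`; the names `TensorMeta`, `SketchBenchmarkRequest`, and `can_pickle` are hypothetical and are not the classes used in this PR. A plain record holding a kernel filename plus sizes/strides/offsets/dtypes can be pickled, while a lambda cannot.

```python
import pickle
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical description of a tensor; holds no tensor data."""
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int
    dtype: str  # e.g. "float16"; kept as a string so no torch import is needed


@dataclass(frozen=True)
class SketchBenchmarkRequest:
    """Hypothetical stand-in for a benchmark request: plain data only."""
    kernel_source: str  # a generated-code filename, or a call_name for extern kernels
    input_metas: Tuple[TensorMeta, ...]
    output_meta: TensorMeta


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, which a subprocess hand-off needs."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# A request made only of plain fields; the path below is made up for illustration.
request = SketchBenchmarkRequest(
    kernel_source="/tmp/example_kernel.py",
    input_metas=(
        TensorMeta(sizes=(64, 32), strides=(32, 1), offset=0, dtype="float16"),
        TensorMeta(sizes=(32, 16), strides=(16, 1), offset=0, dtype="float16"),
    ),
    output_meta=TensorMeta(sizes=(64, 16), strides=(16, 1), offset=0, dtype="float16"),
)

# A lambda-based task, as a single-process tuner might hold in memory.
lambda_task = lambda: None

print("request picklable:", can_pickle(request))
print("lambda picklable:", can_pickle(lambda_task))
```

In this sketch the child process would rebuild its inputs from the metadata and load the kernel from `kernel_source`, so only small plain values cross the process boundary.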
@shunting314 has imported this pull request. If you are a Meta employee, you can view this diff [on Phabricator](https://www.internalfb.com/diff/D43996048).
@jansel I've updated the PR to use BenchmarkRequest for the single-process case as well. Please take another look, thanks!
@pytorchbot merge
For lint issues, you can also test that locally:
```
pip install lintrunner  # if lintrunner is not installed yet
lintrunner -a
```