John Pennycook
Thanks for looking at this, @AlexeySachkov. `-fno-sycl-use-footer` helps a little bit, but still produces some errors.

```
llvm-cov show ./exe -instr-profile=default.profdata
main.cpp:
    1|      |#include
    2|      |
    3|      |int main(int argc,...
```
> That's interesting. If I understand correctly, we shouldn't invoke the kernel lambda at host at all. Sorry, that's what I meant -- the empty lambda body is *not* executed....
> BTW, I've just realized that if you try to compile with `--save-temps`, then perhaps you should be able to see no errors with both integration footer and header. But...
> I also tried --save-temps and looked at the report. The report is huge. But the expected 8 lines are at the very end of the report and can be...
I built a small proof-of-concept today that dumps the coverage files for the HACC case-study, just to see how big the cache would be. It's about 11 MB (uncompressed), and...
After #122 is merged, `parse_file` accounts for ~20% of execution time in my offline stress test.
Recent experience (see #144) suggests that it may in fact be preferable to merge the cleaning step into the preprocessor. Our current two-step approach destroys physical line information, because tokenization...
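To make the point about physical line information concrete, here is a minimal C++ sketch. It assumes the cleaning step's job includes splicing backslash-continued lines; the real cleaner may do more (e.g. strip comments). `SplicedSource` and `splice_lines` are hypothetical names, not part of the existing code. The sketch records the physical line on which each logical line starts, which is exactly what is lost if splicing happens in a separate pass before tokenization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: logical lines after splicing, plus the physical
// (1-based) line number on which each logical line began.
struct SplicedSource {
    std::vector<std::string> lines;
    std::vector<std::size_t> physical_start;
};

// Hypothetical helper: join lines ending in a backslash (C/C++ line splicing)
// while remembering where each logical line started in the original file.
SplicedSource splice_lines(const std::vector<std::string>& physical) {
    SplicedSource out;
    std::string current;
    std::size_t start = 0;  // 0 means "no logical line in progress"
    for (std::size_t i = 0; i < physical.size(); ++i) {
        const std::string& line = physical[i];
        if (start == 0) start = i + 1;  // first physical line of this logical line
        if (!line.empty() && line.back() == '\\') {
            current += line.substr(0, line.size() - 1);  // drop the backslash
            continue;  // logical line continues on the next physical line
        }
        current += line;
        out.lines.push_back(current);
        out.physical_start.push_back(start);
        current.clear();
        start = 0;
    }
    if (start != 0) {  // file ended with a dangling continuation
        out.lines.push_back(current);
        out.physical_start.push_back(start);
    }
    // Every logical line must have exactly one recorded physical start.
    assert(out.lines.size() == out.physical_start.size());
    return out;
}
```

Doing the splice inside the preprocessor lets each token carry this physical line number directly instead of reconstructing it afterwards. The sketch deliberately ignores comments, whitespace after the backslash and `\r\n` line endings.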
> I have never developed compiled code on Windows - do we have a sense of what is feasible on Windows in terms of the general tooling? I've never developed...
> The sycl_ext_oneapi_launch_queries extension already has queries named `max_work_group_size` and `max_num_work_groups`, so it would be very natural to add queries named `recommended_work_group_size` and `recommended_num_work_groups`. Tagging @Pennycook here also for his...
> Additionally, the CUTLASS port in our [codeplaysoftware/cutlass-fork](https://github.com/codeplaysoftware/cutlass-fork) does require a query with semantics returning the number of work-groups per compute-unit, so while `recommended_num_work_groups` can have device/cross-work-group semantics, we'd still...
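As a rough illustration of how the two granularities relate, here is a minimal C++ sketch, assuming an occupancy-style meaning for both queries: a device-wide work-group count could be derived from a per-compute-unit count by multiplying by the number of compute units (e.g. the value reported by `sycl::info::device::max_compute_units`). The function `device_work_groups` is hypothetical and is not part of `sycl_ext_oneapi_launch_queries`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of any SYCL extension): derive a device-wide
// work-group count from a per-compute-unit count, assuming every compute unit
// can host the same number of work-groups concurrently.
std::size_t device_work_groups(std::size_t groups_per_compute_unit,
                               std::size_t num_compute_units) {
    assert(num_compute_units > 0 && "a device has at least one compute unit");
    return groups_per_compute_unit * num_compute_units;
}
```

Under that assumption the per-compute-unit figure is the more primitive of the two, which is the value code like the CUTLASS port consumes directly.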