John Pennycook

Results: 31 comments by John Pennycook

Thanks for looking at this, @AlexeySachkov. `-fno-sycl-use-footer` helps a little bit, but still produces some errors.

```
llvm-cov show ./exe -instr-profile=default.profdata
main.cpp:
    1|      |#include
    2|      |
    3|      |int main(int argc,...
```

> That's interesting. If I understand correctly, we shouldn't invoke the kernel lambda at host at all.

Sorry, that's what I meant -- the empty lambda body is *not* executed....

> BTW, I've just realized that if you try to compile with `--save-temps`, then perhaps you should be able to see no errors with both integration footer and header. But...

> I also tried --save-temps and looked at the report. The report is huge. But the expected 8 lines are at the very end of the report and can be...

I built a small proof-of-concept today that dumps the coverage files for the HACC case-study, just to see how big the cache would be. It's about 11 MB (uncompressed), and...

After #122 is merged, `parse_file` accounts for ~20% of execution time in my offline stress test.

Recent experience (see #144) suggests that it may in fact be preferable to merge the cleaning step into the preprocessor. Our current two-step approach destroys physical line information, because tokenization...

> I have never developed compiled code on Windows - do we have a sense of what is feasible on Windows in terms of the general tooling?

I've never developed...

> The sycl_ext_oneapi_launch_queries extension already has queries named `max_work_group_size` and `max_num_work_groups`, so it would be very natural to add queries named `recommended_work_group_size` and `recommended_num_work_groups`. Tagging @Pennycook here also for his...

> Additionally, the CUTLASS port in our [codeplaysoftware/cutlass-fork](https://github.com/codeplaysoftware/cutlass-fork) does require a query with semantics returning the number of work-groups per compute-unit, so while `recommended_num_work_groups` can have device/cross-work-group semantics, we'd still...