Larry Meadows comments

add nd_range to SYCL

I'm late to the party but I don't see any significant difference between SYCL and CUDA scores on A100 once you accept the PR I just submitted :) . SYCL...

Additional SYCL USM (device pointer explicit copy) and CUDA tuning for DOT

I just eyeballed the PTX. I should look more carefully to see where the extra instructions are coming from. There were definitely a lot of parameters, maybe there's some dead...

Additional SYCL USM (device pointer explicit copy) and CUDA tuning for DOT

I ran on AMD MI100 both "spock" at ORNL and one of the nodes at ANL JLSE. Aside, the JLSE nodes are a tiny bit faster on this and other...

Additional SYCL USM (device pointer explicit copy) and CUDA tuning for DOT

I ran SYCL2020 on A100, just the vanilla version with no USM changes. Dot is only 1198 GB/sec; I was getting 1290 GB/sec with SYCL version (not USM), and 1340...

Additional SYCL USM (device pointer explicit copy) and CUDA tuning for DOT

SYCL2020 with a redone dot kernel (but not USM) doesn't do quite as well as the original SYCL version on dot on A100: 1235 GB/sec vs. 1292 GB/sec, and 1339...

Additional SYCL USM (device pointer explicit copy) and CUDA tuning for DOT

Yes, I need to revisit SYCL-2020 vs. the previous version without the USM and be a little more rigorous. On the reduction, apparently it uses this: https://github.com/intel/llvm/blob/8213321ebb90110bf4f3d04fa0dc8e131a464a19/libclc/ptx-nvidiacl/libspirv/group/collectives.cl#L263 I note that...

Bytes transferred with CopyHostToDevice and CopyDeviceToHost

Yes, I want per-call information. Similarly I'd like per-call launch information for all kernel launches. They exist for hipModuleLaunchKernel but are blank for hipLaunchKernelGGL. I'm happy to mine this out...

Bytes transferred with CopyHostToDevice and CopyDeviceToHost

Yes, OK, fair enough, give me a few days.

Bytes transferred with CopyHostToDevice and CopyDeviceToHost

Well, it took more than a few days. Sorry. I do see that data in the sqlite db for copies:
```
,BeginNs,EndNs,pid,tid,Name,args,Index,Data,__section,__lane,DurationNs
9164,7148818203319432,7148818203870232,19156,19156,hipMemcpyAsync,( dst(0xf89e20) src(0x7f3480e08000) sizeBytes(3840) kind(4) stream(2)),9165,,2,19156,550800
```
(Sorry...
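For anyone wanting to mine per-call byte counts out of rows like the one in this comment, here is a minimal Python sketch. It assumes the row has been exported as comma-separated text with the header shown in the comment, and that the `args` field never contains a comma. The helper names (`parse_args`, `parse_row`, `copy_summary`) are hypothetical and not part of any profiler tooling.

```python
import re

# Header and row copied from the comment; the column layout is taken
# from that header line, not from any documented schema.
HEADER = ",BeginNs,EndNs,pid,tid,Name,args,Index,Data,__section,__lane,DurationNs"
ROW = (
    "9164,7148818203319432,7148818203870232,19156,19156,hipMemcpyAsync,"
    "( dst(0xf89e20) src(0x7f3480e08000) sizeBytes(3840) kind(4) stream(2)),"
    "9165,,2,19156,550800"
)

# Matches each "name(value)" pair inside the args string.
ARG_RE = re.compile(r"(\w+)\(([^()]*)\)")


def parse_args(args):
    """Turn '( dst(0x..) sizeBytes(3840) ...)' into a dict of strings."""
    return dict(ARG_RE.findall(args))


def parse_row(header, row):
    """Pair column names with values (assumes no commas inside any field)."""
    return dict(zip(header.split(","), row.split(",")))


def copy_summary(header, row):
    """Return (bytes copied, duration in ns) for one memcpy row."""
    record = parse_row(header, row)
    call_args = parse_args(record["args"])
    return int(call_args["sizeBytes"]), int(record["DurationNs"])


if __name__ == "__main__":
    size, duration_ns = copy_summary(HEADER, ROW)
    # Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per second).
    print(f"{size} bytes in {duration_ns} ns -> {size / duration_ns:.5f} GB/s")
```

The regular expression only picks up `name(value)` pairs, so the outer parentheses around the argument list are ignored. Summing `sizeBytes` over all copy rows would give the total bytes transferred.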

Bytes transferred with CopyHostToDevice and CopyDeviceToHost

I really don't know, this was so long ago. And now I work for AMD :) I will close the issue.