Ilia Sergachev
> For Zynq7000 I'm a bit astonished: all my tests show that using the same address on the CPU and logic sides works perfectly. It depends on how you assign the CSR address...
Besides the linker script - I'm thinking of ways to include manufacturer HAL/BSP C libraries to enable software compilation for hard CPU cores. So far I've seen 3 levels of...
Also, very likely the same linker script can be used for most ARM cores: with includes, a large part of these can be made common to all cores.
> Do we need to implement the IGMP protocol to receive multicast IP packets?

If your switch / router does IGMP snooping - yes, your device needs to send an...
Thank you! Given that there is an effect on NVPTX and ptxas compilation times, it looks like some fusions do switch to cuDNN, but the impact on these benchmarks is...
OK, I raised it to 3.
Could you please also tell me exactly which cuDNN version you tried?
> Overall, cuDNN GEMMs do not perform better on H100 than Triton on our internal benchmark suite. The suite contains about 100 models, primarily transformers but also other popular architectures....
> Did you compute the Geomean for end-to-end benchmarks or the GEMMs individually?

Individual GEMM fusions.

> Doing this at scale will require better VLOGing to aggregate data across many...
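
For readers unfamiliar with the metric, here is a minimal sketch of a geometric mean over per-fusion speedups. The function name `geomean` and the sample ratios are hypothetical illustrations, not figures or code from this discussion.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios (e.g. per-fusion time_a / time_b)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # Average in log space so that a 2x speedup and a 2x slowdown cancel out.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-fusion speedups, one entry per individual GEMM fusion.
example_speedups = [2.0, 0.5, 1.0]
print(f"geomean speedup: {geomean(example_speedups):.3f}")
```

Unlike an arithmetic mean, this treats a fusion that gets twice as fast and one that gets twice as slow as cancelling each other, which is why it is commonly used to aggregate speedup ratios.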
I pushed 2 additional commits, just for this evaluation. I'll make them separate PRs if we decide to proceed with them. The first reduces the number of cuDNN plans per fusion...