Hello_World
Hi, GCC 7 does not provide the `<charconv>` header. Could you add a macro to support earlier compilers? Best regards, Jie
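One possible shape for such a guard is sketched below. It is only an illustration, not the project's actual code: the macro `DEMO_HAS_CHARCONV` and the helper `demo_parse_int` are hypothetical names, and the fallback assumes that `std::strtol` is an acceptable substitute for integer parsing on older compilers.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical feature macro: 1 when <charconv> can be used, 0 otherwise.
// __has_include is part of C++17 and is offered as an extension by GCC 5+.
#if defined(__has_include) && __cplusplus >= 201703L
#  if __has_include(<charconv>)
#    include <charconv>
#    define DEMO_HAS_CHARCONV 1
#  endif
#endif
#ifndef DEMO_HAS_CHARCONV
#  define DEMO_HAS_CHARCONV 0
#endif

// Hypothetical helper: parse a base-10 int from the whole string.
// Returns true on success and stores the value in `out`.
inline bool demo_parse_int(const std::string& text, int& out) {
#if DEMO_HAS_CHARCONV
    const char* first = text.data();
    const char* last = first + text.size();
    auto result = std::from_chars(first, last, out);
    return result.ec == std::errc() && result.ptr == last;
#else
    // Fallback for compilers without <charconv> (e.g. GCC 7).
    // strtol accepts leading whitespace and '+', which from_chars rejects,
    // so reject them here to keep both paths consistent.
    if (text.empty() || text[0] == '+' ||
        std::isspace(static_cast<unsigned char>(text[0]))) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (errno != 0 || end != text.c_str() + text.size()) return false;
    if (value < INT_MIN || value > INT_MAX) return false;
    out = static_cast<int>(value);
    return true;
#endif
}
```

With a guard like this, call sites use `demo_parse_int` unconditionally and only the helper needs to know which header is available.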
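Thank you all for your work. { 7.2.2 A Shared Memory access takes tens of cycles and is further constrained by factors such as instruction dependencies. The access latency of general-purpose registers (GPRs) is tied to the instruction that uses them: for a fast instruction such as FFMA, it takes only ~10 cycles from issue to result, and instructions can overlap. Shared Memory and GPRs are therefore not comparable, and if a comparison must be made, GPRs are still far faster than Shared Memory. The scope of Shared Memory (within a single block) does not seem to be covered. Local Memory does not seem to be mentioned. 7.3 CUTLASS does not seem to be covered, although custom operators in real industrial settings are often written with CUTLASS. At the instruction-set level, SASS does not appear to be mentioned. 7.3.3 A fragment maps to registers at the hardware level, not to some region on the Tensor Core, so the surrounding text, including the pseudocode, is probably somewhat inaccurate. Finally, the most important goal in GEMM optimization should be raising compute intensity (informally, the ratio of computation to memory reads). In my experience, what limits the GPU is mainly the long latency of memory reads and writes (both global and shared), not computation. Most of the effort goes into optimizing the memory-access pipeline, whereas this chapter seems to spend a lot of space on WMMA and Tensor Cores. In addition, concepts such as coalesced access and warp divergence do not seem to be mentioned. } Best regards, Jie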
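To make the compute-intensity point concrete, here is a rough sketch of how the ratio grows with tile size. It uses a deliberately simplified model that is my own assumption, not something taken from the book under review: one thread block computes a `bm` x `bn` tile of C, reads each needed element of A and B from global memory exactly once, and writes to C are ignored. The function name `tile_compute_intensity` is hypothetical.

```cpp
#include <cstddef>

// Simplified model (assumption): FLOPs per byte of global-memory reads for one
// thread block that computes a bm x bn tile of C = A * B over a reduction
// length k. Each element of the A panel (bm x k) and the B panel (k x bn) is
// read once; writes to C and cache effects are ignored.
inline double tile_compute_intensity(std::size_t bm, std::size_t bn,
                                     std::size_t k,
                                     std::size_t bytes_per_element) {
    // One multiply-add counts as 2 floating-point operations.
    const double flops = 2.0 * static_cast<double>(bm) *
                         static_cast<double>(bn) * static_cast<double>(k);
    const double bytes_read = static_cast<double>(bytes_per_element) *
                              static_cast<double>(k) *
                              static_cast<double>(bm + bn);
    return flops / bytes_read;
}
```

Under this model the intensity for FP32 simplifies to `bm * bn / (2 * (bm + bn))` FLOP/byte and does not depend on `k`: a 1 x 1 "tile" (one output element per read pair) gives 0.25, a 32 x 32 tile gives 8, and a 128 x 128 tile gives 32. This is why larger tiles, staged through shared memory and registers, shift a GEMM kernel from memory-bound toward compute-bound.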
**Describe the bug** Wrong traits for 64-bit integer **Steps/Code to reproduce bug** https://github.com/NVIDIA/cutlass/blob/main/tools/util/include/cutlass/util/type_traits.h#L118 **Expected behavior** N/A **Environment details (please complete the following information):** N/A **Additional context** I opened a PR:...
The [config_arch](https://github.com/NVIDIA/gdrcopy/blob/master/config_arch) script determines the architecture of the system. However, it creates a temporary directory under `/tmp` using `mktemp` in [this line](https://github.com/NVIDIA/gdrcopy/blob/9ecd9cfa549cdc9d36785bc472d45a154f2ae7f3/config_arch#L23) and then compiles and runs an...
Can we use `target_include_directories` instead of `include_directories`, so that the include directories do not affect the whole project when symengine is used as a submodule? Can you also make an interface target that will...
Tested environment: Ubuntu22.04 and macOS 13.5.2 Julia version: 1.9.3 Error string: ``` ERROR: LoadError: MethodError: no method matching show_plan(::IOStream, ::Vector{TimespanLogging.Timespan}, ::Thunk) Stacktrace: [1] (::Dagger.var"#140#141"{Thunk, Vector{TimespanLogging.Timespan}})(io::IOStream) @ Dagger ~/.julia/packages/Dagger/xGAvM/src/compute.jl:28 [2] open(::Dagger.var"#140#141"{Thunk,...
Auto-formatting tools are needed.
### 🐛 Describe the bug ``` /home/username/miniconda3/envs/envname/lib/libz.so (found version "1.3.1") -- Caffe2: Found protobuf with new-style protobuf targets. -- Caffe2: Protobuf version 25.3.0 -- Found CUDA: /home/username/miniconda3/envs/envname/targets/x86_64-linux (found version "12.5")...