Search results: 9 issues by hychiang

Hi, I am trying to interpret the depth values in the depth.pgm files produced when I unpack a .sens file with SensReader. I read the depth.pgm file with a Python script I found online: `def...
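A minimal sketch of reading such a file, assuming the netpbm P5 convention (big-endian 16-bit samples when maxval > 255) and a depth shift of 1000 (millimeters to meters), which is what SensReader typically writes; check your export settings before trusting either assumption:

```python
import struct

def read_depth_pgm(path, depth_shift=1000.0):
    """Read a binary (P5) 16-bit PGM depth image.

    Returns (width, height, depths), where depths is a flat row-major
    list of floats. depth_shift=1000 assumes depth is stored in
    millimeters (the usual ScanNet convention), so values come back
    in meters.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Parse the four header tokens (magic, width, height, maxval),
    # skipping '#' comment lines.
    fields, i = [], 0
    while len(fields) < 4:
        while data[i] in b" \t\r\n":        # skip whitespace
            i += 1
        if data[i:i + 1] == b"#":           # comment runs to end of line
            i = data.index(b"\n", i) + 1
            continue
        j = i
        while data[j] not in b" \t\r\n":
            j += 1
        fields.append(data[i:j])
        i = j
    i += 1                                  # single whitespace byte after maxval
    width, height, maxval = int(fields[1]), int(fields[2]), int(fields[3])
    if fields[0] != b"P5" or maxval <= 255:
        raise ValueError("expected a binary 16-bit PGM")
    # netpbm stores 16-bit samples most-significant byte first
    raw = struct.unpack(">%dH" % (width * height), data[i:i + 2 * width * height])
    return width, height, [v / depth_shift for v in raw]
```

A raw value of 0 usually means "no depth measured" rather than zero distance, so it is worth masking zeros out before any statistics.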

I am wondering why the resolution in the autoencoder (first-stage config) is 256 rather than 512. Thanks! https://github.com/CompVis/latent-diffusion/blob/main/models/ldm/semantic_synthesis512/config.yaml#L42

Hi, I tried to reproduce your experiment on CIFAR-10, but the training loss becomes NaN. I am running on a four-GPU machine with tensorflow-gpu 1.12. ![image](https://user-images.githubusercontent.com/9960543/133196944-9eca9370-12ef-4140-8ba7-3b24a73f0afc.png) Here...

Hi Ptrblk, I am experimenting with PyTorch's BatchNorm2d and your implementation. I tried your implementation in MobileNetV3 and the performance seems similar. However, I found that the gradient values...
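One classic source of small output or gradient mismatches in hand-rolled batchnorm is the variance convention: in training mode PyTorch normalizes with the biased variance and uses the unbiased variance only for the running statistics. A minimal numpy sketch of the forward pass under that assumption (illustrative only, not the implementation referenced above):

```python
import numpy as np

def batchnorm2d_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm2d forward pass (numpy sketch).

    Normalizes each channel of an (N, C, H, W) array over the
    (N, H, W) axes using the *biased* variance (ddof=0), which is
    what PyTorch uses for the normalization itself; the unbiased
    variance only feeds the running statistics.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)   # biased, ddof=0
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

Comparing this against `nn.BatchNorm2d` with matching `eps` on the same input is a quick way to isolate where a discrepancy creeps in.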

### Description Hi, I am trying to run MobileNetV2 on the Edge TPU with a Dev Board Mini. I followed the instructions and ran the classification example code on my...

type:performance
subtype:Mendel Linux
Hardware:Dev Board Mini
comp:compiler
comp:model

Hello, could I use `FastLinearCombinationClamp` to convert a `half_t` accumulator to `int8_t` output, or does it only support an `int32_t` accumulator to `int8_t` output? Thanks! ```c++ using ElementInputA = cutlass::half_t; // ; ```...

question
? - Needs Triage
inactive-30d
inactive-90d
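For reference, the numerics that a "linear combination + clamp" GEMM epilogue performs can be sketched in plain Python. This only illustrates the alpha/beta-and-saturate math; it says nothing about which accumulator types `FastLinearCombinationClamp` actually instantiates for, which is the question above:

```python
def linear_combination_clamp(accum, alpha=1.0, beta=0.0, source=None):
    """Sketch of the math a linear-combination-clamp epilogue performs:
    out = saturate_int8(alpha * accum + beta * source).

    `accum` holds the GEMM accumulator values, `source` an optional
    bias/residual operand; the result is rounded and saturated to the
    int8 range [-128, 127].
    """
    if source is None:
        source = [0] * len(accum)
    out = []
    for a, s in zip(accum, source):
        v = round(alpha * a + beta * s)
        out.append(max(-128, min(127, v)))   # saturate to int8
    return out
```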

**What is your question?** Reading the documentation and the examples, the configurations I found for int8 x int8 = int8 matrix multiplication are either RowMajor x ColumnMajor = ColumnMajor ([gemm_s8t_s8n_s8n](https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_s8t_s8n_s8n_tensor_op_s32_sm80.cu))...

question
? - Needs Triage
inactive-30d

### System Info - `transformers` version: 4.41.2 - Platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35 - Python version: 3.10.13 -...

Good Second Issue
Feature request
Compilation
Cache

Hello, I read this [issue](https://github.com/NVIDIA/cutlass/issues/702#issuecomment-1331414081): * `kernel::GemmUniversal` with mode `GemmUniversalMode::kGemmSplitKParallel` is equivalent to `kernel::GemmSplitKParallel`. The difference comes to the fore with the `device::`-scoped kernels, where `device::GemmSplitKParallel` calls a reduction kernel...

question
? - Needs Triage
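The split-K-parallel scheme described in that comment can be sketched in plain Python: independent partial GEMMs over slices of the K dimension, followed by a separate reduction that sums the partials. All names here are illustrative, not CUTLASS API:

```python
def matmul(A, B):
    """Plain triple-loop reference GEMM on lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def splitk_gemm(A, B, splits):
    """Split-K parallel GEMM sketch (assumes splits <= K).

    The K dimension is divided into `splits` slices; each slice
    yields a partial product (these could run as independent GEMM
    kernels), and a separate reduction sums the partials -- the
    extra kernel that the device-scoped split-K path launches.
    """
    k = len(B)
    bounds = [k * s // splits for s in range(splits + 1)]
    partials = []
    for lo, hi in zip(bounds, bounds[1:]):
        A_slice = [row[lo:hi] for row in A]   # columns lo..hi of A
        B_slice = B[lo:hi]                    # rows lo..hi of B
        partials.append(matmul(A_slice, B_slice))
    # Reduction step: elementwise sum of the partial products.
    n, m = len(A), len(B[0])
    return [[sum(p[i][j] for p in partials) for j in range(m)]
            for i in range(n)]
```

A fused (serial) split-K kernel would instead accumulate the partials into one workspace with atomics or semaphores; the parallel variant trades that synchronization for the extra reduction launch.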