Search results: 9 issues by hychiang

Hi, I am trying to interpret the depth values in the depth.pgm files produced when I unpack a .sens file with SensReader. I read the depth.pgm file with a Python script I found online: `def...
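A minimal sketch of reading such a file, assuming the netpbm P5 convention (big-endian 16-bit samples when maxval > 255) and a depth shift of 1000 (millimeters to meters), which is what SensReader typically writes; check your export settings before trusting either assumption:

```python
import struct

def read_depth_pgm(path, depth_shift=1000.0):
    """Read a binary (P5) 16-bit PGM depth image.

    Returns (width, height, depths), where depths is a flat row-major
    list of floats. depth_shift=1000 assumes depth is stored in
    millimeters (the usual ScanNet convention), so values come back
    in meters.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Parse the four header tokens (magic, width, height, maxval),
    # skipping '#' comment lines.
    fields, i = [], 0
    while len(fields) < 4:
        while data[i] in b" \t\r\n":        # skip whitespace
            i += 1
        if data[i:i + 1] == b"#":           # comment runs to end of line
            i = data.index(b"\n", i) + 1
            continue
        j = i
        while data[j] not in b" \t\r\n":
            j += 1
        fields.append(data[i:j])
        i = j
    i += 1                                  # single whitespace byte after maxval
    width, height, maxval = int(fields[1]), int(fields[2]), int(fields[3])
    if fields[0] != b"P5" or maxval <= 255:
        raise ValueError("expected a binary 16-bit PGM")
    # netpbm stores 16-bit samples most-significant byte first
    raw = struct.unpack(">%dH" % (width * height), data[i:i + 2 * width * height])
    return width, height, [v / depth_shift for v in raw]
```

A raw value of 0 usually means "no depth measured" rather than zero distance, so it is worth masking zeros out before any statistics.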

I am wondering why the resolution in the autoencoder (first-stage config) is 256 rather than 512. Thanks! https://github.com/CompVis/latent-diffusion/blob/main/models/ldm/semantic_synthesis512/config.yaml#L42

Hi, I tried to reproduce your experiment on CIFAR-10, but the training loss becomes NaN. I am running on a four-GPU machine with tensorflow-gpu 1.12. ![image](https://user-images.githubusercontent.com/9960543/133196944-9eca9370-12ef-4140-8ba7-3b24a73f0afc.png) Here...

Hi Ptrblk, I am experimenting with PyTorch's BatchNorm2d and your implementation. I tried your implementation in MobileNetV3 and the performance seems similar. However, I found that the gradient values...
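One classic source of small output or gradient mismatches in hand-rolled batchnorm is the variance convention: in training mode PyTorch normalizes with the biased variance and uses the unbiased variance only for the running statistics. A minimal numpy sketch of the forward pass under that assumption (illustrative only, not the implementation referenced above):

```python
import numpy as np

def batchnorm2d_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm2d forward pass (numpy sketch).

    Normalizes each channel of an (N, C, H, W) array over the
    (N, H, W) axes using the *biased* variance (ddof=0), which is
    what PyTorch uses for the normalization itself; the unbiased
    variance only feeds the running statistics.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)   # biased, ddof=0
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

Comparing this against `nn.BatchNorm2d` with matching `eps` on the same input is a quick way to isolate where a discrepancy creeps in.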

### Description Hi, I am trying to run MobileNetV2 on the Edge TPU with a Dev Board Mini. I followed the instructions and ran the classification example code on my...

type:performance
subtype:Mendel Linux
Hardware:Dev Board Mini
comp:compiler
comp:model

Hello, could I use `FastLinearCombinationClamp` to convert a `half_t` accumulator to `int8_t` output, or does it only support an `int32_t` accumulator to `int8_t` output? Thanks! ```c++ using ElementInputA = cutlass::half_t; // ; ```...

question
? - Needs Triage
inactive-30d
inactive-90d
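For reference, the numerics that a "linear combination + clamp" GEMM epilogue performs can be sketched in plain Python. This only illustrates the alpha/beta-and-saturate math; it says nothing about which accumulator types `FastLinearCombinationClamp` actually instantiates for, which is the question above:

```python
def linear_combination_clamp(accum, alpha=1.0, beta=0.0, source=None):
    """Sketch of the math a linear-combination-clamp epilogue performs:
    out = saturate_int8(alpha * accum + beta * source).

    `accum` holds the GEMM accumulator values, `source` an optional
    bias/residual operand; the result is rounded and saturated to the
    int8 range [-128, 127].
    """
    if source is None:
        source = [0] * len(accum)
    out = []
    for a, s in zip(accum, source):
        v = round(alpha * a + beta * s)
        out.append(max(-128, min(127, v)))   # saturate to int8
    return out
```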

**What is your question?** Reading the documentation and the examples, the configurations I found for int8 x int8 = int8 matrix multiplication are either RowMajor x ColumnMajor = ColumnMajor ([gemm_s8t_s8n_s8n](https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_s8t_s8n_s8n_tensor_op_s32_sm80.cu))...

question
? - Needs Triage
inactive-30d

### System Info - `transformers` version: 4.41.2 - Platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35 - Python version: 3.10.13 -...

Good Second Issue
Feature request
Compilation
Cache

Hello, I read this [issue](https://github.com/NVIDIA/cutlass/issues/702#issuecomment-1331414081): * `kernel::GemmUniversal` with mode `GemmUniversalMode::kGemmSplitKParallel` is equivalent to `kernel::GemmSplitKParallel`. The difference comes to the fore with the `device::`-scoped kernels, where `device::GemmSplitKParallel` calls a reduction kernel...

question
? - Needs Triage
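The split-K-parallel scheme described in that comment can be sketched in plain Python: independent partial GEMMs over slices of the K dimension, followed by a separate reduction that sums the partials. All names here are illustrative, not CUTLASS API:

```python
def matmul(A, B):
    """Plain triple-loop reference GEMM on lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def splitk_gemm(A, B, splits):
    """Split-K parallel GEMM sketch (assumes splits <= K).

    The K dimension is divided into `splits` slices; each slice
    yields a partial product (these could run as independent GEMM
    kernels), and a separate reduction sums the partials -- the
    extra kernel that the device-scoped split-K path launches.
    """
    k = len(B)
    bounds = [k * s // splits for s in range(splits + 1)]
    partials = []
    for lo, hi in zip(bounds, bounds[1:]):
        A_slice = [row[lo:hi] for row in A]   # columns lo..hi of A
        B_slice = B[lo:hi]                    # rows lo..hi of B
        partials.append(matmul(A_slice, B_slice))
    # Reduction step: elementwise sum of the partial products.
    n, m = len(A), len(B[0])
    return [[sum(p[i][j] for p in partials) for j in range(m)]
            for i in range(n)]
```

A fused (serial) split-K kernel would instead accumulate the partials into one workspace with atomics or semaphores; the parallel variant trades that synchronization for the extra reduction launch.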