CINN issues

Fix ir copy of var

1

BiynXu

Modify Vectorized Pass

1

- RT

JamesLim-sy

Tricky reduce fuse

1

Work in progress

zhhsplendid

Test TMP

1

6clc

feat(compile): add nvcc compiler

1

Compile CUDA C source code by invoking nvcc through a system call to generate PTX and cubin.
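A minimal sketch of what such a system-call wrapper could look like. `BuildNvccCommand` and `RunNvcc` are hypothetical names chosen for illustration, not functions from the PR. The sketch assumes `nvcc` is on `PATH` and that the paths contain no spaces or shell metacharacters.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the nvcc command line for one output kind.
// `kind` must be "ptx" or "cubin"; `arch` is a GPU architecture such as "sm_80".
std::string BuildNvccCommand(const std::string& source_path,
                             const std::string& output_path,
                             const std::string& kind,
                             const std::string& arch) {
  assert(kind == "ptx" || kind == "cubin");
  // nvcc accepts --ptx and --cubin to stop after the corresponding stage.
  return "nvcc -arch=" + arch + " --" + kind + " " + source_path +
         " -o " + output_path;
}

// Hypothetical helper: runs the command through the shell. A zero return is
// treated as success; the exact return value of std::system is
// implementation-defined.
bool RunNvcc(const std::string& command) {
  return std::system(command.c_str()) == 0;
}
```

Calling `RunNvcc(BuildNvccCommand("kernel.cu", "kernel.ptx", "ptx", "sm_80"))` and then the same with `"cubin"` would produce both artifacts. Keeping command construction separate from execution makes the string easy to log and to check without a GPU toolchain installed.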

BiynXu

Using weak ptr in group

1

Change some data structures in graph.group from shared_ptr to weak_ptr to prevent circular references during the fusion merge pass.
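The general idea can be sketched as follows. The `Group` struct here is a hypothetical stand-in with assumed `consumers` and `producers` fields, not CINN's actual group type; it only illustrates how a non-owning back edge breaks an ownership cycle.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a fusion group; not CINN's actual Group type.
struct Group {
  // Owning edges: a producer keeps its consumers alive.
  std::vector<std::shared_ptr<Group>> consumers;
  // Non-owning back edges: weak_ptr does not add to the reference count,
  // so producer <-> consumer links cannot form an ownership cycle.
  std::vector<std::weak_ptr<Group>> producers;
};

// Links producer -> consumer (owning) and consumer -> producer (non-owning).
void Link(const std::shared_ptr<Group>& producer,
          const std::shared_ptr<Group>& consumer) {
  assert(producer && consumer);
  producer->consumers.push_back(consumer);
  consumer->producers.push_back(producer);
}

// Builds a two-node graph, drops the only external owner of the producer,
// and reports whether the producer was actually destroyed.
bool ProducerIsReleased() {
  auto producer = std::make_shared<Group>();
  auto consumer = std::make_shared<Group>();
  Link(producer, consumer);
  std::weak_ptr<Group> observer = producer;
  producer.reset();  // the last shared owner of the producer goes away
  return observer.expired();
}
```

If `producers` held `shared_ptr` instead, the two groups would keep each other alive after `producer.reset()`, and neither would ever be freed.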

BiynXu

using weak ptr

1

Use weak_ptr to avoid circular references.

SunNy820828449

Nvcc compile

1

Use a system call to invoke nvcc, compiling the CUDA C code to generate PTX and cubin.

SunNy820828449

A100 Speed Benchmark Temporary PR

1

zhhsplendid

[WIP]support matmul_v2_grad

1

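Support matmul_v2_grad. For a `scale->gemm->scale` structure, the backward pass can omit the two scale operations. When seq_len is large, the output matrix of q*k in attention is large, so running scale as a separate operation is also costly.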
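One way to read this, stated as an assumption rather than the PR's actual implementation: if the forward pass is `Y = s_out * ((s_in * A) * B)`, then the gradient with respect to `A` is `s_in * s_out * dY * B^T`, so both scalars can be folded into the GEMM's alpha instead of running as separate elementwise passes. The sketch below uses hypothetical helper names and naive CPU loops purely to show the equivalence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major storage

// Separate elementwise scale: one extra pass over the whole matrix.
Matrix Scale(const Matrix& x, double s) {
  Matrix out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = s * x[i];
  return out;
}

// Computes alpha * dY * B^T, where dY is m x n and B is k x n.
// The result is m x k.
Matrix GemmNT(double alpha, const Matrix& dy, const Matrix& b,
              std::size_t m, std::size_t n, std::size_t k) {
  assert(dy.size() == m * n && b.size() == k * n);
  Matrix out(m * k, 0.0);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < k; ++j) {
      double acc = 0.0;
      for (std::size_t p = 0; p < n; ++p) acc += dy[i * n + p] * b[j * n + p];
      out[i * k + j] = alpha * acc;
    }
  }
  return out;
}

// Unfused backward for A: scale, gemm, scale (three passes).
Matrix GradAUnfused(const Matrix& dy, const Matrix& b, double s_in,
                    double s_out, std::size_t m, std::size_t n,
                    std::size_t k) {
  return Scale(GemmNT(1.0, Scale(dy, s_out), b, m, n, k), s_in);
}

// Fused backward for A: both scalars folded into the GEMM's alpha.
Matrix GradAFused(const Matrix& dy, const Matrix& b, double s_in,
                  double s_out, std::size_t m, std::size_t n, std::size_t k) {
  return GemmNT(s_in * s_out, dy, b, m, n, k);
}
```

The fused version never materializes a scaled copy of `dY` or of the result, which is where the saving would come from when the q*k output is large.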

zkh2016
