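Know engineering and deployment better than the algorithm people, and know algorithms and models better than the engineering people.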
Paper Reading -- Deep Learning Infra
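- Programming: C++ / CUDA / assembly / Python / Shell
- Algorithms: deep learning / CV / NLP etc., training frameworks, inference deployment
- Acceleration: AI compilers, parallel optimization, profiling tools
- Engineering: hardware architecture, OS & Linux kernel, distributed systems & k8s clusters, storage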
Awesome Online Tools
URL | Brief Notes |
---|---|
https://en.wikichip.org/wiki/WikiChip | Look up the architecture & specs of various chips |
https://www.cpubenchmark.net | Look up CPU benchmarks and compute throughput (Ops/s) |
https://www.videocardbenchmark.net | Look up GPU benchmarks |
https://godbolt.org/ | View the assembly generated from C++ code online |
https://quick-bench.com/ | Benchmark C++ code online |
https://en.cppreference.com | C++ reference |
My Blog
- my-plans-and-reviews
AI compiler
- learn-tvm-from-scratch
- compiler-learning-map
Deep Learning
frameworks
- deep-learning-framework-list
- Understanding ONNX
- pytorch-learning-map
CUDA and GPU
- learning-cuda
HPC - High Performance Computing
Learning Maps
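- perf-tools-map: performance tuning tools & their usage docs
- CPU architecture: todo
- GPU architecture & CUDA: todo
- Parallel acceleration: todo (instruction-level parallelism, as a separate topic?)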
Good Readings
- linux-performance-analysis-and-tools
- general-matrix-multiplicatio-perf-estimate
Tutorials with code
- Hands on CUDA: a CUDA introduction for beginners
- OpenMP tutorial: one of the eight tutorials in the 4+ day "Using LLNL's Supercomputers" workshop
Engineering
Docker & K8S
- Docker and OCI Runtimes: Docker's design and implementation
- nvidia-docker: Enabling GPUs in Docker: usage and internals of nvidia-docker
Protobuf & gRPC
(Docs)
- https://developers.google.com/protocol-buffers/docs/proto3 Language Guide (proto3)
- https://developers.google.com/protocol-buffers/docs/style Protocol Buffers Style Guide
- https://grpc.io/docs/languages/cpp/basics/ gRPC Basics tutorial
- https://edgehog.blog/a-guide-to-grpc-and-interceptors-265c306d3773 gRPC interceptors
(Notes)
- Protobuf Install And Introduction
- Protobuf Best Practices
- TODO: example code that ships a complete web service with gRPC + Docker
Programming Languages
Assembly
- x86 assembly
- MIPS assembly
C++
Python & Shell