

Deep-Learning-Hardware-Accelerator

A collection of works on hardware accelerators for deep learning.

Conference Papers

2015

  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)

2016

  • DnnWeaver: From High-Level Deep Network Models to FPGA Acceleration (MICRO 2016)
  • Fused-layer CNN accelerators (MICRO 2016)
  • Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks (ICCAD 2016)
  • Going deeper with embedded fpga platform for convolutional neural network (FPGA 2016)
  • Automatic code generation of convolutional neural networks in FPGA implementation (FPT 2016)
  • Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware (ISVLSI 2016)
  • A high performance FPGA-based accelerator for large-scale convolutional neural networks (FPL 2016)
  • Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (ISCA 2016)
  • C-brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
  • Stripes: Bit-serial deep neural network computing (MICRO 2016)
  • Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks (ASP-DAC 2016)

2017

  • Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks (FPGA 2017)
  • Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs (DAC 2017)
  • A pipelined and scalable dataflow implementation of convolutional neural networks on FPGA (IPDPSW 2017)
  • A multistage dataflow implementation of a Deep Convolutional Neural Network based on FPGA for high-speed object recognition (SSIAI 2017)
  • Maximizing CNN accelerator efficiency through resource partitioning (ISCA 2017)
  • Design space exploration of FPGA accelerators for convolutional neural networks (DATE 2017)
  • Work-in-progress: a power-efficient and high performance FPGA accelerator for convolutional neural networks (CODES+ISSS 2017)
  • A Power-Efficient Accelerator for Convolutional Neural Networks (CLUSTER 2017)
  • In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
  • FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks (HPCA 2017)
  • COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array (ICPADS 2017)

2019

  • An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator (ICIP 2019)

Journal Papers

2016

  • Power-Efficient Accelerator Design for Neural Networks Using Computation Reuse (IEEE Computer Architecture Letters 2016 Jan.-June)

2017

  • Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks (JSSC 2017 Jan.)
  • Embedded Streaming Deep Neural Networks Accelerator With Applications (TNNLS 2017 July)
  • Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns (TVLSI 2017 Aug.)
  • Origami: A 803-GOp/s/W Convolutional Network Accelerator (TCSVT 2017 Nov.)

2018

  • Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA (TCAD 2018 Jan.)
  • A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things (TCSI 2018 Jan.)
  • An Architecture to Accelerate Convolution in Deep Neural Networks (TCSI 2018 April)
  • Data and Hardware Efficient Design for Convolutional Neural Network (TCSI 2018 May)
  • Efficient Hardware Architectures for Deep Convolutional Neural Network (TCSI 2018 June)
  • Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA (TVLSI 2018 Early Access)

Accelerators with quantization techniques discussed in the paper (a minimal fixed-point sketch follows the list)

  • Going Deeper with Embedded FPGA Platform for Convolutional Neural Network (FPGA 2016)
  • Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA (ISVLSI 2016)(TCAD 2018 Jan.)
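
The two designs above run inference in fixed-point rather than full-precision floating point. As a rough illustration of that idea, and not the exact schemes used in these papers, the sketch below uses NumPy and a hypothetical `quantize_uniform` helper to quantize a weight tensor to signed 8-bit integers and check the round-trip error.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization to signed `num_bits` integers.

    Hypothetical helper for illustration; not the exact scheme used by
    Going Deeper (FPGA 2016) or Angel-Eye.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(x)), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Quantize a small weight tensor and check the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```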

Papers about bit reduction (an illustrative bit-width search sketch follows the list)

  • An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks (ICASSP 2018)
  • True-Gradient Based Training of Deep Binary Activated Neural Networks via Continuous Binarization (ICASSP 2018)
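
These papers study how far weight and activation precision can be reduced before accuracy degrades. The first derives the minimum per-layer precision analytically; the sketch below is only a naive stand-in that searches for the smallest bit width whose quantization error stays under a hand-picked tolerance (the `quant_error` helper and the tolerance value are assumptions for illustration, not the papers' method).

```python
import numpy as np

def quant_error(x, num_bits):
    """Round-trip error of symmetric uniform quantization at `num_bits` bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return float(np.max(np.abs(x - np.round(x / scale) * scale)))

# Hypothetical per-layer search: smallest bit width whose error stays
# under a hand-picked tolerance (the ICASSP paper derives this analytically).
layer_weights = np.random.randn(256).astype(np.float32)
tolerance = 0.05
min_bits = next(b for b in range(2, 17) if quant_error(layer_weights, b) <= tolerance)
print("minimum bits for this layer:", min_bits)
```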

Serial-Approach Architectures (a minimal bit-serial sketch follows the list)

  • Bit-Pragmatic Deep Neural Network Computing (2016)
  • Stripes: Bit-serial deep neural network computing (MICRO 2016)
  • Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
  • Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks (2017)
  • Value-Based Deep-Learning Acceleration (IEEE Micro 2018 Jan./Feb.)
  • Exploiting Typical Values to Accelerate Deep Learning (Computer 2018 May)
  • Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (DAC 2018)
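
The Stripes/Loom family processes activations (and, in Loom, weights as well) one bit at a time, so execution time scales with the precision actually needed. The sketch below shows the arithmetic identity these designs exploit by computing a dot product from activation bit-planes; it is a software illustration only, and the `bit_serial_dot` helper is an assumption, not code from any of the papers.

```python
import numpy as np

def bit_serial_dot(weights, activations, act_bits=8):
    """Dot product computed from activation bit-planes (Stripes-style idea).

    Software illustration only: each step multiplies the weights by one bit of
    every activation and accumulates the shifted partial sum.
    """
    acc = 0
    for b in range(act_bits):
        bit_plane = (activations >> b) & 1           # bit b of each activation
        acc += int(np.dot(weights, bit_plane)) << b  # weight * bit, shifted into place
    return acc

w = np.array([3, -1, 2, 5])
a = np.array([7, 0, 255, 12], dtype=np.uint8)        # unsigned 8-bit activations
assert bit_serial_dot(w, a) == int(np.dot(w, a.astype(np.int64)))
print(bit_serial_dot(w, a))
```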

Zero-Skipping Architectures (a minimal zero-skipping sketch follows the list)

  • Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing (ISCA 2016)
  • Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing (2017)
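
Cnvlutin and Cnvlutin2 skip computations whose activation (and, in Cnvlutin2, weight) operand is ineffectual, most commonly zero after ReLU. The sketch below shows the effect on a single dot product by visiting only non-zero activations; the `zero_skipping_dot` helper is an illustrative assumption, not the papers' hardware mechanism.

```python
import numpy as np

def zero_skipping_dot(weights, activations):
    """Dot product that visits only non-zero activations (Cnvlutin-style idea)."""
    nz = np.flatnonzero(activations)                 # indices of non-zero activations
    return float(np.dot(weights[nz], activations[nz])), len(nz)

# ReLU outputs are often sparse, so many multiplications can be skipped.
acts = np.maximum(np.random.randn(1024), 0.0)        # roughly half zeros after ReLU
w = np.random.randn(1024)
result, visited = zero_skipping_dot(w, acts)
assert np.isclose(result, np.dot(w, acts))
print(f"visited {visited}/{acts.size} activations; dot = {result:.3f}")
```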