Deep-Learning-Hardware-Accelerator
Paper Collection of Deep Learning Hardware Accelerator
A collection of works on hardware accelerators for deep learning.
Conference Papers
2015
- Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)
2016
- DnnWeaver: From High-Level Deep Network Models to FPGA Acceleration (MICRO 2016)
- Fused-layer CNN accelerators (MICRO 2016)
- Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks (ICCAD 2016)
- Going deeper with embedded fpga platform for convolutional neural network (FPGA 2016)
- Automatic code generation of convolutional neural networks in FPGA implementation (FPT 2016)
- Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware (ISVLSI 2016)
- A high performance FPGA-based accelerator for large-scale convolutional neural networks (FPL 2016)
- Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (ISCA 2016)
- C-brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
- Stripes: Bit-serial deep neural network computing (MICRO 2016)
- Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks (ASP-DAC 2016)
2017
- Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks (FPGA 2017)
- Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs (DAC 2017)
- A pipelined and scalable dataflow implementation of convolutional neural networks on FPGA (IPDPSW 2017)
- A multistage dataflow implementation of a Deep Convolutional Neural Network based on FPGA for high-speed object recognition (SSIAI 2017)
- Maximizing CNN accelerator efficiency through resource partitioning (ISCA 2017)
- Design space exploration of FPGA accelerators for convolutional neural networks (DATE 2017)
- Work-in-progress: a power-efficient and high performance FPGA accelerator for convolutional neural networks (CODES+ISSS 2017)
- A Power-Efficient Accelerator for Convolutional Neural Networks (CLUSTER 2017)
- In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
- FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks (HPCA 2017)
- COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array (ICPADS 2017)
2019
- An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator (ICIP 2019)
Journal Papers
2016
- Power-Efficient Accelerator Design for Neural Networks Using Computation Reuse (IEEE Computer Architecture Letters 2016 Jan.-June)
2017
- Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks (JSSC 2017 Jan.)
- Embedded Streaming Deep Neural Networks Accelerator With Applications (TNNLS 2017 July)
- Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns (TVLSI 2017 Aug.)
- Origami: A 803-GOp/s/W Convolutional Network Accelerator (TCSVT 2017 Nov.)
2018
- Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA (TCAD 2018 Jan.)
- A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things (TCSI 2018 Jan.)
- An Architecture to Accelerate Convolution in Deep Neural Networks (TCSI 2018 April)
- Data and Hardware Efficient Design for Convolutional Neural Network (TCSI 2018 May)
- Efficient Hardware Architectures for Deep Convolutional Neural Network (TCSI 2018 June)
- Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA (TVLSI 2018 Early Access)
Accelerators with quantization techniques discussed in the papers
- Going Deeper with Embedded FPGA Platform for Convolutional Neural Network (FPGA 2016)
- Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA (ISVLSI 2016) (TCAD 2018 Jan.)
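The fixed-point quantization these design flows discuss can be illustrated with a minimal sketch. This is my own simplification for intuition only; the `quantize` function, its parameters, and the example values are hypothetical and not taken from either paper:

```python
# Hypothetical sketch of uniform signed fixed-point quantization (not code from
# the papers above): values are rounded to b-bit signed integers sharing a
# common fractional length, then viewed back in real-number form.

def quantize(values, bits, frac_bits):
    """Round values to `bits`-bit signed fixed point with `frac_bits` fractional bits."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2 ** frac_bits
    q = [max(lo, min(hi, round(v * scale))) for v in values]  # round, then saturate
    return [x / scale for x in q]                             # dequantized view

weights = [0.72, -0.31, 0.05, -1.9]   # made-up example weights
print(quantize(weights, bits=8, frac_bits=5))
```

Choosing `frac_bits` per layer (rather than globally) is the kind of trade-off these papers explore, since weight and activation ranges differ from layer to layer.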
Papers about bit reduction
- An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks (ICASSP 2018)
- True-Gradient Based Training of Deep Binary Activated Neural Networks via Continuous Binarization (ICASSP 2018)
Serial-Approach Architectures
- Bit-Pragmatic Deep Neural Network Computing (2016)
- Stripes: Bit-serial deep neural network computing (MICRO 2016)
- Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
- Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks (2017)
- Value-Based Deep-Learning Acceleration (IEEE Micro 2018 Jan./Feb.)
- Exploiting Typical Values to Accelerate Deep Learning (Computer 2018 May)
- Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (DAC 2018)
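Most of the works above build on the bit-serial idea introduced by Stripes: activations are consumed one bit per cycle, so execution time scales with the precision actually needed rather than a fixed word width. A rough toy sketch of that idea (my own illustration; `bit_serial_mac` is hypothetical, not code from any of the papers):

```python
# Toy model of bit-serial multiply-accumulate: each "cycle" consumes one bit
# of every activation, so total cycles equal the activation precision.

def bit_serial_mac(activations, weights, precision):
    """Dot product computed one activation bit per cycle; returns (result, cycles)."""
    acc = 0
    cycles = 0
    for bit in range(precision):            # one pass per activation bit position
        for a, w in zip(activations, weights):
            if (a >> bit) & 1:              # serially consume bit `bit` of a
                acc += w << bit             # shifted partial product
        cycles += 1
    return acc, cycles

# The same dot product at 4-bit activation precision takes half the cycles
# of the 8-bit version, which is the speedup mechanism Stripes exploits.
acts, wts = [3, 5, 1], [2, 4, 6]
full, cycles_8 = bit_serial_mac(acts, wts, 8)
low, cycles_4 = bit_serial_mac(acts, wts, 4)
```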
Zero-Skipping Architectures
- Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing (ISCA 2016)
- Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing (2017)
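The Cnvlutin line of work skips "ineffectual" (zero-valued) operands, which are common after ReLU. A hedged toy sketch of the activation-side idea (my own illustration, not the papers' architecture):

```python
# Toy model of zero-skipping: multiply-accumulates are issued only for
# nonzero activations, so post-ReLU sparsity translates into fewer MACs.

def zero_skipping_dot(activations, weights):
    """Dot product that skips zero activations; returns (result, MACs issued)."""
    acc = 0
    macs = 0
    for a, w in zip(activations, weights):
        if a != 0:          # ineffectual activation never reaches the multiplier
            acc += a * w
            macs += 1
    return acc, macs

acts = [0, 3, 0, 0, 2, 0]   # made-up post-ReLU activations (mostly zero)
wts = [5, 1, 7, 2, 4, 9]
result, macs = zero_skipping_dot(acts, wts)   # only 2 of 6 MACs are issued
```

Cnvlutin2 extends this by also skipping zero (and near-zero) weights, which the toy model above does not capture.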