

Deep-Learning-Hardware-Accelerator

A collection of works on hardware accelerators for deep learning.

Conference Papers

2015

  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks (FPGA 2015)

2016

  • DnnWeaver: From High-Level Deep Network Models to FPGA Acceleration (MICRO 2016)
  • Fused-layer CNN accelerators (MICRO 2016)
  • Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks (ICCAD 2016)
  • Going deeper with embedded fpga platform for convolutional neural network (FPGA 2016)
  • Automatic code generation of convolutional neural networks in FPGA implementation (FPT 2016)
  • Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware (ISVLSI 2016)
  • A high performance FPGA-based accelerator for large-scale convolutional neural networks (FPL 2016)
  • Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks (ISCA 2016)
  • C-brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization (DAC 2016)
  • Stripes: Bit-serial deep neural network computing (MICRO 2016)
  • Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks (ASP-DAC 2016)

2017

  • Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks (FPGA 2017)
  • Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs (DAC 2017)
  • A pipelined and scalable dataflow implementation of convolutional neural networks on FPGA (IPDPSW 2017)
  • A multistage dataflow implementation of a Deep Convolutional Neural Network based on FPGA for high-speed object recognition (SSIAI 2017)
  • Maximizing CNN accelerator efficiency through resource partitioning (ISCA 2017)
  • Design space exploration of FPGA accelerators for convolutional neural networks (DATE 2017)
  • Work-in-progress: a power-efficient and high performance FPGA accelerator for convolutional neural networks (CODES+ISSS 2017)
  • A Power-Efficient Accelerator for Convolutional Neural Networks (CLUSTER 2017)
  • In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA 2017)
  • FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks (HPCA 2017)
  • COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array (ICPADS 2017)

2019

  • An Energy-Aware Bit-Serial Streaming Deep Convolutional Neural Network Accelerator (ICIP 2019)

Journal Papers

2016

  • Power-Efficient Accelerator Design for Neural Networks Using Computation Reuse (IEEE Computer Architecture Letters 2016 Jan.-June)

2017

  • Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks (JSSC 2017 Jan.)
  • Embedded Streaming Deep Neural Networks Accelerator With Applications (TNNLS 2017 July)
  • Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns (TVLSI 2017 Aug.)
  • Origami: A 803-GOp/s/W Convolutional Network Accelerator (TCSVT 2017 Nov.)

2018

  • Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA (TCAD 2018 Jan.)
  • A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things (TCSI 2018 Jan.)
  • An Architecture to Accelerate Convolution in Deep Neural Networks (TCSI 2018 April)
  • Data and Hardware Efficient Design for Convolutional Neural Network (TCSI 2018 May)
  • Efficient Hardware Architectures for Deep Convolutional Neural Network (TCSI 2018 June)
  • Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA (TVLSI 2018 Early Access)

Accelerators with quantization techniques discussed in the paper (a minimal fixed-point sketch follows the list)

  • Going Deeper with Embedded FPGA Platform for Convolutional Neural Network (FPGA 2016)
  • Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA (ISVLSI 2016)(TCAD 2018 Jan.)
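
The two designs above run inference in fixed-point rather than full-precision floating point. As a rough illustration of that idea, and not the exact schemes used in these papers, the sketch below uses NumPy and a hypothetical `quantize_uniform` helper to quantize a weight tensor to signed 8-bit integers and check the round-trip error.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization to signed `num_bits` integers.

    Hypothetical helper for illustration; not the exact scheme used by
    Going Deeper (FPGA 2016) or Angel-Eye.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(x)), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Quantize a small weight tensor and check the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```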

Papers about bit reduction (an illustrative bit-width search sketch follows the list)

  • An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks (ICASSP 2018)
  • True-Gradient Based Training of Deep Binary Activated Neural Networks via Continuous Binarization (ICASSP 2018)
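
These papers study how far weight and activation precision can be reduced before accuracy degrades. The first derives the minimum per-layer precision analytically; the sketch below is only a naive stand-in that searches for the smallest bit width whose quantization error stays under a hand-picked tolerance (the `quant_error` helper and the tolerance value are assumptions for illustration, not the papers' method).

```python
import numpy as np

def quant_error(x, num_bits):
    """Round-trip error of symmetric uniform quantization at `num_bits` bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return float(np.max(np.abs(x - np.round(x / scale) * scale)))

# Hypothetical per-layer search: smallest bit width whose error stays
# under a hand-picked tolerance (the ICASSP paper derives this analytically).
layer_weights = np.random.randn(256).astype(np.float32)
tolerance = 0.05
min_bits = next(b for b in range(2, 17) if quant_error(layer_weights, b) <= tolerance)
print("minimum bits for this layer:", min_bits)
```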

Serial-Approach Architectures (a minimal bit-serial sketch follows the list)

  • Bit-Pragmatic Deep Neural Network Computing (2016)
  • Stripes: Bit-serial deep neural network computing (MICRO 2016)
  • Stripes: Bit-Serial Deep Neural Network Computing (IEEE Computer Architecture Letters 2017 Jan.-June)
  • Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks (2017)
  • Value-Based Deep-Learning Acceleration (IEEE Micro 2018 Jan./Feb.)
  • Exploiting Typical Values to Accelerate Deep Learning (Computer 2018 May)
  • Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (DAC 2018)
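
The Stripes/Loom family processes activations (and, in Loom, weights as well) one bit at a time, so execution time scales with the precision actually needed. The sketch below shows the arithmetic identity these designs exploit by computing a dot product from activation bit-planes; it is a software illustration only, and the `bit_serial_dot` helper is an assumption, not code from any of the papers.

```python
import numpy as np

def bit_serial_dot(weights, activations, act_bits=8):
    """Dot product computed from activation bit-planes (Stripes-style idea).

    Software illustration only: each step multiplies the weights by one bit of
    every activation and accumulates the shifted partial sum.
    """
    acc = 0
    for b in range(act_bits):
        bit_plane = (activations >> b) & 1           # bit b of each activation
        acc += int(np.dot(weights, bit_plane)) << b  # weight * bit, shifted into place
    return acc

w = np.array([3, -1, 2, 5])
a = np.array([7, 0, 255, 12], dtype=np.uint8)        # unsigned 8-bit activations
assert bit_serial_dot(w, a) == int(np.dot(w, a.astype(np.int64)))
print(bit_serial_dot(w, a))
```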

Zero-Skipping Architectures (a minimal zero-skipping sketch follows the list)

  • Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing (ISCA 2016)
  • Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing (2017)
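
Cnvlutin and Cnvlutin2 skip computations whose activation (and, in Cnvlutin2, weight) operand is ineffectual, most commonly zero after ReLU. The sketch below shows the effect on a single dot product by visiting only non-zero activations; the `zero_skipping_dot` helper is an illustrative assumption, not the papers' hardware mechanism.

```python
import numpy as np

def zero_skipping_dot(weights, activations):
    """Dot product that visits only non-zero activations (Cnvlutin-style idea)."""
    nz = np.flatnonzero(activations)                 # indices of non-zero activations
    return float(np.dot(weights[nz], activations[nz])), len(nz)

# ReLU outputs are often sparse, so many multiplications can be skipped.
acts = np.maximum(np.random.randn(1024), 0.0)        # roughly half zeros after ReLU
w = np.random.randn(1024)
result, visited = zero_skipping_dot(w, acts)
assert np.isclose(result, np.dot(w, acts))
print(f"visited {visited}/{acts.size} activations; dot = {result:.3f}")
```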