Papers-books-and-blogs

This repository contains a list of the books, blogs, research papers and white papers that I have read and found interesting.

Table of contents

  • AI, DL, NLP and RL
  • Calculus
  • Computer Architecture
  • Computer Graphics
  • Data Structures and Algorithms
  • Digital Electronics
  • Graph Theory
  • Information Theory
  • Linear Algebra
  • Measure Theory
  • Optimization Theory
  • Probability and Stochastic Processes
  • Quantum Computing
  • Signal Processing

AI, DL, NLP and RL

  1. 1-bit Adam: communication efficient large-scale training with Adam’s convergence speed
  2. 5 best practices for efficient model training
  3. 8-bit approximations for parallelism in deep learning
  4. 8-bit optimizers via block-wise quantization
  5. A 'neural' network that learns to play Backgammon
  6. A BetterTransformer for fast transformer inference
  7. A deep reinforced model for abstractive summarization
  8. A dynamical approach to temporal pattern processing
  9. A few more examples may be worth billions of parameters
  10. A general and adaptive robust loss function
  11. A generalist agent
  12. A gentle introduction to 8-bit matrix multiplication for transformers at scale using Hugging Face transformers, accelerate and bitsandbytes
  13. A note on the evaluation of generative models
  14. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
  15. A simple but tough-to-beat baseline for sentence embeddings
  16. A simple language model for task-oriented dialogue
  17. A simple neural attentive meta-learner
  18. A simple neural network module for relational reasoning
  19. A study of BFLOAT16 for deep learning training
  20. A style-based generator architecture for generative adversarial networks
  21. A stylometric inquiry into hyperpartisan and fake news
  22. A3T: adversarially augmented adversarial training
  23. Accelerated PyTorch 2 transformers
  24. Accelerating large language model training with variable sparse pre-training and dense fine-tuning
  25. Accelerating PyTorch with CUDA graphs
  26. AdapterHub: a framework for adapting transformers
  27. Adversarial approximate inference for speech to electroglottograph conversion
  28. Adversarial autoencoders
  29. Adversarial examples that fool both computer vision and time-limited humans
  30. Adversarial feature learning
  31. Adversarial generation of natural language
  32. Adversarial information factorization
  33. Adversarially learned inference
  34. AlexaTM 20B: few-shot learning using a large-scale multilingual seq2seq model
  35. Amazon SageMaker model parallelism: a general and flexible framework for large model training
  36. An image is worth 16x16 words: transformers for image recognition at scale
  37. An overview of gradient descent optimization algorithms
  38. Analysing mathematical reasoning abilities of neural models
  39. Approximation by superpositions of a sigmoidal function
  40. Artificial Intelligence: a modern approach
  41. Aspect based sentiment analysis with gated convolutional networks
  42. Attention is all you need
  43. Attention is off by one
  44. Auto-encoding variational Bayes
  45. Backpropagation through the void: optimizing control variates for black-box gradient estimation
  46. BART: denoising sequence-to-sequence pre-training for natural language generation, translation and comprehension
  47. Batch normalization: accelerating deep network training by reducing internal covariate shift
  48. Behavioral cloning from observation
  49. BERT: pre-training of deep bidirectional transformers for language understanding
  50. Better & faster large language models via multi-token prediction
  51. Beyond domain APIs: Task-oriented conversational modeling with unstructured knowledge access
  52. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation
  53. Blockwise parallel transformer for large context models
  54. BLOOM: A 176B-parameter open-access multilingual language model
  55. Bootstrapping entity alignment with knowledge graph embedding
  56. Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation
  57. Bringing open large language models to consumer devices
  58. BTLM-3B-8K: 7B performance in a 3 billion parameter model
  59. Building blocks for a complex-valued transformer architecture
  60. CATS: contextually-aware thresholding for sparsity in large language models
  61. ChatGPT: optimizing language models for dialogue
  62. ColBERT: efficient and effective passage search via contextualized late interaction over BERT
  63. Colossal-AI: a unified deep learning system for large-scale parallel training
  64. Compiling machine learning programs via high-level tracing
  65. Complex transformer: a framework for modeling complex-valued sequence
  66. Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning
  67. Conditional image synthesis with auxiliary classifier GANs
  68. Conformal nucleus sampling
  69. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers
  70. Connectivity versus entropy
  71. Constituency parsing with a self-attentive encoder
  72. Constraint based knowledge base distillation in end-to-end task oriented dialogs
  73. Context generation improves open domain question answering
  74. Convert transformers to ONNX with Hugging Face Optimum
  75. Convolutional networks on graphs for learning molecular fingerprints
  76. Convolutional neural network language models
  77. Countering adversarial images using input transformations
  78. Cramming: training a language model on a single GPU in one day
  79. Crosslingual generalization through multitask finetuning
  80. Curriculum learning
  81. Cutting down on prompts and parameters: simple few-shot learning with language models
  82. Data engineering for scaling language models to 128K context
  83. Deep Boltzmann machines
  84. Deep complex networks
  85. Deep learning
  86. Deep learning and the information bottleneck principle
  87. Deep learning techniques for super-resolution in video games
  88. Deep residual learning for image recognition
  89. Deep text classification can be fooled
  90. DeepSpeed compression: a composable library for extreme compression and zero-cost quantization
  91. DeepSpeed Inference: enabling efficient inference of transformer models at unprecedented scale
  92. DeepSpeed powers 8x larger MoE model training with high performance
  93. DeepSpeed Ulysses: system optimizations for enabling training of extreme long sequence transformer models
  94. DeepSpeed: accelerating large-scale model inference and training via system optimizations and compression
  95. DeepSpeed: advancing MoE inference and training to power next-generation AI scale
  96. Denoising distantly supervised open-domain question answering
  97. Diffusion convolutional recurrent neural network: data-driven traffic forecasting
  98. Discrete variational autoencoders
  99. Disentangling by factorising
  100. Disentangling language and knowledge in task-oriented dialogs
  101. Distributionally robust language modeling
  102. Editing models with task arithmetic
  103. Efficient estimation of word representations in vector space
  104. Efficient large scale language modeling with mixtures of experts
  105. Efficient large-scale language model training on GPU clusters using Megatron-LM
  106. End-to-end task-oriented dialog modeling with semi-structured knowledge management
  107. Enhance reasoning for large language models in the game Werewolf
  108. Enhancing the reliability of out-of-distribution image detection in neural networks
  109. Ensemble adversarial training: attacks and defenses
  110. Equilibrium propagation: bridging the gap between energy-based models and backpropagation
  111. Estimating or propagating gradients through stochastic neurons for conditional computation
  112. Exemplar encoder-decoder for neural conversation generation
  113. Expert human-level driving in Gran Turismo Sport using deep reinforcement learning with image-based representation
  114. Exploring deep recurrent models with reinforcement learning for molecule design
  115. Exploring the limits of transfer learning with a unified text-to-text transformer
  116. Extreme compression for pre-trained transformers made simple and efficient
  117. Fast abstractive summarization with reinforce-selected sentence rewriting
  118. Fast benchmarking of accuracy vs. training time with cyclic learning rates
  119. Fast transformer decoding: one write-head is all you need
  120. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning
  121. FFJORD: Free-form continuous dynamics for scalable reversible generative models
  122. Finetuned language models are zero-shot learners
  123. Flash-decoding for long-context inference
  124. FlashAttention: fast and memory-efficient exact attention with IO-awareness
  125. FlashAttention: fast transformer training with long sequences
  126. Foundations of NLP explained visually: beam search, how it works
  127. FP8 formats for deep learning
  128. FP8-LM: training FP8 large language models
  129. Gemini: a family of highly capable multimodal models
  130. Gemma: open models based on Gemini research and technology
  131. Generating adversarial examples with adversarial networks
  132. Generating sentences from a continuous space
  133. Generation-augmented retrieval for open-domain question answering
  134. Generative adversarial nets
  135. Generative pretraining from pixels
  136. Genetic algorithms in search, optimization and machine learning
  137. GeoMAN: multi-level attention networks for geo-sensory time series prediction
  138. Getting the most out of the NVIDIA A100 GPU with Multi-Instance GPU
  139. GLaM: efficient scaling of language models with mixture-of-experts
  140. GLM-130B: an open bilingual pre-trained model
  141. GLU variants improve transformer
  142. Going deeper with convolutions
  143. GPT-4 architecture, infrastructure, training dataset, costs, vision, MoE
  144. GPT-NeoX-20B: an open-source autoregressive language model
  145. GQA: training generalized multi-query transformer models from multi-head checkpoints
  146. Gradient-based hyperparameter optimization through reversible learning
  147. Graph attention networks
  148. Grounding large language models in interactive environments with online reinforcement learning
  149. Hierarchical neural story generation
  150. Hindsight: posterior-guided training of retrievers for improved open-ended generation
  151. HiPPO: recurrent memory with optimal polynomial projections
  152. HotFlip: white-box adversarial examples for text classification
  153. How big should my language model be?
  154. How PyTorch 2.0 accelerates deep learning with operator fusion and CPU/GPU code-generation
  155. How should AI systems behave, and who should decide?
  156. How we sped up transformer inference 100x for 🤗 API customers
  157. How 🤗 Accelerate runs very large models thanks to PyTorch
  158. Hydragen: high-throughput LLM inference with shared prefixes
  159. HyKnow: end-to-end task-oriented dialog modeling with hybrid knowledge management
  160. Hyperparameter search with Transformers and Ray Tune
  161. Image-to-image translation with conditional generative adversarial networks
  162. ImageNet classification using deep convolutional neural networks
  163. Improving entity linking by modeling latent relations between mentions
  164. Improving language models by retrieving from trillions of tokens
  165. Improving language understanding by generative pre-training
  166. Improving reinforcement learning from human feedback with efficient reward model ensemble
  167. Incredibly fast BLOOM inference with DeepSpeed and Accelerate
  168. Inference suboptimality in variational autoencoders
  169. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets
  170. Interpretable convolutional neural networks via feedforward design
  171. Introducing MPT-7B: a new standard for open-source, commercially usable LLMs
  172. Introducing nvFuser, a deep learning compiler for PyTorch
  173. Introducing Turing image super resolution: AI powered image enhancements for Microsoft Edge and Bing Maps
  174. Introducing 🤗 accelerate
  175. Is ChatGPT 175 billion parameters? Technical analysis
  176. Is the future of neural networks Sparse? An introduction (1/N)
  177. Jack of all trades, master of some, a multi-purpose transformer agent
  178. Joint reasoning on hybrid-knowledge sources for task-oriented dialog
  179. Judging LLM-as-a-judge with MT-bench and chatbot arena
  180. Know what you don't know: unanswerable questions for SQuAD
  181. Knowledge-grounded dialogue generation with pre-trained language models
  182. Language is not all you need: aligning perception with language models
  183. Language modeling with gated convolutional networks
  184. Language modelling with pixels
  185. Language models (mostly) know what they know
  186. Language models are unsupervised multitask learners
  187. Language models as compilers: simulating pseudocode execution improves algorithmic reasoning in language models
  188. Large language models are not fair evaluators
  189. Layer normalization
  190. Layer-condensed KV cache for efficient inference of large language models
  191. Learning activation functions to improve deep neural networks
  192. Learning associative inference using fast weight memory
  193. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders
  194. Learning on a general network
  195. Learning representations by back-propagating errors
  196. Learning transferable visual models from natural language supervision
  197. Learning word embeddings efficiently with noise-contrastive estimation
  198. Leave no context behind: efficient infinite context transformers with infini-attention
  199. Lessons learned on language model safety and misuse
  200. Lifelong language pretraining with distribution-specialized experts
  201. Linear scaling made possible with weight streaming
  202. Linformer: self-attention with linear complexity
  203. LLM in a flash: efficient large language model inference with limited memory
  204. LLM.int8(): 8-bit matrix multiplication for transformers at scale
  205. Long sequence modeling with XGen: a 7B LLM trained on 8K input sequence length
  206. LoRA: Low-Rank Adaptation of large language models
  207. Lost in the middle: how language models use long contexts
  208. M6-10T: a sharing-delinking paradigm for efficient multi-trillion parameter pretraining
  209. Machine learning
  210. Machine learning: a probabilistic perspective
  211. Making deep learning go brrrr from first principles
  212. Making DeepSpeed ZeRO run efficiently on more-affordable hardware
  213. Mask & focus: conversation modelling by learning concepts
  214. Matryoshka representation learning
  215. Maximizing communication efficiency for large-scale training via 0/1 Adam
  216. MCR-DL: mix-and-match communication runtime for deep learning
  217. MegaBlocks: efficient sparse training with mixture-of-experts
  218. Megatron-LM: training multi-billion parameter language models using model parallelism
  219. Memory-efficient pipeline-parallel DNN training
  220. MinTL: minimalist transfer learning for task-oriented dialogue systems
  221. Mix and match: learning-free controllable text generation using energy language models
  222. Mixed precision training
  223. Mixture of attention heads: selecting attention heads per token
  224. Mixture-of-Experts meets instruction tuning: a winning combination for large language models
  225. mixup: beyond empirical risk minimization
  226. MMCoQA: conversational question answering over text, tables and images
  227. Mode matching in GANs through latent space learning and inversion
  228. Multi-level memory for task oriented dialogs
  229. Multitask prompt tuning enables parameter-efficient transfer learning
  230. MultiWOZ - A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling
  231. Mutual information neural estimation
  232. NeMo: a toolkit for building AI applications using neural modules
  233. Neural GPUs learn algorithms
  234. Neural network methods for natural language processing
  235. Neural networks and physical systems with emergent collective computational abilities
  236. Neural networks for pattern recognition
  237. Neural ordinary differential equations
  238. No train no gain: revisiting efficient training algorithms for transformer-based language models
  239. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples
  240. OctoPack: instruction tuning code large language models
  241. On the convergence of Adam and beyond
  242. On the power of neural networks for solving hard problems
  243. One model to learn them all
  244. Open domain question answering over tables via dense retrieval
  245. Open question answering over tables and text
  246. OPT: open pre-trained transformer language models
  247. Optimal brain compression: a framework for accurate post-training quantization and pruning
  248. Optimal perceptual inference
  249. Optimization story: Bloom inference
  250. Orca 2: teaching small language models how to reason
  251. Orca: progressive learning from complex explanation traces of GPT-4
  252. Outer product-based neural collaborative filtering
  253. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer
  254. Overcoming oscillations in quantization-aware training
  255. PAL: Program-aided language models
  256. PaLM: scaling language modeling with pathways
  257. Parallel context windows improve in-context learning of large language models
  258. Pattern classification
  259. Pattern recognition and machine learning
  260. Perceptual losses for real-time style transfer and super-resolution
  261. Personalizing dialogue agents: I have a dog, do you have pets too?
  262. Phase-functioned neural networks for character control
  263. Playing Atari with deep reinforcement learning
  264. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing
  265. Prefix-tuning: optimizing continuous prompts for generation
  266. Probabilistic latent semantic analysis
  267. Progressive growing of GANs for improved quality, stability and variation
  268. Prompting with pseudo-code instructions
  269. Proximal policy optimization algorithms
  270. PullNet: open domain question answering with iterative retrieval on knowledge bases and text
  271. PyTorch trace analysis for the masses
  272. Q-BERT: Hessian based ultra low precision quantization of BERT
  273. R3Net: recurrent residual refinement network for saliency detection
  274. Reading Wikipedia to answer open-domain questions
  275. REALM: Retrieval-augmented language model pretraining
  276. Recurrent models of visual attention
  277. Reducing activation recomputation in large transformer models
  278. Regularizing and optimizing LSTM language models
  279. Reinforcement Learning: An Introduction
  280. ReLoRA: high-rank training through low-rank updates
  281. Restricted Boltzmann machines for collaborative filtering
  282. Retrieval augmentation reduces hallucination in conversation
  283. Retrieval-augmented generation for knowledge-intensive NLP tasks
  284. Revisiting classifier two-sample tests
  285. RoBERTa: a robustly optimized BERT pretraining approach
  286. RoFormer: enhanced transformer with rotary position embedding
  287. SantaCoder: don't reach for the stars!
  288. Scaling instruction-finetuned language models
  289. Scaling PyTorch FSDP for training foundation models on IBM Cloud
  290. Scaling transformer to 1M tokens and beyond with RMT
  291. Scattered mixture-of-experts implementation
  292. Self-instruct: aligning language models with self-generated instructions
  293. Self-normalizing neural networks
  294. Semantically equivalent adversarial rules for debugging NLP models
  295. Seq2seq model and the exposure bias problem
  296. Sequence parallelism: long sequence training from system perspective
  297. Sequential latent knowledge selection for knowledge-grounded dialogue
  298. Simple and effective multi-paragraph reading comprehension
  299. Simplifying transformer blocks
  300. SlimPajama-DC: understanding data combinations for LLM training
  301. SmoothQuant: accurate and efficient post-training quantization for large language models
  302. Soft filter pruning for accelerating deep convolutional neural networks
  303. SOLAR 10.7B: scaling large language models with simple yet effective depth up-scaling
  304. SOLOIST: building task bots at scale with transfer learning and machine teaching
  305. Solving quantitative reasoning problems with language models
  306. Spatial temporal graph convolutional networks for skeleton-based action recognition
  307. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting
  308. Spectral normalization for generative adversarial networks
  309. Speech and language processing
  310. StarCoder: may the source be with you!
  311. Sticking the landing: simple, lower-variance gradient estimators for variational inference
  312. StitchNet: composing neural networks from pre-trained fragments
  313. Stochastic hyperparameter optimization through hypernetworks
  314. Strategies for teaching layered networks classification tasks
  315. Structured prompting: scaling in-context learning to 1,000 examples
  316. Style transfer from non-parallel text by cross-alignment
  317. Subword regularization: improving neural network translation models with multiple subword candidates
  318. Supervised learning of probability distributions by neural networks
  319. Supporting efficient large model training on AMD Instinct™ GPUs with DeepSpeed
  320. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity
  321. Synchronization in neural nets
  322. Synthetic data (almost) from scratch: generalized instruction tuning for language models
  323. Tackling the poor assumptions of Naive Bayes text classifiers
  324. Tensor programs V: tuning large neural networks via zero-shot hyperparameter transfer
  325. TextWorld: a learning environment for text-based games
  326. The best of both worlds: combining recent advances in neural machine translation
  327. The elements of statistical learning: data mining, inference and prediction
  328. The Flan collection: designing data and methods for effective instruction tuning
  329. The information bottleneck method
  330. The Pile: an 800GB dataset of diverse text for language modeling
  331. The power of scale for parameter-efficient prompt tuning
  332. The wisdom of hindsight makes language models better instruction followers
  333. Thermometer encoding: one hot way to resist adversarial examples
  334. To regularize or not to regularize? The bias variance trade-off in regularized AEs
  335. Towards crowdsourced training of large neural networks using decentralized mixture-of-experts
  336. Towards deep learning models resistant to adversarial attacks
  337. Towards evaluating the robustness of neural networks
  338. Train short, test long: Attention with linear biases enables input length extrapolation
  339. Training compute-optimal large language models
  340. Training language models to follow instructions with human feedback
  341. Transformer memory as a differentiable search index
  342. Transformer quality in linear time
  343. Transformer-XL: attentive language models beyond a fixed-length context
  344. Transformers explained visually (part 1): overview of functionality
  345. Transformers explained visually (part 2): how it works, step-by-step
  346. Transformers explained visually (part 3): multi-head attention, deep dive
  347. Turing-NLG: a 17-billion-parameter language model by Microsoft
  348. UL2: unifying language learning paradigms
  349. Understanding convolutional neural networks with a mathematical model
  350. Understanding disentangling in β-VAE
  351. Understanding the Open Pre-Trained Transformers (OPT) library
  352. Unit tests for stochastic optimization
  353. Universal language model fine-tuning for text classification
  354. Unlimiformer: long-range transformers with unlimited length input
  355. Unpaired image-to-image translation using cycle-consistent adversarial networks
  356. Unsupervised machine translation using monolingual corpora only
  357. Unsupervised representation learning by predicting image rotations
  358. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model
  359. Variational inference using implicit distributions
  360. Variational inference with latent space quantization for adversarial resilience
  361. Variational learning for unsupervised knowledge grounded dialogs
  362. Variational lossy autoencoder
  363. Vector-quantized input-contextualized soft prompts for natural language understanding
  364. VEEGAN: reducing mode collapse in GANs using implicit variational learning
  365. Very deep convolutional networks for large-scale image recognition
  366. Visual instruction tuning
  367. Visualizing data using t-SNE
  368. Wasserstein GAN
  369. wav2vec 2.0: a framework for self-supervised learning of speech representations
  370. Wavenet: a generative model for raw audio
  371. WebGPT: browser-assisted question-answering with human feedback
  372. What language model to train if you have one million GPU hours?
  373. Will GPT-4 run DOOM?
  374. Word translation without parallel data
  375. Writing CUDA kernels for PyTorch
  376. Yandex publishes YaLM 100B. It’s the largest GPT-like neural network in open source
  377. You only cache once: decoder-decoder architectures for language models
  378. You only look once: unified, real-time object detection
  379. ZeRO & DeepSpeed: new system optimizations enable training models with over 100 billion parameters
  380. ZeRO++: Extremely efficient collective communication for giant model training
  381. ZeRO-2 & DeepSpeed: shattering barriers of deep learning speed & scale
  382. ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning
  383. Zero-shot text-to-image generation
  384. ZeRO: memory optimizations toward training trillion parameter models
  385. ZeroQuant: efficient and affordable post-training quantization for large-scale transformers
  386. β-VAE: learning basic visual concepts with a constrained variational framework
  387. 🍷 FineWeb: decanting the web for the finest text data at scale

Calculus

  1. Calculus of variations
  2. Thomas' calculus

Computer Architecture

  1. Accelerated computing with a reconfigurable dataflow architecture
  2. Computer architecture: a quantitative approach
  3. Computer organization and design ARM edition: the hardware software interface
  4. Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors
  5. Improving DRAM performance by parallelizing refreshes with accesses
  6. Memory performance attacks: denial of memory service in multi-core systems
  7. Memory scaling: a systems architecture perspective
  8. Millicode in an IBM zSeries processor
  9. MTIA v1: Meta's first-generation AI inference accelerator
  10. RAIDR: Retention-Aware Intelligent DRAM Refresh
  11. Stall-time fair memory access scheduling for chip multiprocessors

Computer Graphics

  1. Principles of traditional animation applied to 3D computer animation

Data Structures and Algorithms

  1. Data structures and algorithms in Java
  2. Introduction to algorithms

Digital Electronics

  1. Digital design: with an introduction to the Verilog HDL

Graph Theory

  1. Introduction to graph theory

Information Theory

  1. Elements of information theory
  2. Error detecting and error correcting codes

Linear Algebra

  1. Linear algebra and its applications
  2. Matrix analysis and applied linear algebra
  3. The matrix cookbook

Measure Theory

  1. Measure theory

Optimization Theory

  1. Convex Optimization
  2. Distributed optimization and statistical learning via the alternating direction method of multipliers

Probability and Stochastic Processes

  1. Introduction to probability and stochastic processes with applications

Quantum Computing

  1. A fast quantum mechanical algorithm for database search
  2. A single quantum cannot be cloned
  3. Can quantum-mechanical description of physical reality be considered complete?
  4. Image recognition with an adiabatic quantum computer I. mapping to quadratic unconstrained binary optimization
  5. Integer optimization toolbox (minimizing polynomials over integer lattices using quantum annealing)
  6. Limits on parallel speedup for classical Ising model solvers
  7. Partitioning optimization problems for hybrid classical/quantum execution
  8. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer
  9. Probabilistic cloning and identification of linearly independent quantum states
  10. Programming with D-Wave: map coloring problem
  11. Quantum computation and quantum information
  12. Quantum computing: a gentle introduction
  13. Quantum performance evaluation: a short reading list
  14. Quantum theory, the Church-Turing principle and the universal quantum computer
  15. Rapid solution of problems by quantum computation
  16. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels

Signal Processing

  1. Discrete-time signal processing
  2. Foundations of Signal Processing
  3. Signals and systems
  4. Understanding digital signal processing