8.26.24 |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery |
SakanaAI/AI-Scientist |
8.19.24 |
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget |
SonyResearch/micro_diffusion (pending) |
8.12.24 |
Scaling and evaluating sparse autoencoders |
openai/sparse_autoencoder |
4.10.24 |
Score-Based Generative Modeling through Stochastic Differential Equations |
|
3.28.24 |
Generative Modeling by Estimating Gradients of the Data Distribution |
|
3.21.24 |
Humanoid Locomotion as Next Token Prediction |
|
3.14.24 |
TIES-Merging: Resolving Interference When Merging Models |
prateeky2806/ties-merging |
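
A minimal sketch of what TIES-Merging does on a single parameter tensor (trim, elect a sign, merge the agreers); function names and the top-k fraction are illustrative, not the repo's API:
```python
import torch

def ties_merge(base, finetuned, k=0.2):
    """Sketch of TIES on one parameter tensor: trim, elect a sign, merge the agreers."""
    deltas = torch.stack([ft - base for ft in finetuned])       # task vectors
    # 1) trim: keep only the top-k fraction of each delta by magnitude
    thresh = deltas.abs().flatten(1).quantile(1 - k, dim=1)
    keep = deltas.abs() >= thresh.view(-1, *[1] * base.dim())
    trimmed = torch.where(keep, deltas, torch.zeros_like(deltas))
    # 2) elect: per weight, the sign with more total mass wins
    sign = torch.sign(trimmed.sum(dim=0))
    # 3) disjoint merge: average only the deltas whose sign agrees with the elected one
    agree = (torch.sign(trimmed) == sign) & (trimmed != 0)
    merged = (trimmed * agree).sum(0) / agree.sum(0).clamp(min=1)
    return base + merged
```
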
2.8.24 |
Merging Models with Fisher-Weighted Averaging |
arcee-ai/mergekit |
2.1.24 |
Averaging Weights Leads to Wider Optima and Better Generalization |
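
The idea (often called SWA) in a few lines: keep a running average of the weights visited late in training and evaluate the averaged model. The helper below is an illustration under assumed names; PyTorch now ships a similar utility as torch.optim.swa_utils.AveragedModel.
```python
import copy
import torch

@torch.no_grad()
def update_average(avg_model, model, n_averaged):
    """Fold the current weights into the running average, in place."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.add_((p - p_avg) / (n_averaged + 1))   # incremental mean of weights
    return n_averaged + 1

# usage sketch: avg_model = copy.deepcopy(model); after each late-stage epoch call
# n = update_average(avg_model, model, n), then re-estimate BN statistics before eval.
```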
|
1.18.24 |
Hyena Hierarchy: Towards Larger Convolutional Language Models |
|
1.04.24 |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces |
state-spaces/mamba |
12.07.23 |
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning |
|
11.30.23 |
3D Gaussian Splatting for Real-Time Radiance Field Rendering |
graphdeco-inria/gaussian-splatting |
11.16.23 |
LILO: Learning Interpretable Libraries by Compressing and Documenting Code |
gabegrand/lilo |
11.09.23 |
Human-like systematic generalization through a meta-learning neural network |
brendenlake/MLC and brendenlake/MLC-ML |
9.28.23 |
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |
huggingface/transformers/examples/research_projects/rag |
9.14.23 |
Gradient-based Adversarial Attacks against Text Transformers |
facebookresearch/text-adversarial-attack |
8.10.23 |
Reflexion: Language Agents with Verbal Reinforcement Learning |
noahshinn024/reflexion |
6.15.23 |
RWKV: Reinventing RNNs for the Transformer Era |
BlinkDL/RWKV-LM |
5.18.23 |
Toy Models of Superposition |
|
5.11.23 |
LoRA: Low-Rank Adaptation of Large Language Models |
tloen/alpaca-lora and huggingface/blog/lora |
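
For intuition, a bare-bones LoRA layer (a sketch, not the listed repos' implementation): the pretrained weight is frozen and only a rank-r update B @ A is trained, scaled by alpha / r.
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                     # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```
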
5.04.23 |
Efficiently Modeling Long Sequences with Structured State Spaces |
HazyResearch/state-spaces |
4.06.23 |
Generating Sequences by Learning to Self-Correct |
|
3.30.23 |
The Capacity for Moral Self-Correction in Large Language Models |
|
3.23.23 |
LLaMA: Open and Efficient Foundation Language Models |
facebookresearch/llama and huggingface/llama |
3.16.23 |
Language Is Not All You Need: Aligning Perception with Language Models |
|
3.02.23 |
Guiding Pretraining in Reinforcement Learning with Large Language Models |
|
2.23.23 |
Toolformer: Language Models Can Teach Themselves to Use Tools |
|
2.16.23 |
What learning algorithm is in-context learning? Investigations with linear models |
ekinakyurek/incontext |
2.09.23 |
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation |
Tutorial |
1.26.23 |
Mastering Diverse Domains through World Models |
|
1.12.23 |
The Forward-Forward Algorithm: Some Preliminary Investigations |
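
The core of the algorithm, as a hedged sketch (not Hinton's code): each layer is trained locally so that its "goodness" (sum of squared activations) is high on positive data and low on negative data; no gradients cross layer boundaries.
```python
import torch
import torch.nn.functional as F

def goodness(h):
    return h.pow(2).sum(dim=1)                   # per-example sum of squared activations

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Local loss: push positive goodness above theta and negative goodness below it."""
    logits = torch.cat([goodness(h_pos) - theta, theta - goodness(h_neg)])
    return F.softplus(-logits).mean()            # log(1 + exp(-logit)), optimized per layer
```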
|
12.08.22 |
Training language models to follow instructions with human feedback |
|
9.22.22 |
Git Re-Basin: Merging Models modulo Permutation Symmetries |
|
9.08.22 |
Transformers are Sample-Efficient World Models |
|
8.25.22 |
A Path Towards Autonomous Machine Intelligence |
|
8.18.22 |
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer |
microsoft/mup |
7.14.22 |
Learning Iterative Reasoning through Energy Minimization |
yilundu/irem_code_release |
6.16.22 |
Sharpness-Aware Minimization for Efficiently Improving Generalization |
google-research/sam |
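
Roughly what one SAM update does, as a PyTorch-flavored sketch (google-research/sam itself is TF/JAX; this assumes every trainable parameter receives a gradient in both passes):
```python
import torch

def sam_step(model, compute_loss, optimizer, rho=0.05):
    """Perturb weights to an approximate worst point in an L2 ball of radius rho,
    take the gradient there, then step from the original weights."""
    params = [p for p in model.parameters() if p.requires_grad]
    compute_loss().backward()                        # gradient at the current weights
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                                # climb toward higher loss
    optimizer.zero_grad()
    compute_loss().backward()                        # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                                # restore the original weights
    optimizer.step()                                 # apply the sharpness-aware gradient
    optimizer.zero_grad()
```
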
5.26.22 |
Neural Tangent Kernel: Convergence and Generalization in Neural Networks |
|
4.28.22 |
A Modern Self-Referential Weight Matrix That Learns to Modify Itself |
IDSIA/modern-srwm |
4.14.22 |
Hierarchical Perceiver |
|
3.24.22 |
Dual Diffusion Implicit Bridges for Image-to-Image Translation |
|
3.10.22 |
Understanding Generalization through Visualizations |
wronnyhuang/gen-viz |
2.17.22 |
Divide and Contrast: Self-supervised Learning from Uncurated Data |
|
2.10.22 |
Investigating Human Priors for Playing Video Games |
rach0012/humanRL_prior_games |
1.27.22 |
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language |
pytorch/data2vec |
1.20.22 |
Consistent Video Depth Estimation |
facebookresearch/consistent_depth |
1.13.22 |
Masked Autoencoders Are Scalable Vision Learners |
|
12.02.21 |
Training Verifiers to Solve Math Word Problems |
|
11.18.21 |
(StyleGAN3) Alias-Free Generative Adversarial Networks
NVlabs/stylegan3 |
11.04.21 |
Do Vision Transformers See Like Convolutional Neural Networks? |
|
10.21.21 |
CoBERL: Contrastive BERT for Reinforcement Learning |
|
10.14.21 |
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space |
chi0tzp/WarpedGANSpace |
10.06.21 |
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow |
princeton-vl/RAFT |
9.16.21 |
Bootstrapped Meta-Learning |
|
9.09.21 |
Program Synthesis with Large Language Models |
|
8.19.21 |
Perceiver IO: A General Architecture for Structured Inputs & Outputs |
deepmind/perceiver |
8.12.21 |
Reward is enough |
|
8.05.21 |
Learning Compositional Rules via Neural Program Synthesis |
mtensor/rulesynthesis |
6.24.21 |
Thinking Like Transformers |
|
6.17.21 |
Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation |
|
6.10.21 |
Unsupervised Learning by Competing Hidden Units |
|
5.27.21 |
Pay Attention to MLPs |
|
5.20.21 |
Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards |
|
5.13.21 |
Emerging Properties in Self-Supervised Vision Transformers |
|
5.06.21 |
Implicit Neural Representations with Periodic Activation Functions |
vsitzmann/siren |
4.29.21 |
How to represent part-whole hierarchies in a neural network |
lucidrains/glom-pytorch, RedRyan111/GLOM, and ArneBinder/GlomImpl
4.15.21 |
Perceiver: General Perception with Iterative Attention |
|
4.01.21 |
Synthetic Returns for Long-Term Credit Assignment |
|
3.25.21 |
The Pitfalls of Simplicity Bias in Neural Networks |
|
3.18.21 |
Bootstrap your own latent: A new approach to self-supervised Learning |
|
3.11.21 |
Meta Learning Backpropagation And Improving It |
|
3.04.21 |
Taming Transformers for High-Resolution Image Synthesis |
CompVis/taming-transformers |
2.18.21 |
Pre-training without Natural Images |
hirokatsukataoka16/FractalDB-Pretrained-ResNet-PyTorch |
2.11.21 |
Revisiting Locally Supervised Learning: an Alternative to End-to-end Training |
blackfeather-wang/InfoPro-Pytorch |
2.04.21 |
Neural Power Units |
|
1.28.21 |
Representation Learning via Invariant Causal Mechanisms |
|
1.21.21 |
γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction |
JannerM/gamma-models |
1.14.21 |
Improving Generalisation for Temporal Difference Learning: The Successor Representation |
|
12.17.20 |
Learning Associative Inference Using Fast Weight Memory |
|
|
Hopfield Networks cycle ends |
|
12.10.20 |
Hopfield Networks is All You Need |
ml-jku/hopfield-layers |
12.03.20 |
On a model of associative memory with huge storage capacity |
|
11.19.20 |
Dense Associative Memory for Pattern Recognition |
|
11.12.20 |
Neural Networks and Physical Systems with Emergent Collective Computational Abilities (= "the Hopfield Networks paper") |
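
The whole classical model fits in a few lines; a hedged NumPy sketch of Hebbian storage and asynchronous recall for ±1 patterns:
```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian storage: patterns is (P, N) with entries in {-1, +1}."""
    P, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)                 # no self-connections
    return W

def recall(W, state, steps=10):
    """Asynchronous sign updates until the state settles into a stored attractor."""
    s = state.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s
```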
|
|
Hopfield Networks cycle of papers - from the original paper on Hopfield networks to "Hopfield Networks is All You Need" |
|
11.05.20 |
Training Generative Adversarial Networks with Limited Data |
NVlabs/stylegan2-ada |
10.29.20 |
Memories from patterns: Attractor and integrator networks in the brain |
|
10.15.20 |
Entities as Experts: Sparse Memory Access with Entity Supervision |
|
10.08.20 |
A Primer in BERTology: What we know about how BERT works |
|
10.01.20 |
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners |
timoschick/pet |
9.24.20 |
End-to-End Object Detection with Transformers |
facebookresearch/detr |
9.17.20 |
Gated Linear Networks |
|
7.23.20 |
A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning |
|
7.02.20 |
DreamCoder: Building interpretable hierarchical knowledge representations with wake-sleep Bayesian program learning |
ellisk42/ec |
6.18.20 |
SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver |
locuslab/SATNet |
6.4.20 |
Adaptive Attention Span in Transformers |
|
5.28.20 |
Complexity control by gradient descent in deep networks |
|
5.21.20 |
What Can Learned Intrinsic Rewards Capture? |
|
5.14.20 |
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction |
|
5.7.20 |
Write, Execute, Assess: Program Synthesis With a REPL |
flxsosa/ProgramSearch |
4.23.20 |
Graph Representations for Higher-Order Logic and Theorem Proving |
|
4.16.20 |
Mathematical Reasoning in Latent Space |
|
4.9.20 |
MEMO: A Deep Network for Flexible Combination of Episodic Memories |
|
4.2.20 |
Creating High Resolution Images with a Latent Adversarial Generator |
|
3.26.20 |
Invertible Residual Networks |
|
3.5.20 |
Value-driven Hindsight Modelling |
|
2.27.20 |
Analyzing and Improving the Image Quality of StyleGAN |
|
2.13.20 |
Axiomatic Attribution for Deep Networks |
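
The paper's method is integrated gradients: IG_i = (x_i - x'_i) * the path integral of dF/dx_i from baseline x' to input x. A small sketch, assuming f maps a batch of inputs to one scalar score per example:
```python
import torch

def integrated_gradients(f, x, baseline, steps=64):
    """Riemann-sum approximation of IG along the straight path from baseline to x."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)     # (steps, *x.shape) interpolated inputs
    path.requires_grad_(True)
    f(path).sum().backward()                      # one backward gives grads at every step
    return (x - baseline) * path.grad.mean(dim=0) # average gradient times displacement
```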
|
2.6.20 |
Automated curricula through setter-solver interactions |
|
1.30.20 |
Protein structure prediction ... |
deepmind |
1.23.20 |
Putting An End to End-to-End: Gradient-Isolated Learning of Representations |
|
1.16.20 |
Normalizing Flows: An Introduction and Review of Current Methods |
|
12.19.19 |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model |
|
12.5.19 |
On the Measure of Intelligence |
|
11.21.19 |
Understanding the Neural Tangent Kernel |
rajatvd |
11.14.19 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding |
|
11.7.19 |
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction |
|
10.31.19 |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
|
10.24.19 |
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting |
|
10.17.19 |
Unsupervised Doodling and Painting with Improved SPIRAL |
|
10.10.19 |
Adversarial Robustness as a Prior for Learned Representations |
MadryLab |
10.3.19 |
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks |
|
9.26.19 |
Image Transformer |
|
9.19.19 |
Generating Diverse High-Fidelity Images with VQ-VAE-2 |
|
9.12.19 |
Neural Discrete Representation Learning |
|
9.5.19 |
Neural Text Generation with Unlikelihood Training |
|
8.29.19 |
Learning Representations by Maximizing Mutual Information Across Views |
|
break |
switch from Tuesdays to Thursdays after the break |
|
6.11.19 |
BERT Rediscovers the Classical NLP Pipeline |
|
6.4.19 |
Semantic Visual Localization |
|
5.28.19 |
AlgoNet: C^∞ Smooth Algorithmic Neural Networks |
|
5.14.19 |
Unsupervised Data Augmentation for Consistency Training |
|
4.30.19 |
Augmented Neural ODEs |
|
4.9.19 |
Wasserstein Dependency Measure for Representation Learning |
|
4.2.19 |
Leveraging Knowledge Bases in LSTMs for Improving Machine Reading |
|
3.26.19 |
Meta Particle Flow for Sequential Bayesian Inference |
|
3.19.19 |
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms |
|
3.12.19 |
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks |
|
2.26.19 |
Language Models are Unsupervised Multitask Learners |
openai |
2.19.19 |
Learning to Understand Goal Specifications by Modelling Reward |
|
1.29.19 |
GamePad: A Learning Environment for Theorem Proving |
|
1.15.19 |
Matrix capsules with EM routing |
|
12.4.18 |
Optimizing Agent Behavior over Long Time Scales by Transporting Value |
|
11.27.18 |
Embedding Logical Queries on Knowledge Graphs |
williamleif |
11.20.18 |
Large-Scale Study of Curiosity-Driven Learning |
openai |
11.13.18 |
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding |
nke001 |
11.6.18 |
Generalizing Hamiltonian Monte Carlo with Neural Networks |
brain-research |
10.23.18 |
A Conceptual Introduction to Hamiltonian Monte Carlo |
|
10.16.18 |
MaskGAN: Better Text Generation via Filling in the ______
|
10.9.18 |
Large Scale GAN Training for High Fidelity Natural Image Synthesis |
|
10.2.18 |
Improving Variational Inference with Inverse Autoregressive Flow |
|
9.25.18 |
Artificial Intelligence - The Revolution Hasn’t Happened Yet |
|
9.18.18 |
Learning deep representations by mutual information estimation and maximization |
|
9.11.18 |
The Variational Homoencoder: Learning to learn high capacity generative models from few examples |
insperatum |
9.4.18 |
Towards Conceptual Compression |
geosada |
8.28.18 |
Vector-based navigation using grid-like representations in artificial agents |
deepmind |
|
break in maintaining this file; backfilled on April 10, 2020
|
------
8.21.18 |
Universal Transformers |
tensorflow |
8.14.18 |
Neural Arithmetic Logic Units |
gautam1858 |
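
For reference, the NALU cell itself is compact; a sketch of the forward pass (initialization scale here is a guess, not the paper's exact scheme):
```python
import torch
import torch.nn as nn

class NALU(nn.Module):
    """One NALU cell: gated blend of an additive and a multiplicative (log-space) path."""
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)  # weights biased toward {-1, 0, 1}
        a = x @ W.T                                             # add / subtract path
        m = torch.exp(torch.log(x.abs() + self.eps) @ W.T)      # multiply / divide path
        g = torch.sigmoid(x @ self.G.T)                         # learned gate between the two
        return g * a + (1 - g) * m
```
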
8.7.18 |
Neural Scene Representation and Rendering |
|
7.31.18 |
Measuring Abstract Reasoning in Neural Networks |
|
6.26.18 |
Improving Language Understanding by Generative Pre-Training |
openai |
6.19.18 |
Associative Compression Networks for Representation Learning |
|
6.12.18 |
On Characterizing the Capacity of Neural Networks using Algebraic Topology |
|
6.5.18 |
Causal Effect Inference with Deep Latent-Variable Models |
AMLab |
5.29.18 |
ML beyond Curve Fitting |
|
5.22.18 |
Synthesizing Programs for Images using Reinforced Adversarial Learning |
|
5.15.18 |
TensorFlow Overview |
r1.8 |
5.8.18 |
Compositional Attention Networks for Machine Reasoning |
stanfordnlp |
4.24.18 |
The Annotated Transformer |
|
4.3.18 |
How Developers Iterate on Machine Learning Workflows |
|
3.27.18 |
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
3.20.18 |
Attention Is All You Need |
tensor2tensor |
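
The paper's central equation, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, in a few lines of PyTorch for reference (a sketch of the op, not the tensor2tensor implementation):
```python
import math
import torch

def attention(q, k, v, mask=None):
    """Scaled dot-product attention over the last two dimensions."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```
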
3.6.18 |
Generating Wikipedia by Summarizing Long Sequences |
wikisum, per this gist |
2.27.18 |
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks |
StackGAN-v2 |
2.20.18 |
Information Dropout |
InformationDropout, official implementation |
2.13.18 |
Nested LSTMs |
Nested-LSTM |
2.6.18 |
Deep vs. Shallow Networks: An Approximation Theory Perspective |
|
1.30.18 |
The Case for Learned Index Structures |
|
1.23.18 |
Visualizing The Loss Landscape Of Neural Nets |
|
1.16.18 |
Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning & RelNet: End-to-End Modeling of Entities & Relations
|
1.9.18 |
Intro to Coq |
|
12.12.17 |
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks |
ChainsofReasoning
12.5.17 |
Stochastic Neural Networks for Hierarchical Reinforcement Learning |
snn4hrl |
11.28.17 |
Emergent Complexity via Multi-Agent Competition (blog post) |
multiagent-competition |
11.14.17 |
Mastering the game of Go without human knowledge |
|
11.7.17 |
Meta-Learning with Memory-Augmented Neural Networks |
ntm-meta-learning |
10.24.17 |
Poincaré Embeddings for Learning Hierarchical Representations |
poincare_embeddings |
10.17.17 |
What does Attention in Neural Machine Translation Pay Attention to? |
|
10.10.17 |
Zero-Shot Learning Through Cross-Modal Transfer |
zslearning |
9.26.17 |
Variational Boosting: Iteratively Refining Posterior Approximations |
vboost |
9.19.17 |
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks |
cbfinn |
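
A hedged sketch of one MAML meta-update; the task interface here (loss callables taking an explicit parameter list) is a simplification for brevity, not cbfinn's API:
```python
import torch

def maml_meta_step(params, tasks, meta_opt, inner_lr=0.01):
    """params: list of tensors with requires_grad=True; tasks: (support_loss, query_loss)
    pairs, each a callable taking a parameter list and returning a scalar loss."""
    meta_loss = 0.0
    for support_loss, query_loss in tasks:
        grads = torch.autograd.grad(support_loss(params), params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]  # inner step, kept differentiable
        meta_loss = meta_loss + query_loss(adapted)                  # evaluate the adapted weights
    meta_opt.zero_grad()
    meta_loss.backward()                                             # second-order terms flow through
    meta_opt.step()
```
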
9.12.17 |
Neuroscience-Inspired Artificial Intelligence
|
9.5.17 |
Recurrent Dropout Without Memory Loss |
rnn_cell_mulint_modern.py |
8.29.17 |
Deep Transfer Learning with Joint Adaptation Networks |
jmmd.{cpp,hpp} |
8.22.17 |
Designing Neural Network Architectures using Reinforcement Learning |
metaqnn |
8.15.17 |
Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences |
plstm |
8.8.17 |
HyperNetworks
otoro blog |
8.1.17 |
Full-Capacity Unitary Recurrent Neural Networks |
complex_RNN, urnn |
7.25.17 |
Decoupled Neural Interfaces using Synthetic Gradients & follow-up |
dni.pytorch |
7.18.17 |
A simple neural network module for relational reasoning |
relation-network |
7.11.17 |
Speaker diarization using deep neural network embeddings |
|
6.20.17 |
Neural Episodic Control |
PFCM |
6.13.17 |
Lie-Access Neural Turing Machines |
harvardnlp |
6.6.17 |
Artistic style transfer for videos |
artistic video |
5.30.17 |
High-Dimensional Continuous Control Using Generalized Advantage Estimation |
modular_rl |
5.23.17 |
Emergence of Grounded Compositional Language in Multi-Agent Populations |
|
5.16.17 |
Trust Region Policy Optimization |
modular_rl |
5.9.17 |
Improved Training of Wasserstein GANs |
code |
5.4.17 |
Using Fast Weights to Attend to the Recent Past |
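
The mechanism, stripped of the slow RNN and layer norm the paper wraps around it (a hedged sketch with made-up default constants): a Hebbian fast-weight matrix decays, stores the current hidden state, and is queried by a short inner loop.
```python
import torch

def fast_weight_step(A, h, lam=0.95, eta=0.5, inner_steps=3):
    """One step of Ba et al.-style fast weights: A(t) = lam * A(t-1) + eta * h h^T."""
    A = lam * A + eta * torch.outer(h, h)       # write the current state into fast memory
    hs = h.clone()
    for _ in range(inner_steps):                # inner loop "attends to the recent past"
        hs = torch.tanh(h + A @ hs)
    return A, hs
```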
|
4.25.17 |
Strategic Attentive Writer for Learning Macro-Actions |
|
4.18.17 |
Massive Exploration of Neural Machine Translation Architectures |
|
4.4.17 |
End to End Learning for Self-Driving Cars |
|
3.28.17 |
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning |
|
3.21.17 |
Image-to-Image Translation with Conditional Adversarial Networks |
|
3.7.17 |
Neural Programmer-Interpreters
|
2.14.17 |
Wasserstein GAN |
|
2.7.17 |
Towards Principled Methods for Training GANs |
|
1.31.17 |
Mastering the game of Go with deep neural networks and tree search
|
1.24.17 |
Understanding Deep Learning Requires Rethinking Generalization |
|
1.17.17 |
Neural Semantic Encoders |
|
12.21.16 |
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks |
|
12.14.16 |
Key-Value Memory Networks for Directly Reading Documents |
|
12.7.16 |
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets |
|