VAE for Image Generation
Variational AutoEncoder - Keras implementation on the MNIST, CIFAR10, and CALTECH101 datasets
Dependencies
- keras
- tensorflow / theano (the current implementation uses the TensorFlow backend; it can be used with Theano with a few changes in the code - see the sketch after this list)
- numpy, matplotlib, scipy
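
As a reference, a minimal sketch of switching the Keras backend, using the standard `KERAS_BACKEND` environment variable (generic Keras behaviour, not specific to this repo; shown for Keras 2):

```python
# Select the backend before importing keras; "tensorflow" is the default here.
import os
os.environ.setdefault("KERAS_BACKEND", "tensorflow")  # or "theano"

from keras import backend as K
print(K.backend())            # active backend name
print(K.image_data_format())  # "channels_last" (TF) vs "channels_first" (common with Theano)
```

One of the "few changes" needed for Theano is typically the image data format (channel ordering) used by the convolutional models.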
Implementation Details
The code is highly inspired by the Keras VAE examples (the source files contain some code duplication).
MNIST
- images are flattened and treated as 1D vectors
- the encoder and the decoder both have a plain fully connected neural network architecture (a minimal sketch follows the table below)
| network architecture |
| --- |
| *(architecture diagram)* |
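
A minimal sketch of such a dense MNIST VAE in Keras (TensorFlow backend). The layer sizes and hyperparameter values below are illustrative assumptions, not necessarily this repo's exact settings:

```python
import numpy as np
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim, intermediate_dim, latent_dim = 784, 256, 2  # assumed values

# encoder: flattened image -> hidden layer -> latent statistics
x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    z_mean, z_log_var = args
    eps = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])

# decoder: latent vector -> hidden layer -> reconstructed image
decoder_h = Dense(intermediate_dim, activation='relu')
decoder_out = Dense(original_dim, activation='sigmoid')
x_decoded = decoder_out(decoder_h(z))

vae = Model(x, x_decoded)

# VAE loss = reconstruction (binary cross-entropy) + KL divergence
recon = original_dim * K.mean(K.binary_crossentropy(x, x_decoded), axis=-1)
kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae.add_loss(K.mean(recon + kl))
vae.compile(optimizer='rmsprop')
```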

- it trains the VAE model according to the hyperparameters defined in `mnist_params.py`
- the entire VAE model, the encoder, and the decoder are saved as Keras models named `ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_<vae/encoder/decoder>.h5`, where `<latent_dim>` is the number of latent dimensions, `<intermediate_dim>` is the number of neurons in the hidden layer, and `<epochs>` is the number of training epochs
- after training, the saved models can be used to analyse the latent distribution and to generate new images
- it also stores the training history in `ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_history.pkl` (a sketch of this train-and-save flow follows the list)
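
Continuing the dense-VAE sketch above, a hedged sketch of the train-and-save flow implied by the naming scheme (the `encoder`/`decoder` model splits and the batch size are assumptions):

```python
import pickle
from keras.datasets import mnist
from keras.layers import Input
from keras.models import Model

# standalone encoder and decoder built from the tensors of the sketch above
encoder = Model(x, z_mean)                       # image -> latent mean
z_input = Input(shape=(latent_dim,))
decoder = Model(z_input, decoder_out(decoder_h(z_input)))

# flattened, normalized MNIST data
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.
x_test = x_test.reshape(-1, 784).astype('float32') / 255.

epochs = 50  # illustrative
history = vae.fit(x_train, epochs=epochs, batch_size=100,
                  validation_data=(x_test, None))

# save models and history using the README's naming scheme
prefix = 'ld_%d_id_%d_e_%d' % (latent_dim, intermediate_dim, epochs)
vae.save(prefix + '_vae.h5')
encoder.save(prefix + '_encoder.h5')
decoder.save(prefix + '_decoder.h5')
with open(prefix + '_history.pkl', 'wb') as f:
    pickle.dump(history.history, f)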

- it works only for a 2-dimensional latent space
- it loads the trained model according to the hyperparameters defined in `mnist_params.py`
- it displays the latent space distribution and then generates images according to user input of the latent variables (the code is almost self-explanatory)
- it can also generate images from latent vectors randomly sampled from the 2D latent space (comment out the user-input lines) and display them in a grid (see the sketch after this list)
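
A hedged sketch of both generation modes, assuming a trained decoder saved as in the sketches above (the filename below is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.models import load_model

decoder = load_model('ld_2_id_256_e_50_decoder.h5')  # illustrative filename

# mode 1: decode a single user-chosen 2D latent point
z = np.array([[0.5, -1.0]])                    # user input: (z1, z2)
img = decoder.predict(z).reshape(28, 28)
plt.imshow(img, cmap='gray')
plt.show()

# mode 2: decode an n x n grid of points sampled from the 2D latent space
n, size = 10, 28
grid = np.zeros((n * size, n * size))
for i, zi in enumerate(np.linspace(-3, 3, n)):
    for j, zj in enumerate(np.linspace(-3, 3, n)):
        digit = decoder.predict(np.array([[zi, zj]])).reshape(size, size)
        grid[i * size:(i + 1) * size, j * size:(j + 1) * size] = digit
plt.imshow(grid, cmap='gray')
plt.show()
```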

- it is the same as `mnist_2d_latent_space_and_generate.py` but for a 3D latent space

- it loads the trained model according to the hyperparameters defined in `mnist_params.py`
- if the latent space is 2D or 3D, it displays the latent distribution
- it displays a grid of images generated from randomly sampled latent vectors (a sketch of the latent-space visualization follows)
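
A sketch of visualizing the learned latent distribution for a 2D latent space, assuming a trained encoder that outputs the latent mean (filename illustrative):

```python
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import load_model

(_, _), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(-1, 784).astype('float32') / 255.

encoder = load_model('ld_2_id_256_e_50_encoder.h5')  # illustrative filename
z_mean = encoder.predict(x_test)

# scatter of test images in latent space, coloured by digit label
plt.scatter(z_mean[:, 0], z_mean[:, 1], c=y_test, cmap='viridis', s=2)
plt.colorbar()
plt.xlabel('z[0]')
plt.ylabel('z[1]')
plt.show()
```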
Results
2D latent space
| latent space | uniform sampling |
| --- | --- |
| *(image)* | *(image)* |
3D latent space

| uniform sampling | random sampling |
| --- | --- |
| *(image)* | *(image)* |
- more results are in the corresponding directory
CIFAR10
- images are treated as 2D input (keeping their spatial structure)
- the encoder is a convolutional neural network and the decoder is a deconvolutional (transposed-convolution) network
- the network architectures for the encoder and decoder are as follows (a rough sketch in code follows the tables)
| encoder |
| --- |
| *(architecture diagram)* |

| decoder |
| --- |
| *(architecture diagram)* |
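
A rough sketch of the convolutional-encoder / deconvolutional-decoder pairing in Keras. The layer counts, filter sizes, and strides below are assumptions, not the repo's exact architecture, and the sampling layer and loss are omitted (see the MNIST sketch):

```python
from keras.layers import Input, Conv2D, Conv2DTranspose, Flatten, Dense, Reshape
from keras.models import Model

latent_dim = 16

# encoder: 32x32x3 image -> strided conv stack -> latent statistics
inp = Input(shape=(32, 32, 3))
h = Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)  # -> 16x16x32
h = Conv2D(64, 3, strides=2, padding='same', activation='relu')(h)    # -> 8x8x64
h = Flatten()(h)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

# decoder: latent vector -> dense -> transposed-conv stack -> 32x32x3 image
z_in = Input(shape=(latent_dim,))
d = Dense(8 * 8 * 64, activation='relu')(z_in)
d = Reshape((8, 8, 64))(d)
d = Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(d)  # -> 16x16
d = Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(d)  # -> 32x32
out = Conv2DTranspose(3, 3, padding='same', activation='sigmoid')(d)
decoder = Model(z_in, out)
```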
The implementation structure is the same as for the MNIST files.
Results - 16 latent dimensions

| 25 epochs | 50 epochs | 75 epochs |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |

| 600 epochs |
| --- |
| *(image)* |
CALTECH101
- `caltech101_<sz>_train.py` and `caltech101_<sz>_generate.py` (where `<sz>` is the size of the input image; here training was done for two sizes, 92*92 and 128*128) are the same as the CIFAR10 dataset files
- as the image size is large, more computational power is needed to train the model
- results obtained with limited training are qualitatively poor
- a script to preprocess the dataset is provided in the dataset directory (a hypothetical sketch of such preprocessing follows)
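
A hypothetical preprocessing sketch (the repo ships its own script; the paths and the output filename below are assumptions). It resizes Caltech101 images to a fixed size and stacks them into a single normalized array, using `scipy` (already a dependency; `imread`/`imresize` additionally require PIL):

```python
import os
import numpy as np
from scipy.misc import imread, imresize

def load_caltech101(root, sz=92):
    """Read every image under root/<category>/ and resize it to sz x sz."""
    images = []
    for category in sorted(os.listdir(root)):
        cat_dir = os.path.join(root, category)
        for fname in os.listdir(cat_dir):
            img = imread(os.path.join(cat_dir, fname), mode='RGB')
            images.append(imresize(img, (sz, sz)))
    return np.asarray(images, dtype='float32') / 255.

x = load_caltech101('101_ObjectCategories', sz=92)  # or sz=128
np.save('caltech101_92.npy', x)                     # illustrative output name
```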