NormAE

Batch size variable

Open dwalke04 opened this issue 4 years ago • 5 comments

Hello, I noticed you have an option for batch size in your code. Does this need to be specified? Is it possible to analyze studies that have batches with different sizes?

My second question relates to runtime. It has taken over 20 hours for the train function to run. My study included about 10,000 peaks, 3 batches, and 150 samples. Is this expected? Is there any way to speed up the code? This was on an i7 processor with 8 cores. It looks like the model runs ~1700 training iterations. Is it possible to change this?

Thanks!

dwalke04 avatar Nov 05 '21 20:11 dwalke04

I'm also receiving an error when I run the training function: ImportError: dlopen(/opt/anaconda3/lib/python3.8/site-packages/scipy/linalg/_solve_toeplitz.cpython-38-darwin.so, 2): no suitable image found. Did find: /opt/anaconda3/lib/python3.8/site-packages/scipy/linalg/_solve_toeplitz.cpython-38-darwin.so: open() failed with errno=23

dwalke04 avatar Nov 06 '21 13:11 dwalke04

One more question: I've managed to run the code and now have my results. The batch correction appears to have worked, but the number of features in my final results table has been reduced from around 11,000 to 4,000. Are there filtering criteria applied during the correction? If so, how are those thresholds determined/set?

dwalke04 avatar Nov 08 '21 17:11 dwalke04

batch_size is a training hyperparameter of the deep neural network: it is the number of samples sent to the network per parameter update. It is not the number of samples in an experimental batch.
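
For context, here is a generic PyTorch sketch of what a mini-batch size controls during training (not NormAE's actual code); the dimensions mirror the study described above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 samples x 10,000 peaks, as in the study described above
X = torch.randn(150, 10000)

# batch_size is the number of samples fed to the network per gradient
# update; it is unrelated to the 3 experimental measurement batches
loader = DataLoader(TensorDataset(X), batch_size=32, shuffle=True)

for (xb,) in loader:
    print(xb.shape)  # torch.Size([32, 10000]); the last batch is smaller (22)
```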

luyiyun avatar Nov 10 '21 05:11 luyiyun

The feature-size reduction is due to the data preprocessing (a sketch of these steps follows the list):

  1. remove peaks with more than 20% zero values
  2. for each remaining peak, impute zeros with half of its minimum non-zero value
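
A minimal pandas sketch of those two steps (my own illustration of the description above, not the NormAE source), assuming a samples × peaks intensity table in which zeros mark missing values:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the two preprocessing steps described above."""
    # 1. drop peaks (columns) with more than 20% zeros
    df = df.loc[:, (df == 0).mean(axis=0) <= 0.20]
    # 2. per peak, impute remaining zeros with half of that peak's
    #    minimum non-zero intensity
    df = df.replace(0, np.nan)
    return df.fillna(df.min() / 2)
```

Step 1 alone can plausibly explain a drop from ~11,000 to ~4,000 features if many peaks are detected in only a subset of samples.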

luyiyun avatar Nov 10 '21 05:11 luyiyun

You can use -e e1 e2 e3 to adjust the number of training iterations: e1 is the number of autoencoder pretraining epochs, e2 the number of discriminator pretraining epochs, and e3 the number of adversarial training epochs.
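
For example (illustrative numbers only; the entry-point script name here is an assumption, see the repository README for the exact command):

```bash
# hypothetical invocation: 1000 autoencoder pretraining epochs,
# 100 discriminator pretraining epochs, 500 adversarial epochs
python main.py -e 1000 100 500
```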

luyiyun avatar Nov 10 '21 05:11 luyiyun