Unexpected behaviour in train_vae
Hey guys,
To begin, I'm using cryoDRGN version 3.4.3, but I have observed this behaviour in older versions as well. The system has 4 GPUs (RTX 4000 Ada) and runs Linux Mint 22.1.
This is also in the context of a SPA dataset straight out of RELION.
After downsampling a 288 pix^2 particle stack to 128 pix^2 and running this command:
nohup cryodrgn train_vae 128box_downsample_particles.txt -o 128box --zdim 8 --ctf ctf.pkl --poses poses.pkl -n 50 --enc-dim 256 --enc-layers 3 --dec-dim 256 --dec-layers 3 --multigpu --max-threads 56 > 128box_initialtrain.log &
Everything looks great. I usually then exclude a bunch of particles that have high ||z|| and run train_vae on the native pixel sampling with something like:
nohup cryodrgn train_vae 100k_ptcleset.mrcs -o fullsize_curated --zdim 8 --ctf ctf.pkl --poses poses.pkl -n 50 --enc-dim 512 --enc-layers 3 --dec-dim 512 --dec-layers 3 --multigpu --num-workers 1 --max-threads 56 --ind 128box/ind_bad.89709_particles.pkl > cryodrgn_train_vae_fullsize.log &
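For anyone trying to reproduce the curation step, here is a minimal sketch of what a ||z|| filter could look like. It is an illustration, not necessarily the exact script used for this dataset. The function names, the threshold, and the toy latent array are hypothetical. In a real run, z would presumably be loaded from the latent pickle written by train_vae, and the saved file is assumed to be a pickled array of the particle indices to keep, suitable for --ind.

```python
import pickle

import numpy as np


def select_low_norm_particles(z, max_norm):
    """Return indices of particles whose latent norm ||z|| is <= max_norm."""
    z = np.asarray(z, dtype=float)
    norms = np.linalg.norm(z, axis=1)  # one Euclidean norm per particle
    return np.flatnonzero(norms <= max_norm)


def save_indices(indices, path):
    """Pickle the kept indices (assumed format for a particle selection file)."""
    with open(path, "wb") as f:
        pickle.dump(np.asarray(indices, dtype=int), f)


# Toy latent array standing in for a real (n_particles, zdim) embedding.
z_demo = np.zeros((5, 8))
z_demo[:, 0] = [0.5, 1.0, 10.0, 2.0, 30.0]

# Hypothetical threshold; in practice it would be picked from the ||z|| histogram.
keep = select_low_norm_particles(z_demo, max_norm=3.0)
print(keep)
```

Calling save_indices(keep, "ind_keep.pkl") would then write the selection to disk; that filename is only a placeholder.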
The issue is that all the volumes generated by the trained model have very sharp features in the middle of the box, and the map histograms look strange because of a few very bright voxels (see attached pic: left is the strange feature in the middle of the box, right is the map at a lower threshold).
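To put a number on the "few very bright voxels" observation, something like the sketch below could be run on each output map. It is a rough check under stated assumptions. The function name and the n_sigma cutoff are hypothetical, and the volume is assumed to be a 3D NumPy array already in memory (for example, read with a library such as mrcfile). It flags voxels far above the median, using a robust spread estimate, and reports how close they sit to the box centre.

```python
import numpy as np


def bright_voxel_report(vol, n_sigma=20.0):
    """Flag voxels far above the median, using a MAD-based spread estimate."""
    vol = np.asarray(vol, dtype=float)
    median = np.median(vol)
    mad = np.median(np.abs(vol - median))
    # 1.4826 * MAD approximates the standard deviation for Gaussian data;
    # fall back to the plain standard deviation if the MAD is zero.
    scale = 1.4826 * mad if mad > 0 else vol.std()
    mask = (vol - median) > n_sigma * scale
    coords = np.argwhere(mask)
    centre = (np.array(vol.shape) - 1) / 2.0
    dist = np.linalg.norm(coords - centre, axis=1)
    return {
        "n_outliers": int(mask.sum()),
        "max_value": float(vol.max()),
        "mean_dist_from_centre": float(dist.mean()) if dist.size else None,
    }


# Toy volume: smooth background plus one artificial spike near the centre.
grid = np.indices((16, 16, 16)).sum(axis=0)
vol_demo = np.sin(grid)
vol_demo[8, 8, 8] = 500.0

report = bright_voxel_report(vol_demo)
print(report)
```

A handful of outliers with a small mean distance from the centre would match the picture described above, while a map without the artefact should report few or no outliers at this cutoff.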
Here is where it gets a bit strange. If I redo train_vae on particles scaled to 256 pix^2 and reduce the encoder and decoder dimensions to 256, the issue is no longer there, but if I use the same particles and increase the enc/dec dimensions to 512, it reappears.
I'm super happy to share the particle stacks and all the metadata, as I'm working up a workshop to teach cryoDRGN here at the Monash Institute of Pharmaceutical Sciences, but I would like to get to the bottom of this behaviour.
Thoughts?