Image size should be even error

Open jamesmkrieger opened this issue 2 years ago • 28 comments

Describe the bug

Running preprocessing without --no-keep-real makes particles with 65 pixels instead of 64, and then train_cv for opusDSD complains that the image size isn't even.

Traceback (most recent call last):
  File "/home/jkrieger/software/miniconda/envs/opusdsd-0.3.2b/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jkrieger/software/miniconda/envs/opusdsd-0.3.2b/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jkrieger/software/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/commands/train_cv.py", line 1157, in <module>
    main(args)
  File "/home/jkrieger/software/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/commands/train_cv.py", line 725, in main
    data = dataset.LazyMRCData(args.particles, norm=args.norm,
  File "/home/jkrieger/software/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/dataset.py", line 51, in __init__
    assert ny % 2 == 0, "Image size must be even"
AssertionError: Image size must be even

Looking at the particles in Python confirms the problem:

In [1]: particles='/home/jkrieger/ScipionUserData/projects/TestOpusDsd/Runs/000186_OpusDsdProtPreprocess/output_particles/particles.64.ft.txt'

In [2]: from cryodrgn import dataset
Installed qt5 event loop hook.

In [3]: norm=None

In [4]: real_data=True

In [5]: invert_data=True

In [6]: ind=None

In [7]: use_real=True # it actually doesn't matter what value this takes as it isn't used in the code

In [8]: window=False

In [9]: relion31=True

In [10]: data=None

In [11]: datadir=None

In [12]: window_r=.85

In [13]: in_mem=True

In [14]: notinmem=False

In [15]:             data = dataset.LazyMRCData(particles, norm=norm,
    ...:                                        real_data=real_data, invert_data=invert_data,
    ...:                                        ind=ind, keepreal=use_real, window=False,
    ...:                                        datadir=datadir, relion31=relion31, window_r=window_r, in_mem=(not notinmem))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[15], line 1
----> 1 data = dataset.LazyMRCData(particles, norm=norm,
      2                            real_data=real_data, invert_data=invert_data,
      3                            ind=ind, keepreal=use_real, window=False,
      4                            datadir=datadir, relion31=relion31, window_r=window_r, in_mem=(not notinmem))

File ~/software/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/dataset.py:51, in LazyMRCData.__init__(self, mrcfile, norm, real_data, keepreal, invert_data, ind, window, datadir, relion31, window_r, in_mem)
     49 ny, nx = particles[0].get().shape
     50 assert ny == nx, "Images must be square"
---> 51 assert ny % 2 == 0, "Image size must be even"
     52 log('Loaded {} {}x{} images'.format(N, ny, nx))
     53 self.particles = particles

AssertionError: Image size must be even

In [18]: mrcfile=particles

In [19]: particles = dataset.load_particles(mrcfile, True, datadir=datadir, relion31=relion31)

In [20]: type(particles)
Out[20]: list

In [21]:         N = len(particles)
    ...:         ny, nx = particles[0].get().shape

In [22]: ny
Out[22]: 65

In [23]: nx
Out[23]: 65

To Reproduce

CUDA_VISIBLE_DEVICES=0
python -m cryodrgn.commands.preprocess Runs/000186_OpusDsdProtPreprocess/extra/input_particles.star  -o Runs/000186_OpusDsdProtPreprocess/output_particles/particles.64.mrcs  -D 64 --window-r 0.85 --max-threads 16  --relion31  -b 5000
python -m cryodrgn.commands.parse_pose_star Runs/000186_OpusDsdProtPreprocess/extra/input_particles.star  -o Runs/000186_OpusDsdProtPreprocess/output_particles/poses.pkl  --relion31  -D 64  --Apix 3.54
python -m cryodrgn.commands.parse_pose_star Runs/000186_OpusDsdProtPreprocess/extra/input_particles.star  -o Runs/000186_OpusDsdProtPreprocess/output_particles/poses.pkl  --relion31  -D 64  --Apix 3.54
python -m cryodrgn.commands.parse_ctf_star Runs/000186_OpusDsdProtPreprocess/extra/input_particles.star  -o Runs/000186_OpusDsdProtPreprocess/output_particles/ctfs.pkl  --relion31  -D 64  --Apix 3.54  --kv 300.0  --cs 2.7  -w 0.1  --ps 0
python -m cryodrgn.commands.train_cv Runs/000186_OpusDsdProtPreprocess/output_particles/particles.64.ft.txt --poses Runs/000186_OpusDsdProtPreprocess/output_particles/poses.pkl --ctf Runs/000186_OpusDsdProtPreprocess/output_particles/ctfs.pkl --zdim 2 -o Runs/000237_OpusDsdProtTrain/output  -n 3 --preprocessed --max-threads 1  --enc-layers 3 --enc-dim 1024 --dec-layers 3 --dec-dim 1024 --lazy-single  --pe-type vanilla  --encode-mode grad  --template-type conv  -b 5000 --lr 0.00012  --beta-control 1.0  --beta cos  --downfrac 0.5  --valfrac 0.2  --lamb 1.0  --bfactor 4.0  --templateres 192

Expected behavior

I'm not actually sure. Somehow, I'd expect preprocess to work upstream of train_cv, but maybe not with the arguments that I used, as I just saw that it's not actually one of the recommended steps in the README. I suppose the convolutional network means that we don't have such a great need to downsample the map for efficiency anymore.

I think there is probably a problem that --no-keep-real doesn't do what it's supposed to (see #4, which I closed because I wasn't sure this is the right answer).

Another thing that I'd expect is that the keepreal argument does something in LazyMRCData and controls whether the check for an even image size is applied.

Additional context

  • You should probably know that I am making a plugin for opusDSD within the Scipion workflow engine, which can be found at https://github.com/scipion-em/scipion-em-opusdsd. This allows opusDSD to be run from a GUI and included in pipelines. If you would like to meet and see what I'm doing and be involved, you are very welcome to.
  • These results are from using the test dataset that was also used for CryoDRGN, which comes from a refinement in a Relion tutorial dataset and contains 1799 particles.

jamesmkrieger avatar Nov 16 '23 20:11 jamesmkrieger

I also have a warning from cryodrgn that we need to have a box size divisible by 8. Is this still true for opusdsd?

jamesmkrieger avatar Nov 16 '23 20:11 jamesmkrieger

I can actually note that I still get this error if I don't do any downsampling

jamesmkrieger avatar Nov 16 '23 21:11 jamesmkrieger

I also have a warning from cryodrgn that we need to have a box size divisible by 8. Is this still true for opusdsd?

This is not true for opusDSD since I didn't implement apex acceleration. CryoDRGN implements an apex.amp acceleration which requires that kind of box size if you enable it during training.

alncat avatar Nov 17 '23 00:11 alncat

James, thank you very much! I will look into this issue!

alncat avatar Nov 17 '23 00:11 alncat

The preprocess routine in cryoDRGN might not work with opusDSD without special attention. OpusDSD takes the image stacks as input directly, with the images in real space as they are. In contrast, cryoDRGN needs the Fourier transform of the image, so it implements a preprocess routine to do the FFT in advance. Therefore, you need to make sure that the images downsampled by the cryoDRGN preprocess are still in real space (I will check the no-keep-real argument). To downsample the image stack, I usually use relion_preprocess from RELION, or specify a downsample argument during training, like --downsample 0.5, which will downsample the image to half of the original dimension. To go under the hood, James, you may note that there is a "data_augmentation" function in train_cv.py, which handles the downsampling, shift, and blurring operations.
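For illustration only, here is a minimal NumPy sketch of what "downsample to half of the original dimension while staying in real space" can mean. It crops the centered Fourier transform and transforms back. This is a generic technique and an assumption on my part, not the code in data_augmentation or relion_preprocess; the function name downsample_real is hypothetical.

```python
import numpy as np

def downsample_real(img, frac):
    """Downsample a square real-space image by cropping its centered FFT.

    Hypothetical helper for illustration; not part of opusDSD or cryoDRGN.
    """
    D = img.shape[-1]
    assert img.shape[-2] == D, "Images must be square"
    # Round the target size down to an even number of pixels.
    d = int(D * frac) // 2 * 2
    assert 0 < d <= D
    # Centered spectrum: the zero frequency sits at index D // 2.
    ft = np.fft.fftshift(np.fft.fft2(img))
    lo = D // 2 - d // 2
    cropped = ft[lo:lo + d, lo:lo + d]
    # Back to real space; the output is again a real-valued d x d image.
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    # Rescale so that the mean intensity of the image is preserved.
    return out * (d / D) ** 2

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
small = downsample_real(img, 0.5)
print(small.shape, small.dtype)
```

The point of the sketch is that the result is an even-sized, real-valued image, which is the kind of input the training step expects.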

alncat avatar Nov 17 '23 01:11 alncat

Oh, can you check the content of 'particles.64.ft.txt'? Is it pointing to particles.64.mrcs? You can also check the header of the mrcs file using IMOD's header command, like 'header particles.64.mrcs'.

alncat avatar Nov 17 '23 03:11 alncat

https://github.com/alncat/opusDSD/blob/e931522987ed2b8fc8914943768d5a7452189493/cryodrgn/commands/preprocess.py#L107C21-L107C21 It looks like no-keep-real will enable the Hartley transform (HT) on the image, so the output is of size D+1.
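A small sketch of how a D x D image can end up as (D+1) x (D+1) after a Hartley transform. It assumes that the preprocess step pads the centered transform by repeating the first row and column at the end, as cryoDRGN-style code does when it symmetrizes the transform; I have not checked this against the linked line, and the function names below are hypothetical.

```python
import numpy as np

def hartley2(img):
    """Centered 2D discrete Hartley transform, computed from the FFT.

    The Hartley transform equals the real part minus the imaginary part
    of the Fourier transform.
    """
    ft = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img)))
    return ft.real - ft.imag

def symmetrize(ht):
    """Pad a D x D centered transform to (D+1) x (D+1).

    Assumption: the extra row and column are copies of the first row and
    column, so that both ends of each axis carry the Nyquist frequency.
    """
    D = ht.shape[-1]
    assert D % 2 == 0, "Image size must be even"
    sym = np.empty((D + 1, D + 1), dtype=ht.dtype)
    sym[:-1, :-1] = ht
    sym[-1, :-1] = ht[0, :]   # last row repeats the first row
    sym[:-1, -1] = ht[:, 0]   # last column repeats the first column
    sym[-1, -1] = ht[0, 0]    # corner repeats the first corner
    return sym

img = np.random.default_rng(0).standard_normal((64, 64))
print(symmetrize(hartley2(img)).shape)
```

Under this assumption, a stack written with -D 64 would hold 65 x 65 arrays, which matches the ny and nx values reported above.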

alncat avatar Nov 17 '23 04:11 alncat

Thanks for all the responses. That’s really helpful!

I’ll have a look at particles.64.ft.txt and see. I’d guess the images are already in Fourier space for that

jamesmkrieger avatar Nov 17 '23 07:11 jamesmkrieger

Yes, inside particles.64.ft.txt it says particles.64.0.ft.mrcs

jamesmkrieger avatar Nov 17 '23 08:11 jamesmkrieger

Yes, inside particles.64.ft.txt it says particles.64.0.ft.mrcs

Sorry for the late reply, I was traveling last weekend. Oh, you can then try loading particles.64.mrcs directly to see if the images are in real space and of even size.

Regarding the training command 'python -m cryodrgn.commands.train_cv Runs/000186_OpusDsdProtPreprocess/output_particles/particles.64.ft.txt --poses Runs/000186_OpusDsdProtPreprocess/output_particles/poses.pkl --ctf Runs/000186_OpusDsdProtPreprocess/output_particles/ctfs.pkl --zdim 2 -o Runs/000237_OpusDsdProtTrain/output -n 3 --preprocessed --max-threads 1 --enc-layers 3 --enc-dim 1024 --dec-layers 3 --dec-dim 1024 --lazy-single --pe-type vanilla --encode-mode grad --template-type conv -b 5000 --lr 0.00012 --beta-control 1.0 --beta cos --downfrac 0.5 --valfrac 0.2 --lamb 1.0 --bfactor 4.0 --templateres 192': the options '--preprocessed --max-threads 1 --enc-layers 3 --enc-dim 1024 --dec-layers 3 --dec-dim 1024' should be dropped as they are related to cryoDRGN. 'b' represents the batch size, and 5000 might be too large. You can set it to a number around 20 that can fit into the GPU memory (depends on your hardware). If you have multiple GPUs, you can then try 'multigpu' and 'num-gpus'. Since 64 is a very small size, downfrac can be set to 1.0 (which means no downsampling), and templateres can be set to a smaller size like 128.

Finally, you can try to make a mask using the consensus model, following the link https://relion.readthedocs.io/en/release-3.1/SPA_tutorial/Mask.html . You can also try the '--plot' option, which will show some intermediate results interactively during training.
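As a rough sketch of that check, the function below applies the same two conditions that dataset.py asserts in the traceback above (square images, even size) to an array that is already loaded, and also flags complex data. The name describe_stack is hypothetical, and how the array gets loaded is left open.

```python
import numpy as np

def describe_stack(stack):
    """Return (n_images, ny, nx), or raise ValueError if the stack looks unusable.

    Hypothetical helper; it mirrors the 'square' and 'even' assertions
    shown in the traceback but is not part of opusDSD.
    """
    stack = np.asarray(stack)
    if stack.ndim == 2:
        # Treat a single image as a stack of one.
        stack = stack[np.newaxis]
    n, ny, nx = stack.shape
    if np.iscomplexobj(stack):
        raise ValueError("complex data: stack is probably not in real space")
    if ny != nx:
        raise ValueError(f"Images must be square, got {ny}x{nx}")
    if ny % 2 != 0:
        raise ValueError(f"Image size must be even, got {ny}")
    return n, ny, nx

print(describe_stack(np.zeros((5, 64, 64), dtype=np.float32)))
```

If the mrcfile package is available, something like mrcfile.open(path, permissive=True).data should give an array that can be passed in, but treat that as an assumption about your environment. Note that a real-valued dtype alone does not prove the data are in real space, since a Hartley transform is also real-valued; the odd size is the more telling sign here.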

alncat avatar Nov 20 '23 02:11 alncat

Thanks for the comments. No worry about the delay.

I think that particles.64.0.ft.mrcs is probably in Fourier space and not even size.

I decided to upsample the particles to 130 now and use downfrac 1.0 because there is an error that it has to be bigger than 128 and that's the first bigger even number. This is still for testing quickly at the moment and it is giving results now, which I think look ok considering how small the data set is.

I'll try removing those parts, especially --preprocessed --max-threads 1, and change the batch size. Isn't it good to have an option for controlling layers and dim for enc and dec?

Many thanks again

jamesmkrieger avatar Nov 20 '23 10:11 jamesmkrieger

https://github.com/alncat/opusDSD/blob/86bed17a235c3a166ca03b51aa75963a3f81c63e/cryodrgn/commands/train_cv.py#L948 This line can be deleted since this size limit no longer holds. The intermediate tensors will be resampled to 12^3 in encoder, https://github.com/alncat/opusDSD/blob/86bed17a235c3a166ca03b51aa75963a3f81c63e/cryodrgn/models.py#L663 . Hence, the encoder works with any size now 😺. You can try to delete that line and test on 64x64 images. It will be great if we can control the number of layers, but this requires us to refactor the encoder and convtemplate classes ( this will make the code more readable btw).

alncat avatar Nov 20 '23 12:11 alncat

Ok, yes, I’ll delete this line and try it with 64x64 and remove the terms about the layers etc

jamesmkrieger avatar Nov 20 '23 13:11 jamesmkrieger

Thanks again for all your help

jamesmkrieger avatar Nov 20 '23 13:11 jamesmkrieger

I decided to upsample the particles to 130 now and use downfrac 1.0 because there is an error that it has to be bigger than 128 and that's the first bigger even number. This is still for testing quickly at the moment and it is giving results now, which I think look ok considering how small the data set is.

I was actually trying 192 before and it was working. Now that I've deleted the line and those arguments, I'm getting another problem about not having a mask. Do we need to always have one?

jamesmkrieger avatar Nov 20 '23 16:11 jamesmkrieger

It is recommended to have a mask since the program can then determine the region with densities and crop out empty regions, which can save some memory. If no mask is supplied, the program will use a spherical mask with diameter 0.85 x image size. Since the mask often comes with the consensus refinement result, I usually do training with a mask. I will do some tests without a mask to make sure that option works.

Mask is handled in decoder here, https://github.com/alncat/opusDSD/blob/86bed17a235c3a166ca03b51aa75963a3f81c63e/cryodrgn/models.py#L888

It is handled in encoder here, https://github.com/alncat/opusDSD/blob/86bed17a235c3a166ca03b51aa75963a3f81c63e/cryodrgn/models.py#L509
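To make the default concrete, here is a minimal NumPy sketch of a binary spherical mask whose diameter is 0.85 x the box size. It only illustrates the geometry; it is not the code at the links above, and the function name and the hard (non-soft) edge are assumptions.

```python
import numpy as np

def spherical_mask(size, window_r=0.85):
    """Boolean size^3 volume that is True inside a centered sphere.

    The sphere diameter is window_r * size voxels. Hypothetical helper
    for illustration, not the opusDSD implementation.
    """
    center = (size - 1) / 2.0
    ax = np.arange(size) - center
    z, y, x = np.meshgrid(ax, ax, ax, indexing="ij")
    radius = window_r * size / 2.0
    return (x**2 + y**2 + z**2) <= radius**2

mask = spherical_mask(64)
print(mask.shape, round(float(mask.mean()), 3))
```

With a diameter of 0.85 x the box size, the sphere covers roughly a third of the voxels, so cropping to the masked region is where the memory saving would come from.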

alncat avatar Nov 21 '23 00:11 alncat

I decided to upsample the particles to 130 now and use downfrac 1.0 because there is an error that it has to be bigger than 128 and that's the first bigger even number. This is still for testing quickly at the moment and it is giving results now, which I think look ok considering how small the data set is.

I was actually trying 192 before and it was working. Now that I've deleted the line and those arguments, I'm getting another problem about not having a mask. Do we need to always have one?

Ah, the current code doesn't work without a mask. It needs some revisions to make it work.

alncat avatar Nov 21 '23 02:11 alncat

Ok. Thanks for checking.

I guess you mean the dynamic mask from cryosparc? Relion and Xmipp do not automatically make any

jamesmkrieger avatar Nov 21 '23 06:11 jamesmkrieger

Ok. Thanks for checking.

I guess you mean the dynamic mask from cryosparc? Relion and Xmipp do not automatically make any

Yes, cryosparc generates models together with masks at every iteration. James, you can check my latest commit. I made the default spherical mask work. The diameter of mask can be controlled by --window-r.

alncat avatar Nov 21 '23 09:11 alncat

Ok, I’ll probably be able to try it tomorrow or Thursday. Thanks

jamesmkrieger avatar Nov 21 '23 10:11 jamesmkrieger

Hello,

Sorry for the delay. I've been quite busy and I got moved to a different workstation.

I've just given it another try and this part seems to be solved, but now there's another error:

(opusdsd-0.3.2b) flex@pascal ~/ScipionUserData/projects/TestOpusDsd $ eval "$(/home/flex/anaconda3/bin/conda shell.bash hook)"&& conda activate opusdsd-0.3.2b && CUDA_VISIBLE_DEVICES=0 python -m cryodrgn.commands.train_cv Runs/000160_OpusDsdProtTrain/extra/input_particles.star --poses Runs/000160_OpusDsdProtTrain/output/poses.pkl --ctf Runs/000160_OpusDsdProtTrain/output/ctfs.pkl --zdim 12 -o Runs/000160_OpusDsdProtTrain/output  -n 3 --lazy-single --pe-type vanilla --encode-mode grad --template-type conv -b 20 --lr 0.00012 --beta-control 1.0 --beta cos --downfrac 1.0 --valfrac 0.2 --lamb 1.0 --bfactor 4.0 --templateres 192 --split Runs/000160_OpusDsdProtTrain/extra/sp-split.pkl --relion31
2023-11-29 15:57:29     /home/flex/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/commands/train_cv.py Runs/000160_OpusDsdProtTrain/extra/input_particles.star --poses Runs/000160_OpusDsdProtTrain/output/poses.pkl --ctf Runs/000160_OpusDsdProtTrain/output/ctfs.pkl --zdim 12 -o Runs/000160_OpusDsdProtTrain/output -n 3 --lazy-single --pe-type vanilla --encode-mode grad --template-type conv -b 20 --lr 0.00012 --beta-control 1.0 --beta cos --downfrac 1.0 --valfrac 0.2 --lamb 1.0 --bfactor 4.0 --templateres 192 --split Runs/000160_OpusDsdProtTrain/extra/sp-split.pkl --relion31
2023-11-29 15:57:29     Namespace(particles='/data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/extra/input_particles.star', outdir='/data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/output', ref_vol=None, zdim=12, poses='/data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/output/poses.pkl', ctf='/data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/output/ctfs.pkl', group=None, group_stat=None, load=None, latents=None, split='Runs/000160_OpusDsdProtTrain/extra/sp-split.pkl', valfrac=0.2, checkpoint=1, log_interval=1000, verbose=False, seed=23942, ind=None, invert_data=True, window=True, window_r=0.85, datadir=None, relion31=True, lazy_single=True, notinmem=False, lazy=False, preprocessed=False, max_threads=16, tilt=None, tilt_deg=45, num_epochs=3, batch_size=20, wd=0, lr=0.00012, lamb=1.0, downfrac=1.0, templateres=192, bfactor=4.0, beta='cos', beta_control=1.0, norm=None, tmp_prefix='tmp', amp=False, multigpu=False, num_gpus=4, do_pose_sgd=False, pretrain=1, emb_type='quat', pose_lr=0.0003, pose_enc=False, pose_only=False, plot=False, qlayers=3, qdim=256, encode_mode='grad', enc_mask=None, use_real=False, optimize_b=False, players=3, pdim=256, pe_type='vanilla', template_type='conv', warp_type=None, symm=None, num_struct=1, deform_size=2, pe_dim=None, domain='fourier', activation='relu')
2023-11-29 15:57:29     Use cuda True
2023-11-29 15:57:29     Loading dataset from /data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/extra/input_particles.star
2023-11-29 15:57:29     Loaded 1799 64x64 images
2023-11-29 15:57:29     first image: [[-0.09194159  0.39910644 -0.02999168 ...  0.65424025  2.1068065
   0.6698552 ]
 [-0.2128554   0.75497377  0.22428153 ...  1.1024915   2.5193024
   0.5522364 ]
 [ 0.53614616  0.6558835   0.09200134 ... -0.04922847  1.5195653
   0.58241683]
 ...
 [-0.12323537  0.11786314 -0.9315448  ... -0.92004204 -1.1743912
  -0.9257372 ]
 [ 0.04064267  0.4853493  -0.06305265 ...  0.2970134  -0.2873671
  -1.0044745 ]
 [-0.48678073 -0.38818717 -0.8513743  ...  1.0835191   0.2279402
  -1.3637027 ]]
2023-11-29 15:57:29     Image Mean, Std are 0.0024950394872576 +/- 0.8993831276893616
2023-11-29 15:57:29     Reading all images into memory!
2023-11-29 15:57:29     loaded eulers
euler difference:  tensor(4.5443e-05) 1799
max difference:  torch.return_types.max(
values=tensor([5.9485e-05, 1.7452e-04, 6.1035e-05]),
indices=tensor([ 932, 1219,  705]))
[[ 107.801792  105.980954 -131.130655]
 [  78.074471  117.122663   78.76982 ]
 [-141.683945   45.648309 -103.320529]
 [-138.71656    45.619975  146.100663]
 [-169.744927   39.55955    83.692854]]
tensor([[ 72.1982, 105.9810,  23.3289],
        [101.9255, 117.1227, 203.1557],
        [-38.3161,  45.6483, 245.0045],
        [-41.2834,  45.6199,  -7.3841],
        [-10.2551,  39.5596,  86.0520]])
nn:  1799 batch_size:  20
[0, 0, 20, 0, 0, 40, 0, 0, 20, 80, 60, 40, 20, 100, 20, 80, 240, 0, 0, 0, 60, 0, 20, 0, 20, 0, 0, 40, 0, 0, 0, 60, 0, 20, 100, 20, 20, 120, 20, 20, 0, 0, 40, 0, 20, 80, 0, 0]
1799 1799 [2, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 22, 24, 27, 31, 33, 34, 35, 36, 37, 38, 39, 42, 44, 45]
2023-11-29 15:57:29     Loading ctf params from /data/flex/ScipionUserData/projects/TestOpusDsd/Runs/000160_OpusDsdProtTrain/output/ctfs.pkl
2023-11-29 15:57:29     Image size (pix)  : 64
2023-11-29 15:57:29     A/pix             : 5.53125
2023-11-29 15:57:29     DefocusU (A)      : 35136.05859375
2023-11-29 15:57:29     DefocusV (A)      : 33578.890625
2023-11-29 15:57:29     Dfang (deg)       : 100.2699966430664
2023-11-29 15:57:29     voltage (kV)      : 300.0
2023-11-29 15:57:29     cs (mm)           : 2.700000047683716
2023-11-29 15:57:29     w                 : 0.10000000149011612
2023-11-29 15:57:29     Phase shift (deg) : 0.0
2023-11-29 15:57:29     first ctf params is: [5.531250e+00 3.513606e+04 3.357889e+04 1.002700e+02 3.000000e+02
 2.700000e+00 1.000000e-01 0.000000e+00]
initializing 2d grid of size  64
/home/flex/anaconda3/envs/opusdsd-0.3.2b/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2023-11-29 15:57:31     creating ctf grid False with grid tensor([[ 0,  1,  2,  ..., 30, 31, 32],
        [ 1,  1,  2,  ..., 30, 31, 32],
        [ 2,  2,  3,  ..., 30, 31, 32],
        ...,
        [ 3,  3,  4,  ..., 30, 31, 32],
        [ 2,  2,  3,  ..., 30, 31, 32],
        [ 1,  1,  2,  ..., 30, 31, 32]], device='cuda:0')
tensor(0, device='cuda:0')
2023-11-29 15:57:31     created ctf grid with shape: torch.Size([64, 33, 2]), max_r: 45
2023-11-29 15:57:31     Using circular lattice with radius 32
2023-11-29 15:57:31     model: image supplemented into encoder will be of size 64
2023-11-29 15:57:31     encoder: the input image size is 54
2023-11-29 15:57:31     convtemplate: the output volume is of size 192, resample intermediate activations of size 16 to 12
2023-11-29 15:57:31     decoder: downsampling apix from 5.53125 to 5.53125
torch.Size([1, 54, 54])
2023-11-29 15:57:31     HetOnlyVAE(
  (encoder): Encoder(
    (transformer_e): SpatialTransformer()
    (down1): Sequential(
      (0): Conv3d(1, 32, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (1): LeakyReLU(negative_slope=0.2)
      (2): Conv3d(32, 64, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (3): LeakyReLU(negative_slope=0.2)
      (4): Conv3d(64, 128, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (5): LeakyReLU(negative_slope=0.2)
    )
    (down2): Sequential(
      (0): Conv3d(128, 256, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (1): LeakyReLU(negative_slope=0.2)
      (2): Conv3d(256, 512, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (3): LeakyReLU(negative_slope=0.2)
      (4): Conv3d(512, 512, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
      (5): LeakyReLU(negative_slope=0.2)
    )
    (down3): Sequential(
      (0): Linear(in_features=512, out_features=512, bias=True)
      (1): LeakyReLU(negative_slope=0.2)
    )
    (mu): Linear(in_features=512, out_features=12, bias=True)
    (logstd): Linear(in_features=512, out_features=12, bias=True)
  )
  (decoder): VanillaDecoder(
    (template): ConvTemplate(
      (template1): Sequential(
        (0): Linear(in_features=12, out_features=512, bias=True)
        (1): LeakyReLU(negative_slope=0.2)
        (2): Linear(in_features=512, out_features=2048, bias=True)
        (3): LeakyReLU(negative_slope=0.2)
      )
      (template2): Sequential(
        (0): ConvTranspose3d(2048, 1024, kernel_size=(2, 2, 2), stride=(2, 2, 2))
        (1): LeakyReLU(negative_slope=0.2)
        (2): ConvTranspose3d(1024, 512, kernel_size=(2, 2, 2), stride=(2, 2, 2))
        (3): LeakyReLU(negative_slope=0.2)
      )
      (template3): Sequential(
        (0): ConvTranspose3d(512, 256, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
        (1): LeakyReLU(negative_slope=0.2)
        (2): ConvTranspose3d(256, 128, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
        (3): LeakyReLU(negative_slope=0.2)
      )
      (template4): Sequential(
        (0): ConvTranspose3d(128, 64, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
        (1): LeakyReLU(negative_slope=0.2)
        (2): ConvTranspose3d(64, 32, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
        (3): LeakyReLU(negative_slope=0.2)
        (4): ConvTranspose3d(32, 16, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
        (5): LeakyReLU(negative_slope=0.2)
      )
      (conv_out): ConvTranspose3d(16, 1, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
    )
    (transformer): SpatialTransformer()
  )
)
template_type:  conv
2023-11-29 15:57:31     61402601 parameters in model
2023-11-29 15:57:31     28196856 parameters in encoder
2023-11-29 15:57:31     33205745 parameters in decoder
2023-11-29 15:57:32     loading train validation split from Runs/000160_OpusDsdProtTrain/extra/sp-split.pkl
num_samples:  940
num_samples:  120
2023-11-29 15:57:32     image will be downsampled to 1.0 of original size 64
2023-11-29 15:57:32     reconstruction will be blurred by bfactor 4.0
2023-11-29 15:57:32     learning rate [0.00012], bfactor: 4.333333333333333, beta_max: 1.0, beta_control: 1.0 for epoch 0
ns:  [0, 0, 20, 0, 0, 20, 0, 0, 20, 40, 40, 20, 20, 80, 20, 60, 180, 0, 0, 0, 40, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 40, 0, 20, 80, 0, 20, 100, 0, 20, 0, 0, 20, 0, 0, 60, 0, 0]
current_ind:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "/home/flex/anaconda3/envs/opusdsd-0.3.2b/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/flex/anaconda3/envs/opusdsd-0.3.2b/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/flex/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/commands/train_cv.py", line 1144, in <module>
    main(args)
  File "/home/flex/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/commands/train_cv.py", line 991, in main
    rot, tran = posetracker.get_pose(ind)
  File "/home/flex/scipion3/software/em/opusdsd-0.3.2b/cryodrgn/pose.py", line 312, in get_pose
    rot = self.rots[ind]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I also had to uninstall and reinstall pytorch because it was limited to cuda 10 and didn't support the RTX 3090 GPUs I have on the new machine. I'll create another issue about that.

Without a supported GPU, I didn't get this error, which makes sense because everything then runs on the CPU.
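For anyone hitting the same `RuntimeError`, here is a minimal sketch of the mismatch, assuming the pose table stays on the CPU while the batch indices arrive on the GPU. The helper `gather_rows` and the toy `rots` tensor are hypothetical stand-ins, not code from opusDSD, and this is not necessarily how the issue ends up being fixed upstream.

```python
import torch


def gather_rows(table: torch.Tensor, ind: torch.Tensor) -> torch.Tensor:
    """Index `table` with `ind` after moving `ind` to the table's device."""
    # Moving the indices to the table's device keeps the lookup valid
    # whether the indices were created on the CPU or on a GPU.
    return table[ind.to(table.device)]


# Hypothetical stand-in for a per-particle rotation table kept on the CPU.
rots = torch.arange(5 * 3 * 3, dtype=torch.float32).reshape(5, 3, 3)

# Indices live on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ind = torch.tensor([0, 2, 4], device=device)

picked = gather_rows(rots, ind)  # result is on the same device as `rots`
```

On a CPU-only machine both tensors are already on the same device, which would explain why the error only shows up once a working GPU is in use.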

jamesmkrieger avatar Nov 29 '23 15:11 jamesmkrieger

Thank you very much for reporting this! I will look into it.

alncat avatar Nov 30 '23 05:11 alncat

You’re welcome

jamesmkrieger avatar Nov 30 '23 08:11 jamesmkrieger

James, I reproduced this bug! I fixed it in this commit https://github.com/alncat/opusDSD/commit/b722d2b97aac9a6cf82cfb773f7407214873cd36 . The training script runs correctly now.

alncat avatar Dec 01 '23 09:12 alncat

But there might still be some bugs, since it hasn't been tested extensively.

alncat avatar Dec 01 '23 13:12 alncat

Thanks!

I'll continue testing it and let you know what I come across

jamesmkrieger avatar Dec 01 '23 14:12 jamesmkrieger

James, I found that opus-dsd only works with pytorch 1.11.0 or below. I tested it with pytorch 1.12.0 and found some bizarre behaviours. I created an environment file containing cuda 11.3 and pytorch 1.10.1 in the recent commit https://github.com/alncat/opusDSD/commit/07234404fad42370c801696563261c31a9dfa754 . Opus-dsd works well in this environment.
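A small sketch of how a wrapper script could guard against an unsupported version before training starts. The names `MAX_SUPPORTED`, `parse_version` and `is_supported` are hypothetical and not part of opusDSD; the only fact taken from this thread is the 1.11.0 upper bound.

```python
import re

# Upper bound reported in this thread: pytorch 1.11.0 or below.
MAX_SUPPORTED = (1, 11, 0)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.10.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0 (assumption).
    return int(major), int(minor), int(patch or 0)


def is_supported(version: str) -> bool:
    """Return True if the version is at or below the reported upper bound."""
    return parse_version(version) <= MAX_SUPPORTED
```

In practice this could be called as `is_supported(torch.__version__)` right after importing torch, printing a warning when it returns `False`.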

alncat avatar Dec 04 '23 07:12 alncat

Great. Thanks very much

jamesmkrieger avatar Dec 04 '23 10:12 jamesmkrieger