
Training

Open · wyim-pgl opened this issue 1 year ago · 1 comment

Hello, I am trying to train a plant-specific model. The coding-sequence FASTA is 7.5 GB and the lncRNA FASTA is 322 MB. The training job was killed due to a memory issue, and I was wondering if you could offer any advice. Thanks.

rnasamba train -v 2 plant_model.hdf5 all.fasta allncrna.fasta

Using TensorFlow backend.
[1/3] Computing network inputs.
/cm/local/apps/slurm/var/spool/job4501031/slurm_script: line 4: 30641 Killed

wyim-pgl avatar Mar 31 '23 00:03 wyim-pgl

RNAsamba stores all the sequences in memory to train a model. So, if your computer doesn't have enough memory, there's nothing you can do to use all the sequences for training.

My suggestion is to cluster similar sequences together and use the representatives to train the model. You could use MMseqs2 or CD-HIT for that.

mmseqs easy-linclust all.fasta clustered tmp

This should help you to eliminate redundant sequences.
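
easy-linclust writes the cluster representatives to a FASTA file named <prefix>_rep_seq.fasta (so clustered_rep_seq.fasta with the command above). As a rough sketch, assuming you cluster the coding and lncRNA files separately (the coding and lncrna prefixes here are just illustrative), you would then retrain on the representatives:

mmseqs easy-linclust all.fasta coding tmp

mmseqs easy-linclust allncrna.fasta lncrna tmp

rnasamba train -v 2 plant_model.hdf5 coding_rep_seq.fasta lncrna_rep_seq.fasta

If you prefer CD-HIT, cd-hit-est does the same job for nucleotide sequences; the identity threshold and word size below are only illustrative:

cd-hit-est -i all.fasta -o coding_clustered.fasta -c 0.9 -n 8 -M 0 -T 8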

apcamargo avatar Mar 31 '23 02:03 apcamargo