Training
Hello,
I am trying to train a plant-specific model. The coding sequence file is 7.5 GB and the lncRNA file is 322 MB.
The jobs were killed due to a memory issue, and I would appreciate any advice.
Thanks.
rnasamba train -v 2 plant_model.hdf5 all.fasta allncrna.fasta
Using TensorFlow backend.
[1/3] Computing network inputs.
/cm/local/apps/slurm/var/spool/job4501031/slurm_script: line 4: 30641 Killed
RNAsamba loads all the sequences into memory to train a model, so if your machine doesn't have enough memory there is no way to use all of the sequences for training.
My suggestion is to cluster similar sequences together and use the representatives to train the model. You could use MMseqs2 or CD-HIT for that.
mmseqs easy-linclust all.fasta clustered tmp
This should help you eliminate redundant sequences and reduce memory usage.
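As a rough sketch (the file names follow your command above, and the --min-seq-id cutoff is just an example value you may want to tune), you could cluster both inputs and then train on the representative sequences that easy-linclust writes to *_rep_seq.fasta:

# cluster coding and lncRNA sequences separately at ~90% identity (example threshold)
mmseqs easy-linclust all.fasta coding_clustered tmp_coding --min-seq-id 0.9
mmseqs easy-linclust allncrna.fasta ncrna_clustered tmp_ncrna --min-seq-id 0.9

# train on the representative sequences only
rnasamba train -v 2 plant_model.hdf5 coding_clustered_rep_seq.fasta ncrna_clustered_rep_seq.fasta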