
Multi-patient training

Open chrissype opened this issue 4 years ago • 2 comments

I would like to run a large-scale training task over thousands of labelled patient BAMs. Is this currently supported by neusomatic in any way, or will I have to write custom code to recombine the generated training data?

chrissype avatar Apr 07 '20 16:04 chrissype

@chrissype happy to see your interest in NeuSomatic. Yes, you can train on multiple samples as follows:

  1. For each sample, run preprocess.py. This will give you per-sample candidate TSV files in the following paths: sample_i_output/dataset/work.*/candidates*.tsv
  2. Use the candidate TSV files from all samples together to build a NeuSomatic model with train.py. As the --candidates_tsv argument, you can provide the paths to all candidate TSVs, like:

--candidates_tsv sample_*_output/dataset/work.*/candidates*.tsv

OR

--candidates_tsv sample_1_output/dataset/work.*/candidates*.tsv \
sample_2_output/dataset/work.*/candidates*.tsv ... \
sample_n_output/dataset/work.*/candidates*.tsv ... \
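
With thousands of samples, relying on the shell to expand the wildcard may run into command-line length limits, so it can help to collect the paths in a script first. The following is a minimal sketch, not part of NeuSomatic: the helper names `collect_candidate_tsvs` and `build_train_args` are hypothetical, the directory layout is the one described in the steps above, and the only train.py option used is `--candidates_tsv`. Any other arguments train.py needs would have to be appended separately.

```python
import glob
import os


def collect_candidate_tsvs(root=".",
                           pattern="sample_*_output/dataset/work.*/candidates*.tsv"):
    """Return a sorted list of all per-sample candidate TSVs under `root`.

    The default pattern mirrors the layout described in the thread; adjust it
    if your preprocess.py output directories are named differently.
    """
    # Escape the root so special characters in it are not treated as wildcards.
    full_pattern = os.path.join(glob.escape(root), pattern)
    return sorted(glob.glob(full_pattern))


def build_train_args(tsvs, train_script="train.py"):
    """Build an argument list that passes every TSV to --candidates_tsv.

    `train_script` is an assumed path to NeuSomatic's train.py; other required
    train.py options are intentionally left out of this sketch.
    """
    if not tsvs:
        raise ValueError("no candidate TSV files found")
    return ["python", train_script, "--candidates_tsv", *tsvs]


if __name__ == "__main__":
    tsvs = collect_candidate_tsvs(".")
    print(f"found {len(tsvs)} candidate TSV files")
    if tsvs:
        # Print the command instead of running it, so it can be reviewed first.
        print(" ".join(build_train_args(tsvs)))
```

Printing the command before launching it makes it easy to confirm that every sample contributed at least one TSV; the list could then be passed to `subprocess.run` once the remaining training options are added.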

msahraeian avatar Apr 07 '20 18:04 msahraeian

That's amazing, many thanks!

chrissype avatar Apr 07 '20 18:04 chrissype