slamdunk

error in alleyoop summary

Open: beckedorff opened this issue 2 years ago • 12 comments

Hi Tobias,

I'm having problems with alleyoop summary.

I'm using slamdunk through Docker and have run the following commands:

slamdunk all
alleyoop summary

This is the command:

alleyoop summary -t /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/ -o /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/summary/out_file_summary /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/filter/*_filtered.bam

The error I'm getting is:

Running alleyoop summary for 6 files
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/alleyoop", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/alleyoop.py", line 500, in run
    runSummary(args.bam, args.outputFile, args.countDirectory)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/alleyoop.py", line 105, in runSummary
    stats.readSummary(bam, countDirectory, outputFile, getLogFile(outputLog))
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/stats.py", line 578, in readSummary
    callR(getPlotter("PCAPlotter") + " -f " + f.name + " -O " + replaceExtension(outputFile, ".pdf", "_PCA") + " -P " + replaceExtension(outputFile, ".txt", "_PCA"), log, dry=printOnly, verbose=verbose)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 211, in callR
    raise RuntimeError("Error while executing command: \"" + cmd + "\"")
RuntimeError: Error while executing command: "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/plot/PCAPlotter.R -f /tmp/tmpxclbfgl2 -O /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/summary/out_file_summary_PCA.pdf -P /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/summary/out_file_summary_PCA.txt"

The log file contains:

/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/plot/PCAPlotter.R -f /tmp/tmpxclbfgl2 -O /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/summary/out_file_summary_PCA.pdf -P /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/summary/out_file_summary_PCA.txt
Error in pca$x[, 2] : subscript out of bounds
Calls: data.frame
Execution halted

Thanks for your help,

Felipe

beckedorff, Mar 17 '22 16:03

Hi Felipe,

Do you have a file produced at /tmp/tmpxclbfgl2? If so, what does it look like?

t-neumann, Mar 18 '22 12:03

Yes, I do have the file /tmp/tmpxclbfgl2.

This is what the file contains: the paths to the TSV files.

sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/0d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/0d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/3d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/3d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/Wash_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/Wash_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
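The file above is a two-column sample sheet: a sample name followed by the path to that sample's _tcount.tsv file. The minimal R sketch below reads a sheet in that layout and reports how many distinct sample names it holds. It assumes the sheet is tab-separated with no header line; `checkSampleSheet` is a hypothetical helper, not part of slamdunk, and the example sheet is made up.

```r
# Hypothetical helper (not part of slamdunk): summarise a sample sheet that is
# assumed to be tab-separated, header-less, with columns <sample> <file>.
checkSampleSheet = function(path) {
  samples = read.delim(path, header = FALSE, col.names = c("sample", "file"),
                       stringsAsFactors = FALSE)
  list(
    nFiles = nrow(samples),
    nSamples = length(unique(samples$sample)),
    # sample names that occur more than once
    duplicated = unique(samples$sample[duplicated(samples$sample)])
  )
}

# Made-up sheet in the same layout, with every file labelled "sample_0"
sheet = tempfile(fileext = ".tsv")
writeLines(c("sample_0\t0d_1rep_tcount.tsv",
             "sample_0\t0d_2rep_tcount.tsv",
             "sample_0\t3d_1rep_tcount.tsv"), sheet)
res = checkSampleSheet(sheet)
res$nFiles      # 3
res$nSamples    # 1
res$duplicated  # "sample_0"
```

Pointed at /tmp/tmpxclbfgl2, a sheet like this should report 6 files; if nSamples is smaller than nFiles, several files share one sample name.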

beckedorff, Mar 18 '22 14:03

Hm, so a very stupid thing to suggest is to start R from inside the container, go through the PCAPlotter.R script, and see what the issue is. It doesn't really look like a library-loading issue, but rather like something is malformed in the files themselves. Is that something you could check?

t-neumann, Mar 21 '22 11:03

I went through PCAPlotter.R, and I don't understand why the "countsList" output has 14 columns. It should have 6, because I have 6 TSV files.

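# one vector of TcReadCount values per sample, keyed by sample name
countsList = list()
for (i in 1:nrow(samples)) {
  # read the sample's _tcount.tsv file, skipping lines that start with "#"
  curTab = read.delim(samples$file[i], stringsAsFactors = FALSE, comment.char = "#")
  countsList[[samples$sample[i]]] = curTab$TcReadCount
}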

The samples table looks correct:

samples
    sample file
1 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/0d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
2 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/0d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
3 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/3d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
4 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/3d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
5 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/Wash_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
6 sample_0 /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/count/Wash_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv

However, when I print "countsList", I get 14 columns.

If I keep going in the code, "variances = apply(countMatrix, 1, var)" returns a vector full of NA, which breaks the downstream code.

I couldn't find the error.

beckedorff, Mar 22 '22 20:03

Is there any chance you could zip me the _tcount.tsv files as well as the /tmp/tmpxclbfgl2 file so I can check myself?

t-neumann, Mar 23 '22 15:03


Yes, I can. How can I send the data only to you?

beckedorff, Mar 23 '22 21:03

You could email it - [email protected]

t-neumann, Mar 24 '22 08:03

OK, now I know what's going on. Sorry for taking so long.

sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/0d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/0d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/3d_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/3d_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/Wash_1rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv
sample_0	/groups/zuber/zubarchive/USERS/tobias/tmp/slamdunkdebug/felipe_tsv_files/Wash_2rep_R1.fastq_slamdunk_mapped_filtered_tcount.tsv

The first column always has the same name. Did you name your samples differently, or are they always "sample_0"?

How do you run slamdunk?

Since there's only 1 sample in there, a PCA naturally makes no sense and will crash.
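A minimal R reproduction of why this crashes, using made-up counts. It mirrors the countsList loop quoted earlier from PCAPlotter.R, but the cbind and prcomp steps are assumptions about how the script builds and decomposes the count matrix, not its exact code. Because list entries are assigned by sample name, every file called "sample_0" overwrites the same entry. That leaves one column, per-gene variances of NA, and a single principal component, so there is no pca$x[, 2].

```r
# Three files that all carry the sample name "sample_0", as in the sheet above
samples = data.frame(sample = rep("sample_0", 3), stringsAsFactors = FALSE)
# Made-up TcReadCount vectors (4 genes each) standing in for the .tsv files
counts = list(c(5, 0, 12, 3), c(7, 1, 10, 4), c(2, 0, 15, 6))

countsList = list()
for (i in seq_len(nrow(samples))) {
  # Assignment by name: a repeated name overwrites the earlier entry
  countsList[[samples$sample[i]]] = counts[[i]]
}
length(countsList)  # 1, not 3

# Assumed: the per-sample vectors are combined column-wise (genes x samples)
countMatrix = do.call(cbind, countsList)
dim(countMatrix)  # 4 1

# var() of a single value is NA, so every per-gene variance is NA
variances = apply(countMatrix, 1, var)
all(is.na(variances))  # TRUE

# Assumed: samples as rows; one sample gives at most one component
pca = prcomp(t(countMatrix))
ncol(pca$x)  # 1, so pca$x[, 2] is "subscript out of bounds"
```

With distinct names per file, the loop would keep all entries and the matrix would have one column per sample.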

t-neumann, Mar 29 '22 11:03

slamdunk all -t 40 -o /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36 -r /data/references/hg19/genes/gencode/male.hg19.fa -b /data/references/hg19/genes/gencode/gencode_v19_3utr_comprehensive_sorted_merged.bed /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/fastq_files/Wash_1rep_R1.fastq.gz

All other alleyoop functions work fine; only the PCA does not work.

How can I fix this issue?

beckedorff, Mar 29 '22 20:03

What if you pass all FASTQ files to a single slamdunk all command via a wildcard (*)?

t-neumann, Mar 29 '22 20:03

slamdunk all -t 40 -o /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36 -r /data/references/hg19/genes/gencode/male.hg19.fa -b /data/references/hg19/genes/gencode/gencode_v19_3utr_comprehensive_sorted_merged.bed /data/users/felipe/data/rnaseq/slam_seq/lucas_data/h3k36/fastq_files/Wash_1rep_R1.fastq.gz

All other alleyoop functions work fine; only the PCA does not work.

How can I fix this issue?


beckedorff, Oct 11 '22 09:10

Well, from what I see you run it on only a single sample, correct? Then a PCA does not really make sense.
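If the summary step should skip the PCA rather than crash on single-sample input, a guard along these lines would do it. This is a hypothetical sketch, not part of slamdunk or PCAPlotter.R: `safePCA` and the example counts are made up, and it assumes samples are the columns of a genes x samples count matrix.

```r
# Hypothetical guard (not part of slamdunk): only run a PCA when the count
# matrix (genes x samples) has at least two distinctly named sample columns.
safePCA = function(countMatrix) {
  if (ncol(countMatrix) < 2 || anyDuplicated(colnames(countMatrix)) > 0) {
    warning("PCA skipped: at least two distinctly named samples are needed")
    return(NULL)
  }
  prcomp(t(countMatrix))  # samples as rows
}

# Made-up counts: 4 genes x 3 samples
m = matrix(c(5, 0, 12, 3,
             7, 1, 10, 4,
             2, 0, 15, 6),
           nrow = 4,
           dimnames = list(NULL, c("0d_1rep", "3d_1rep", "Wash_1rep")))

pca = safePCA(m)
ncol(pca$x)  # 3: enough samples to plot PC1 against PC2

single = suppressWarnings(safePCA(m[, 1, drop = FALSE]))
is.null(single)  # TRUE: one sample, PCA skipped
```

The guard only avoids the crash; getting a meaningful PCA still requires several samples with distinct names in the same summary.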

t-neumann, Oct 11 '22 09:10