SequelTools
Issue with sample names
Hi,
I'm running SequelTools on 8 CLR samples. I pass the samples with the -u subFiles.txt option; subFiles.txt contains the paths to the BAM files. This is my command:
SequelTools.sh -t Q -u subFiles.txt -n 12 -p a -g a -o $OUT_DIR
I am getting weird plots for my stats, with the same name for each BAM file. A sample plot is attached. Also, summaryTable.txt looks like this, with the same numbers in most columns for all samples:
SMRTcell numReadsSubread numReadsLongestSub totalBasesSubread totalBasesLongestSub meanReadLenSubread meanReadLenLongestSub medianReadLenSubread medianReadLenLongestSub n50Subread n50LongestSub l50Subread l50LongestSub PSR ZOR
oasis 1320271 181528 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.137
oasis 2578421 377887 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.147
oasis 2252172 320325 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.142
oasis 2320629 335461 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2266229 324966 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.143
oasis 2165289 302979 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.140
oasis 4398328 638727 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2499748 348122 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.139
Would you let me know what's wrong? Thanks.
Attachment: n50s.pdf (https://github.com/ISUgenomics/SequelTools/files/5412390/n50s.pdf)
Hello,
Thank you for using SequelTools! subFiles.txt should be a file-of-filenames, which it sounds like it is in your case. These filenames determine the name of each SMRTcell in the output. Are your files all named oasis.bam? If so, changing those names to unique identifiers should resolve the issue. Let me know if that works for you.
Best, Dr. David E. Hufnagel
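A quick way to test the "are the names unique?" question is to compare the basenames listed in the file-of-filenames. The following is only a minimal sketch in Python: it assumes the SMRTcell name is taken from each path's basename (which is what the reply above suggests, not something verified against the SequelTools source), and duplicate_basenames is a hypothetical helper, not part of SequelTools.

```python
import os
from collections import Counter


def duplicate_basenames(paths):
    """Return the basenames that occur more than once in a list of paths."""
    names = [os.path.basename(p.strip()) for p in paths if p.strip()]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative paths only (hypothetical directories, not from the thread).
example = [
    "/data/run1/oasis.bam",
    "/data/run2/oasis.bam",
    "/data/run3/other.bam",
]
print(duplicate_basenames(example))
```

To check a real file-of-filenames, read its lines into a list and pass that list to duplicate_basenames; an empty result means every basename is unique.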
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ISUgenomics/SequelTools/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQPE3LRDVARAAX6TUBYSYLSLYTGRANCNFSM4SZBSWCA .
Hi,
My subFiles.txt contains:
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
/projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam
I thought the names came from the BAM files, but that doesn't seem to be the case. The name oasis appears in the output directory given with the -o option:
-o /oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults
I don't know why oasis is chosen as the name for all the files. I also checked, and it turns out that the stats in summaryTable.txt for all samples correspond to the last file.
Any idea why this happens? Thanks,
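For anyone wanting to spot this symptom quickly, here is a minimal Python sketch that lists which columns of a whitespace-separated summary table hold the same value in every row. It assumes the table layout shown earlier in this thread (one header line, one row per sample); constant_columns is a hypothetical helper, and the two data rows are copied from the table posted above.

```python
def constant_columns(table_text):
    """Return the header names whose value is identical in every data row.

    Note: with a single data row, every column is trivially constant.
    """
    lines = [ln.split() for ln in table_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [
        name
        for i, name in enumerate(header)
        if len({row[i] for row in rows}) == 1
    ]


# Header and first two rows of the summaryTable.txt posted in this thread.
sample = """\
SMRTcell numReadsSubread numReadsLongestSub totalBasesSubread totalBasesLongestSub meanReadLenSubread meanReadLenLongestSub medianReadLenSubread medianReadLenLongestSub n50Subread n50LongestSub l50Subread l50LongestSub PSR ZOR
oasis 1320271 181528 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.137
oasis 2578421 377887 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.147
"""
print(constant_columns(sample))
```

Columns that legitimately match across samples will also be reported, so the output is a hint about where to look rather than proof of a bug.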
Hey Arun,
I hope you can see the whole conversation here. I'm a little perplexed by this problem. Do you have some ideas as to what's causing these issues?
Let me know, Best, David
@mldmort at first glance, it looks like the -- in the file names is causing something unintended. Could you please try once more after renaming the BAM files without the double dash?
Did this resolve the issue, mldmort?
Hi David,
No. I used symbolic links pointing to my BAM files to see if that solves the problem. My new subfiles.txt file looks like:
ACI.bam
BN.bam
BUF.bam
F344.bam
MR.bam
MS20.bam
WKY.bam
WN.bam
And the files link to the original bam files like:
ACI.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam
BN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1008--bc1008.bam
BUF.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1003--bc1003.bam
F344.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1010--bc1010.bam
MR.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1002--bc1002.bam
MS20.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1009--bc1009.bam
WKY.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1011--bc1011.bam
WN.bam -> /projects/long_reads/HS_founders/pacbio/demux/lima.bc1012--bc1012.bam
I don't know whether linking is sufficient; maybe the next step is to change the original file names? However, the name oasis that appears in the plots most probably comes from the -o option:
-o /oasis/scratch/comet/temp_project/RAT_DATA/HS_FOUNDERS/Pacbio_multiplex_all/QC/SequelToolsResults
That's the only place the name oasis appears.
Also, summaryTable.txt is still flawed, with the same numbers in most columns of each row:
SMRTcell numReadsSubread numReadsLongestSub totalBasesSubread totalBasesLongestSub meanReadLenSubread meanReadLenLongestSub medianReadLenSubread medianReadLenLongestSub n50Subread n50LongestSub l50Subread l50LongestSub PSR ZOR
oasis 1320271 181528 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.137
oasis 2320629 335461 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2252172 320325 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.142
oasis 2165289 302979 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.140
oasis 2578421 377887 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.147
oasis 2266229 324966 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.143
oasis 4398328 638727 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.145
oasis 2499748 348122 21082583975 3794848484 8434 10901 8317 9856 9304 11125 885174 122214 0.180 0.139
Any suggestions? Thanks,
Yes, I believe you will have to change the original names. I am doing additional testing for a SequelTools demonstration next week, and unfortunately I'm finding that the required format for input file names is quite rigid. Names have to look like "ID.scraps.bam" or "ID.subreads.bam", where ID is usually something like "m54138_180610_050652". That has been the structure of all the files I've seen come directly from PacBio sequencing machines. This software was published just this month, and we are now getting lots of feedback on issues we had not come across before. You can expect updates in the next few weeks to make SequelTools more flexible and to resolve identified bugs and issues.
Best, David
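As a rough pre-flight check for the naming convention described above, the following Python sketch flags paths whose basename does not look like "ID.subreads.bam" or "ID.scraps.bam". The regular expression is an assumption based only on the wording in this reply; the exact rule SequelTools applies is not stated here, and nonconforming is a hypothetical helper.

```python
import os
import re

# Assumed pattern: any non-empty ID, then ".subreads.bam" or ".scraps.bam".
NAME_PATTERN = re.compile(r".+\.(subreads|scraps)\.bam")


def nonconforming(paths):
    """Return the paths whose basename does not match the assumed pattern."""
    return [
        p for p in paths
        if not NAME_PATTERN.fullmatch(os.path.basename(p))
    ]


# One conforming name (ID taken from the reply above, hypothetical directory)
# and two names of the kinds reported earlier in this thread.
paths = [
    "/data/m54138_180610_050652.subreads.bam",
    "/projects/long_reads/HS_founders/pacbio/demux/lima.bc1001--bc1001.bam",
    "ACI.bam",
]
for p in nonconforming(paths):
    print("unexpected name:", p)
```

Passing this check does not guarantee that SequelTools will accept a file; it only catches names that clearly differ from the two examples given.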