kineticsTools icon indicating copy to clipboard operation
kineticsTools copied to clipboard

ipdSummary - KeyError: 'DS'

Open jgonzalez10 opened this issue 6 years ago • 14 comments

Hi!

I've been trying to run ipdSummary on PacBio datasets but it doesn't work. It keeps generating the same error: KeyError: 'DS'

This is a part of the error I'm receiving:

3

I've tried to run it via pbsmrtpipe, but still getting errors.

¿What can I do? I really need to analyze this data via command line.

Thanks! Juliana

jgonzalez10 avatar Jun 08 '18 17:06 jgonzalez10

The last line of the traceback is looking for an index. Does your input bam file have a .pbi index associated with it?

jrharting avatar Jun 08 '18 17:06 jrharting

Hi! Yes it does. I have the .pbi index file in the same directory as the input .bam file.

jgonzalez10 avatar Jun 08 '18 18:06 jgonzalez10

Apologies for the delay -- does your reference also have an .fai index associuated with it?
Can you try to run pbvalidate on your aligned bam?

pbvalidate --aligned --index --reference [your ref] [your aligned bam]

jrharting avatar Jun 14 '18 18:06 jrharting

Hi, "does your reference also have an .fai index associuated with it?" Yes, it does. "Can you try to run pbvalidate on your aligned bam?" I did, I'm getting the following:

screen shot 2018-06-15 at 1 24 15 pm

What does that mean? thanks!

jgonzalez10 avatar Jun 15 '18 17:06 jgonzalez10

It looks like the bam you are using does not match PacBio specifications. What mapping program did you use to generate the alignments?

You should be able to use the tool pbbamify along with the original raw subreadset to populate the tags in your bam if you generated it using something other than pbalign/blasr (e.g. ngmlr).

jrharting avatar Jun 15 '18 17:06 jrharting

We used blasr. However I think pbbamify is a good idea. I do not see it as part of the SMRT tools I have installed. Should it be there? Or do I have to download something else?

jgonzalez10 avatar Jun 15 '18 18:06 jgonzalez10

it's available via bioconda

mjs-macbook-pro:~ mhsieh$ miniconda3/bin/conda create -n pacbiotest pbbam
Solving environment: done
(skipped to save electrons)
Proceed ([y]/n)? y


Downloading and Extracting Packages
(skipped to save electrons)
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate pacbiotest
#
# To deactivate an active environment, use:
# > source deactivate
#

mjs-macbook-pro:~ mhsieh$ (export PATH=miniconda3/bin:$PATH; source activate pacbiotest; pbbamify --help)
Usage: pbbamify [options] <ref.fa> <pb.bam>|<pb.fofn>|<pb.xml>

pbbamify converts an arbitray aligned BAM file to a PacBio-compatible BAM
file.Input BAM file is read from a file or stdin, the raw-reads PacBio BAM is
givenas a parameter, and BAM output is written to stdout.

Options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

  Options:
                        Reference used to align the input.
    --input=STR         The aligned non-PacBio BAM file. If not provided, stdin
                        will be used as input.
    --output=STR        Path to the output BAM file. If not specified, output
                        will be to the stdout.
    --verbose-level=INT
                        Specifies the level of info which will be output
                        produced onstderr. 0 turns all output off, 1 outputs
                        only warnings, while levels 2 and above outputs a status
                        message every 1000000 (2), 100000 (3), 1000 (4), 100
                        (5), 10 (6) and 1 (7) reads.
                        A PacBio BAM file containing raw reads.

mhsieh avatar Jun 15 '18 19:06 mhsieh

Ok so I ran pbbamify on my .bam file, it seems to solve the DS error but now I'm getting this new error, any idea what this can be? It's also not much reported online.

screen shot 2018-06-19 at 12 06 08 pm

jgonzalez10 avatar Jun 19 '18 16:06 jgonzalez10

Looks like you might not have the sequencing chemistry set properly in your bam header.

If you call samtools view -H [your bam] on your input bam, what are the values for the following variables in the header? BINDINGKIT SEQUENCINGKIT BASECALLERVERSION

Those values should all match one of the mappings defined here: https://github.com/PacificBiosciences/pbcore/blob/master/pbcore/chemistry/resources/mapping.xml

If not, you could go back to your original subreads.bam to find those values, and then re-populate your downstream bam with the correct values using samtools reheader and sed

jrharting avatar Jun 20 '18 16:06 jrharting

Yes, they match this one:

Sequel® Sequencing Plate Silwet Mapping SequencingChemistry>S/P2-C2/5.0</SequencingChemistry BindingKit>100-862-200</BindingKit SequencingKit>101-309-500</SequencingKit SoftwareVersion>5.0</SoftwareVersion /Mapping

jgonzalez10 avatar Jun 21 '18 19:06 jgonzalez10

related - https://github.com/PacificBiosciences/pbcore/commit/0650c4dd1d26ae8e51cc8da482d5ebba018186a7

mhsieh avatar Jul 09 '18 19:07 mhsieh

Sorry, I don't understand what you mean. Could you please elaborate it more?

jgonzalez10 avatar Jul 19 '18 17:07 jgonzalez10

That commit was the change to provide more information for the original KeyError. It seems that you have already passed the original KeyError in the title, so I might not be of too much help here.

My apology that I wasn't be clear about it.

mhsieh avatar Jul 19 '18 17:07 mhsieh

Thank you anyway! :)

jgonzalez10 avatar Jul 25 '18 22:07 jgonzalez10