dropEst icon indicating copy to clipboard operation
dropEst copied to clipboard

FillCollisionsAdjustmentInfo is taking too long to complete

Open k3yavi opened this issue 7 years ago • 14 comments

Hi @VPetukhov , I am trying to use DropEst with one of the 10x datasets from here. I downloaded the BAM from the above link and used dropest -f mode to generate the count matrix and the rds file for the downstream UMI correction. Unfortunately I am noob in R and just copy pasting code from this tutorial. I have got a couple of question regarding the working of dropEst pipeline:

  1. When we provide an external BAM (like from 10x website in my case) what type of UMI correction does dropest command do before generating the count matrix? I am guessing it simply count the tags from the BAM if I don't give either -m or -M?
  2. When I try to use dropestr R package following the above tutorial it seems to give good umi distribution till here. But once I try to use the function
collisions_info <- FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene, step=20, mc.cores=5, verbose=T)

the program give following error

> FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene, step=20, mc.cores=5, verbose=T)
Error in FillCollisionsAdjustmentInfo(umi_probabilities, max_umi_per_gene,  :
  unused arguments (step = 20, mc.cores = 5, verbose = T)

So I removed step = 20, mc.cores = 5, verbose = T but it has been 3 days now and the program won't finish.

What am I doing wrong? Thanks again for your help.

my config.xml

<config>
    <!-- droptag -->
    <TagsSearch>
        <protocol>10x</protocol>
        <BarcodesSearch>
            <barcode1_length>8</barcode1_length>
            <barcode2_length>16</barcode2_length>
            <umi_length>10</umi_length>
            <r1_rc_length>0</r1_rc_length>
        </BarcodesSearch>

        <Processing>
            <min_align_length>10</min_align_length>
            <reads_per_out_file>10000000</reads_per_out_file>
            <poly_a_tail>AAAAAAAA</poly_a_tail>
        </Processing>
    </TagsSearch>

    <!-- dropest -->
    <Estimation>
        <Merge>
	    <barcodes_file>/mnt/scratch5/avi/dropest_data/barcodes.tsv</barcodes_file>
            <barcodes_type>const</barcodes_type>
            <min_merge_fraction>0.2</min_merge_fraction>
            <max_cb_merge_edit_distance>2</max_cb_merge_edit_distance>
            <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
            <min_genes_after_merge>100</min_genes_after_merge>
            <min_genes_before_merge>20</min_genes_before_merge>
        </Merge>

        <PreciseMerge>
            <max_merge_prob>1e-5</max_merge_prob>
            <max_real_merge_prob>1e-7</max_real_merge_prob>
        </PreciseMerge>
    </Estimation>

    <BamTags> <!-- Optional. Tags, which are used to parse .bam file (-f option) or to print tagged .bam file (-b or -F options). Default values correspond to 10x protocol. -->
        <cb>CB</cb> <!-- Cell barcode. Default: CB. -->
        <cb_raw>CR</cb_raw> <!-- Cell barcode raw. Used only for bam output. Default: CR. -->
        <umi>UB</umi> <!-- UMI. Default: UB. -->
        <umi_raw>UR</umi_raw> <!-- UMI raw. Used only for bam output. Default: UR. -->
        <gene>GX</gene> <!-- Gene id. Default: GX. -->
        <cb_quality>CQ</cb_quality> <!-- Cell barcode quality. Default: CQ. -->
        <umi_quality>UQ</umi_quality> <!-- UMI quality. Default: UQ. -->
    </BamTags>
</config>

k3yavi avatar Oct 22 '18 02:10 k3yavi

Hi @k3yavi , Ohm, this function should work really fast. Can you please give me values of max_umi_per_gene and length(umi_probabilities)?

VPetukhov avatar Nov 07 '18 12:11 VPetukhov

Hi @VPetukhov , The experiment has the following numbers:

> max_umi_per_gene
[1] 4192617
> length(umi_probabilities)
[1] 1048576

k3yavi avatar Nov 15 '18 23:11 k3yavi

4 millions UMIs per gene? It's definitely not a correct number. Can you please publish your dropEst log?

VPetukhov avatar Nov 16 '18 11:11 VPetukhov

Unfortunately I can't find the log of the run, but I just used the bam from the 10x website and the config as attached above. Attaching the results.html file.

report.html.zip

k3yavi avatar Nov 16 '18 20:11 k3yavi

Hi I was wondering is there any luck resolving this issue ?

k3yavi avatar Jan 09 '19 20:01 k3yavi

There were several problems with UMI correction, but now it should work. Please, check the result I published for neurons_900. I fixed your config, so the pipeline reads non-corrected UMI and their quality.

VPetukhov avatar Feb 02 '19 00:02 VPetukhov

Thanks @VPetukhov for looking into it, I'll try running the tool again and report back.

k3yavi avatar Feb 02 '19 00:02 k3yavi

@VPetukhov , With the latest version of dropEst, I get the following error.

/mnt/scratch5/avi/alevin/bin/dropest/dropEst/build/dropest -f -c est_config.dump.xml -C 1200 ./possorted_genome_bam.bam
Version: 0.8.5.
Run: 02/02/2019 16:27:07.
No such node (max_cb_merge_edit_distance)

I am using the same xml file as you shared for the neurons_900 dataset in https://github.com/hms-dbmi/dropEst/issues/68 and got the bam from the 10x website.

The weirdest thing is that, with every run of drpoEst, xml file gets cleared out. I double checked for the content of the xml file before giving that as an input to dropEst, but the program fails with the above error.

Thanks in advance for your help.

k3yavi avatar Feb 02 '19 21:02 k3yavi

Oki I have a workaround, if I make the xml read only then it seems to work fine, at least till the generation of the rds file. Now checking dropEstr for the UMI correction.

k3yavi avatar Feb 02 '19 22:02 k3yavi

I can use dropestr to generate UMI corrected counts now. Thanks for the help.

k3yavi avatar Feb 03 '19 20:02 k3yavi

The weirdest thing is that, with every run of drpoEst, xml file gets cleared out.

If you used xml from the same folder and the same name as in the archive, it's normal behavior. It's a dump of the config file, and the pipeline dumps it again. So you just need to move / rename it.

VPetukhov avatar Feb 03 '19 22:02 VPetukhov

ah! got it. Thanks, for the clarification .

k3yavi avatar Feb 03 '19 22:02 k3yavi

I think I closed this issue a little too early without properly going through the logs, sorry for that. But in my defense the output is getting generated but with the following warning, which I missed earlier

Warning message:
In parallel::mclapply(..., mc.cores = GetMcCores(mc.cores)) :
  all scheduled cores encountered errors in user code

I am not sure how does this warning effects the pipeline and the output count matrix. I tried to use only 1 core to avoid multithreading and the program fails with the following error:

Correcting UMI sequence errors.
Estimating UMIs distribution... Completed.
Filling collisions info...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Completed.
Filling info about adjacent UMIs...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Done!
Correction info prepared!


Estimating prior error probabilities... Completed.
Correcting UMI sequence errors...Error in `$<-.data.frame`(`*tmp*`, "IsMerged", value = logical(0)) :
  replacement has 0 rows, data has 4
Calls: CorrectUmiSequenceErrors ... lapply -> FUN -> PredictBayesian -> $<- -> $<-.data.frame
Execution halted

I tried using step by step procedure instead of the quick one too. In that case I get the following error:

> cm <- CorrectUmiSequenceErrors(reads_per_umi_per_cell, umi.probabilities=umi_probabilities, collisions.info=collision_info,
+                                correction.info=correction_info, mc.cores=1, verbosity.level=2)
Correcting UMI sequence errors.

Estimating prior error probabilities... Completed.
Correcting UMI sequence errors...Error in GetSmallerNeighboursDistributionsBySizes(dp.matrices, larger.nn,  :
  _Map_base::at
>

k3yavi avatar Feb 05 '19 02:02 k3yavi

Hey @k3yavi

If you could provide more information so this could be debugged, I would appreciate it.

Thanks, Evan

evanbiederstedt avatar Sep 10 '20 23:09 evanbiederstedt