gemBS call - Multiple Errors & Error Handling

Open IsmailM opened this issue 7 years ago • 12 comments

For example, below is the output of a few errors displayed when running gemBS call:

  • Print which subcommand (or which sample) failed.
    • I'm running gemBS call with 40 jobs (each with 2 threads), so multiple samples run at the same time and I have no idea which sample an error in the stdout refers to (I can only guess the chromosome from the err log).

Here are a few suggestions regarding error reporting that I think would be useful to implement.

  • Add the word ERROR: before the error line in the bs_call*.err file, which will make it easier to run grep -rn ... and see where errors are happening.

  • Rename *.err files to *.log (which is more correct).

  • Hide the log files and the contig files (contig*.bed files), e.g. by moving them into a tmp dir in the same directory, such as ${bcf_dir}/tmp.

    • Further to this, it would be nicer if the JSONs were merged when the BCFs are merged.
  • Add a sentence such as "successfully completed processing chr*". The current "Processing chromosome chr16 (OK)" is cryptic, especially the combination of "Processing" with "OK".

  • Based on the timestamps, and on the fact that parts of the (dbsnp/ref) loading log lines are replaced with error lines, it seems that 2+ threads (lines and bscall/other sub-thread stderr) are attempting to write to the console/log file at the same time.

    • How important is this? Is it worth adding a mutex?
    • Looking at the chr20 err log, there is no error output from bscall or any other sub-thread. Is this the reason? (The *err log file looks fine despite the failed exit code reported in the stdout.)
: Methylation Calling...
2018-08-22 16:25:57,198 ERROR: Process '/home/ucbtmog/.local/lib/python3.6/site-packages/gemBS/gemBSbinaries/bs_call' finished with -9
2018-08-22 16:25:57,199 ERROR: [E::bcf_write] Broken VCF record, the number of columns at chr1:1672448 does not match the number of samples (0 vs 1)
2018-08-22 16:25:57,199 ERROR: rence sequences
2018-08-22 16:25:57,199 ERROR: Completed loading dbSNP (no. contigs 25, no. bins 45003271, no. SNPs 609438362
2018-08-22 16:25:57,199 ERROR: Processing chromosome chr1 (OK)
Exception in thread Thread-1:
Traceback (most recent call last):
ValueError: Error while executing the bscall process.

...

2018-08-22 18:15:44,584 ERROR: Process '/home/ucbtmog/.local/lib/python3.6/site-packages/gemBS/gemBSbinaries/bs_call' finished with -9
2018-08-22 18:15:44,632 ERROR: Loading reference sequences
2018-08-22 18:15:44,632 ERROR: Loading dbSNP from /home/ucbtmog/a/analysis/ref_indexes/dbsnp.index
2018-08-22 18:15:44,632 ERROR: Completed loading reference sequences
2018-08-22 18:15:44,633 ERROR: Completed loading dbSNP (no. contigs 25, no. bins 45003271, no. SNPs 609438362
2018-08-22 18:15:44,633 ERROR: Processing chromosome chr20 (OK)
Exception in thread Thread-20:
Traceback (most recent call last):
ValueError: Error while executing the bscall process.
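
On the interleaved output and the question of which sample an error belongs to, here is a minimal Python sketch of one way a wrapper could tag every relayed stderr line with its sample and contig and serialise writes with a lock. It is not gemBS's actual code: `relay_stderr`, `format_line`, and the `[sample:contig]` prefix are hypothetical names made up for illustration, and the child process is simulated with in-memory text taken from the log above.

```python
import io
import sys
import threading

# One lock shared by all worker threads, so each line is written whole.
_write_lock = threading.Lock()


def format_line(sample, contig, line):
    """Prefix a raw stderr line with a (hypothetical) sample:contig tag."""
    return f"[{sample}:{contig}] {line.rstrip()}\n"


def relay_stderr(stream, sample, contig, out=sys.stderr):
    """Copy a child's stderr to `out`, one complete tagged line at a time."""
    for line in stream:
        with _write_lock:
            out.write(format_line(sample, contig, line))
            out.flush()


if __name__ == "__main__":
    # Simulated stderr of two jobs running at the same time.
    jobs = {
        ("EGAZ00001016574_90", "chr1"): (
            "Loading reference sequences\n"
            "Processing chromosome chr1 (OK)\n"
        ),
        ("EGAZ00001016574_90", "chr20"): (
            "Loading reference sequences\n"
            "Processing chromosome chr20 (OK)\n"
        ),
    }
    threads = [
        threading.Thread(
            target=relay_stderr, args=(io.StringIO(text), sample, contig)
        )
        for (sample, contig), text in jobs.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The order in which the two jobs' lines appear can still vary between runs, but each line stays intact and carries its origin, so truncated fragments like "rence sequences" should not occur and grep can filter by sample.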

IsmailM avatar Aug 22 '18 17:08 IsmailM

have updated above msg with other suggestions/issues wrt error reporting

IsmailM avatar Aug 22 '18 20:08 IsmailM

Further to the above, the gemBS call has finished and I have the following errors. Any idea what is causing them?

I've rerun the samples (i.e. removed the associated files and ran a db-sync), but I still see the following issues.

  • In:
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr1.err - chr1:1672448
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr17.err - chr17:37
[E::bcf_write] Broken VCF record, the number of columns at * does not match the number of samples (0 vs 1)
  • In bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_@pool_4.err - chr14_GL:1
[W::vcf_parse] Contig 'chr14_GL' is not defined in the header. (Quick workaround: index the file with tabix.) 
[E::bcf_write] Broken VCF record, the number of columns at * does not match the number of samples (0 vs 1)

  • In:
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr4.err - chr4:49628383
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr10.err - chr10:41713562
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr16.err - chr16:34790245
    • bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr21.err - chr21:8049828
    • bs_call_EGAZ00001016575_90/bs_call_EGAZ00001016575_90_chr1.err - chr1:125184571
      • Note that this is a different sample
[E::vcf_parse_format] FORMAT column with no sample columns starting at *

  • In bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr20.err
BsCall Exit Code -9 (displayed in stdout/stderr) with no error code etc in err log
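
A note on the -9: assuming gemBS launches bs_call through Python's `subprocess` module (an assumption based on the log output, not verified against the gemBS source), a negative return code -N means the child was terminated by signal N, so -9 would be SIGKILL. A process killed that way cannot write anything further to its own log, which would be consistent with the chr20 err file looking clean. On Linux, one common source of SIGKILL is the kernel's out-of-memory killer, though nothing in this thread confirms that was the cause here. A small sketch of how a wrapper could report this more readably (`describe_exit` is a hypothetical helper):

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode is None:
        return "still running"
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # POSIX: a negative code -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    for code in (0, 1, -9):
        print(code, "->", describe_exit(code))
```

Printing the signal name next to the sample and contig would make it clearer that the process was killed from outside rather than failing on its own.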

IsmailM avatar Aug 23 '18 08:08 IsmailM

The above errors disappeared after I repeatedly deleted the relevant files and reran.

Moreover, the two samples that failed (EGAZ00001016574, EGAZ00001016575) had much higher coverage (greater than 90X) than the rest. Is it possible that gemBS doesn't expect an input that large?

IsmailM avatar Aug 27 '18 10:08 IsmailM

Could there be a problem with disk space that’s leading to the failures?

heathsc avatar Aug 27 '18 10:08 heathsc

That was the first thing I checked - there were a few TB of free space.

IsmailM avatar Aug 27 '18 10:08 IsmailM

Does the contig ‘chr14-GL’ exist in the reference?

heathsc avatar Aug 27 '18 10:08 heathsc

No, but it doesn't exist in the query BAM file either:

$ samtools view -H EGAZ00001016574_90.bam | grep chr14_GL
@SQ	SN:chr14_GL000009v2_random	LN:201709
@SQ	SN:chr14_GL000225v1_random	LN:211173
@SQ	SN:chr14_GL000194v1_random	LN:191469

IsmailM avatar Aug 27 '18 10:08 IsmailM

And the chr14_GL0000*_random contigs do exist in the reference given to the caller?

heathsc avatar Aug 27 '18 10:08 heathsc

Yes they do

$ zcat hg38.fa.gz| grep chr14_GL
>chr14_GL000009v2_random
>chr14_GL000225v1_random
>chr14_GL000194v1_random
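
The two checks above can be combined into one comparison. Below is a rough Python sketch, assuming the `samtools view -H` output and the FASTA header lines have already been captured as text; `bam_contigs` and `fasta_contigs` are hypothetical helpers written for this example. It lists contigs that appear in the BAM header but not in the reference, and shows that the `chr14_GL` from the warning is not an exact contig name in either.

```python
def bam_contigs(header_text):
    """Contig names taken from the SN: field of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Contig names from FASTA '>' lines (first whitespace-separated token)."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


if __name__ == "__main__":
    # Text copied from the command output above.
    header = (
        "@SQ\tSN:chr14_GL000009v2_random\tLN:201709\n"
        "@SQ\tSN:chr14_GL000225v1_random\tLN:211173\n"
        "@SQ\tSN:chr14_GL000194v1_random\tLN:191469\n"
    )
    fasta = (
        ">chr14_GL000009v2_random\n"
        ">chr14_GL000225v1_random\n"
        ">chr14_GL000194v1_random\n"
    )
    bam, ref = bam_contigs(header), fasta_contigs(fasta)
    print("in BAM but not in reference:", sorted(bam - ref))
    print("'chr14_GL' is an exact contig name:", "chr14_GL" in (bam | ref))
```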

IsmailM avatar Aug 27 '18 11:08 IsmailM

It’s difficult to see where to go from here. If you could create an example dataset that fails consistently (or at least with reasonable frequency) and that you could share, then we could try to track down the problem.

Simon

heathsc avatar Aug 27 '18 11:08 heathsc

I understand.

As mentioned above, I managed to 'fix' the problems by simply rerunning the analysis repeatedly. If I do see something similar again, I will raise an issue here.

Nonetheless, in the long run, it would be useful to implement a few of the points I mentioned in the first message above, particularly printing which sample an error originates from in STDOUT before the error message.

IsmailM avatar Aug 27 '18 12:08 IsmailM

I will work on making the error messages more informative to try and make it simpler to track down problems such as those you highlighted.

Simon

heathsc avatar Aug 27 '18 12:08 heathsc