gemBS
GemBS call - Multiple Errors & Error handlings
Here are a few suggestions wrt error reporting etc. that I think would be useful to implement:
- Print which subcommand (or which sample) failed.
  - I'm running gemBS call with 40 jobs (each with 2 threads), so multiple samples are running at the same time and I have no idea which sample an error in the stdout refers to (I can guess the chr from the err log).
  - E.g. the output of a few of the errors displayed when running gemBS call is included at the end of this message.
- Add the word `ERROR:` before the error line in the `bs_call*.err` file, which will make it easier to run `grep -rn ...` and see where errors are happening.
- Rename `*.err` files to `*.log` (which is more correct).
- Hide log files and the contig files (`contig*.bed` files), i.e. maybe move them into a tmp dir in the same dir, e.g. `${bcf_dir}/tmp`.
  - Further to this, it would be nicer if the JSONs were merged when the BCFs are merged.
- Add a sentence like "successfully completed processing chr*". The current "Processing chromosome chr16 (OK)" is cryptic, especially using "Processing" with "OK".
- It seems (based on timestamps and the fact that parts of the (dbsnp/ref) loading log lines are replaced with error lines) that 2+ threads (lines and bscall/other sub-thread stderr) are attempting to write to the console/log file at the same time.
  - How important is this? Is it worth adding a mutex?
  - Looking at the chr20 err log, there is no error output from bscall/other sub-thread. Is this the reason? (The `*.err` log file looks fine despite the failed exit code reported in the stdout.)
: Methylation Calling...
2018-08-22 16:25:57,198 ERROR: Process '/home/ucbtmog/.local/lib/python3.6/site-packages/gemBS/gemBSbinaries/bs_call' finished with -9
2018-08-22 16:25:57,199 ERROR: [E::bcf_write] Broken VCF record, the number of columns at chr1:1672448 does not match the number of samples (0 vs 1)
2018-08-22 16:25:57,199 ERROR: rence sequences
2018-08-22 16:25:57,199 ERROR: Completed loading dbSNP (no. contigs 25, no. bins 45003271, no. SNPs 609438362
2018-08-22 16:25:57,199 ERROR: Processing chromosome chr1 (OK)
Exception in thread Thread-1:
Traceback (most recent call last):
ValueError: Error while executing the bscall process.
...
2018-08-22 18:15:44,584 ERROR: Process '/home/ucbtmog/.local/lib/python3.6/site-packages/gemBS/gemBSbinaries/bs_call' finished with -9
2018-08-22 18:15:44,632 ERROR: Loading reference sequences
2018-08-22 18:15:44,632 ERROR: Loading dbSNP from /home/ucbtmog/a/analysis/ref_indexes/dbsnp.index
2018-08-22 18:15:44,632 ERROR: Completed loading reference sequences
2018-08-22 18:15:44,633 ERROR: Completed loading dbSNP (no. contigs 25, no. bins 45003271, no. SNPs 609438362
2018-08-22 18:15:44,633 ERROR: Processing chromosome chr20 (OK)
Exception in thread Thread-20:
Traceback (most recent call last):
ValueError: Error while executing the bscall process.
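The suggestions about naming the failing sample and guarding writes with a mutex could be sketched roughly as follows. This is a minimal illustration, not gemBS's actual code: `log_line`, `run_labelled` and the `sample`/`contig` arguments are hypothetical names, and it assumes the caller is launched from Python worker threads via `subprocess`.

```python
import subprocess
import sys
import threading

# One lock shared by all worker threads so that each log line is written whole.
_log_lock = threading.Lock()


def log_line(sample, contig, message, stream=None):
    """Write one complete, labelled line while holding the shared lock."""
    stream = stream if stream is not None else sys.stderr
    with _log_lock:
        stream.write(f"[{sample}:{contig}] {message}\n")
        stream.flush()


def run_labelled(cmd, sample, contig, stream=None):
    """Run cmd, relay its stderr line by line with a sample/contig label,
    and report a non-zero exit status with the same label."""
    with subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True) as proc:
        for line in proc.stderr:
            log_line(sample, contig, line.rstrip("\n"), stream)
    # Leaving the with-block waits for the process, so returncode is set here.
    returncode = proc.returncode
    if returncode != 0:
        log_line(sample, contig, f"ERROR: {cmd[0]} finished with {returncode}", stream)
    return returncode
```

Because every write goes through one lock and carries its label, lines from different threads can no longer overwrite each other mid-line, and a failing exit code is reported next to the sample and contig it belongs to.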
I have updated the message above with other suggestions/issues wrt error reporting.
Further to the above, the gemBS call has finished and I have the following errors. Any idea what is causing them?
I've rerun the samples (i.e. removed the associated files and ran a db-sync), but I still see the following issues.
- In:
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr1.err` - chr1:1672448
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr17.err` - chr17:37

[E::bcf_write] Broken VCF record, the number of columns at * does not match the number of samples (0 vs 1)
- In:
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_@pool_4.err` - chr14_GL:1

[W::vcf_parse] Contig 'chr14_GL' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::bcf_write] Broken VCF record, the number of columns at * does not match the number of samples (0 vs 1)
- In:
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr4.err` - chr4:49628383
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr10.err` - chr10:41713562
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr16.err` - chr16:34790245
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr21.err` - chr21:8049828
  - `bs_call_EGAZ00001016575_90/bs_call_EGAZ00001016575_90_chr1.err` - chr1:125184571 (note: this is a different sample)

[E::vcf_parse_format] FORMAT column with no sample columns starting at *
- In:
  - `bs_call_EGAZ00001016574_90/bs_call_EGAZ00001016574_90_chr20.err`

BsCall exit code -9 (displayed in stdout/stderr) with no error code etc. in the err log
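A listing like the one above can be collected automatically. The sketch below walks a directory for `*.err` files and picks out lines carrying the htslib-style `[E::...]` / `[W::...]` prefixes seen in this thread. It assumes those prefixes sit at the start of a line in the err files; `scan_err_files` is a hypothetical helper, not part of gemBS.

```python
import re
from pathlib import Path

# Messages in this thread look like "[E::bcf_write] ..." or "[W::vcf_parse] ...".
HTSLIB_MSG = re.compile(r"^\[(?P<level>[EW])::(?P<func>\w+)\]\s*(?P<text>.*)$")


def scan_err_files(root):
    """Yield (path, line number, level, function, text) for every error or
    warning line found in *.err files below root."""
    for path in sorted(Path(root).rglob("*.err")):
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                match = HTSLIB_MSG.match(line.strip())
                if match:
                    yield (path, lineno, match["level"], match["func"], match["text"])
```

Iterating over `scan_err_files(bcf_dir)` and printing each tuple gives one line per problem, with the file name identifying the sample and contig.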
The above errors disappeared when repeatedly deleting the relevant files and rerunning.
Moreover, the two samples that failed were of a much higher coverage (i.e. greater than 90X) than the rest (EGAZ00001016574, EGAZ00001016575) - as such it's possible that GemBS doesn't expect that large of an input?
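On the `-9` exit code: if bs_call is launched through Python's `subprocess` module (an assumption, the thread does not confirm it), a negative return code means the process was terminated by the signal with that number, and signal 9 is SIGKILL on Linux. A SIGKILL with nothing in the err log would be consistent with the process being killed from outside, for example for using too much memory on a high-coverage sample, but that is a guess rather than a diagnosis. A small helper (hypothetical name `describe_returncode`) to make such codes readable:

```python
import signal


def describe_returncode(returncode):
    """Explain a subprocess return code in words."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # Negative values from subprocess mean "terminated by signal N".
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"killed by {name}"
```

On Linux, `describe_returncode(-9)` returns `"killed by SIGKILL"`.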
Could there be a problem with disk space that’s leading to the failures?
That was the first thing I checked - there was a few TB of free space.
Does the contig ‘chr14_GL’ exist in the reference?
No, but it doesn't exist in the query BAM file either:
$ samtools view -H EGAZ00001016574_90.bam | grep chr14_GL
@SQ SN:chr14_GL000009v2_random LN:201709
@SQ SN:chr14_GL000225v1_random LN:211173
@SQ SN:chr14_GL000194v1_random LN:191469
And the chr14_GL0000*_random contigs do exist in the reference given to the caller?
Yes, they do:
$ zcat hg38.fa.gz| grep chr14_GL
>chr14_GL000009v2_random
>chr14_GL000225v1_random
>chr14_GL000194v1_random
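The two manual checks above (BAM header vs. reference FASTA) can be combined into one comparison. This is a sketch using only text parsing; the function names are hypothetical, and it assumes the header text comes from `samtools view -H` and the FASTA lines from the decompressed reference.

```python
def bam_header_contigs(header_text):
    """Contig names from the @SQ lines of `samtools view -H` output."""
    names = set()
    for line in header_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_contigs(fasta_lines):
    """Contig names from FASTA header lines (text after '>' up to whitespace)."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def compare_contigs(header_text, fasta_lines):
    """Return (contigs only in the BAM header, contigs only in the reference)."""
    bam = bam_header_contigs(header_text)
    ref = fasta_contigs(fasta_lines)
    return sorted(bam - ref), sorted(ref - bam)
```

Two empty lists mean the BAM header and the reference agree on contig names. The FASTA lines can be read lazily, e.g. by passing a file object opened with `gzip.open(path, "rt")`.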
It’s difficult to see where to go from here. If you could create an example dataset that fails consistently (or at least with a reasonable frequency) and that you could share, then we could try to track down the problem.
Simon
I understand.
As mentioned above, I managed to 'fix' the problems by simply rerunning the analysis repeatedly. If I do see something similar again, I will raise an issue here.
Nonetheless, in the long run, it would be useful to implement a few of the points I mentioned in the first message above - particularly printing which sample an error originates from in STDOUT etc. before the error message....
I will work on making the error messages more informative to try and make it simpler to track down problems such as those you highlighted.
Simon