atac_dnase_pipelines
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30
Hi,
I ran the pipeline successfully before, but recently I have been getting an error at the ataqc step on all my runs. No QC summary (json/html) file is created in the qc directory. I am wondering whether anyone else has the same issue.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30: ordinal not in range(128)
Fatal error: /mnt/isilon/sfgi/programs/atac_dnase_pipelines/atac.bds, line 1612, pos 2. Task/s failed.
Below is server info that may help you for debugging.
$ uname -a
Linux l-0-01 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ conda env list
# conda environments:
#
bds_atac /mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac
bds_atac_py3 /mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac_py3
root * /mnt/isilon/sfgi/programs/miniconda3
$ source activate bds_atac
(bds_atac)
$ which conda
/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/bin/conda
$ conda list
# packages in environment at /mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac:
#
argcomplete 1.0.0 py27_1
argh 0.26.2 py27_0 bioconda
bcftools 1.4 0 bioconda
bedtools 2.26.0 0 bioconda
bioconductor-biocgenerics 0.18.0 r3.2.2_0 bioconda
bioconductor-biocparallel 1.4.3 r3.2.2_0 bioconda
bioconductor-biostrings 2.38.4 0 bioconda
bioconductor-genomeinfodb 1.6.3 0 bioconda
bioconductor-genomicranges 1.22.4 0 bioconda
bioconductor-iranges 2.4.8 0 bioconda
bioconductor-rsamtools 1.22.0 r3.2.2_1 bioconda
bioconductor-s4vectors 0.8.11 0 bioconda
bioconductor-xvector 0.10.0 1 bioconda
bioconductor-zlibbioc 1.16.0 r3.2.2_1 bioconda
biopython 1.67 np110py27_0
boost 1.57.0 4
bowtie 1.1.2 py27_2 bioconda
bowtie2 2.2.6 py27_0 bioconda
bx-python 0.7.3 np110py27_1 bioconda
bzip2 1.0.6 3
cairo 1.14.8 0
curl 7.52.1 0
cutadapt 1.9.1 py27_0 bioconda
cycler 0.10.0 py27_0
cython 0.25.2 py27_0
expat 2.1.0 0
fastqc 0.11.5 1 bioconda
fisher 0.1.4 py27_0 bioconda
fontconfig 2.12.1 3
freetype 2.5.5 2
gffutils 0.8.7.1 py27_1 bioconda
ghostscript 9.16 0 asmeurer
glib 2.43.0 2 asmeurer
gnuplot 4.6.0 1 bioconda
graphviz 2.38.0 4 anaconda
gsl 1.16 1 asmeurer
harfbuzz 0.9.35 6 asmeurer
htslib 1.4 0 bioconda
icu 54.1 0
java-jdk 8.0.92 1 bioconda
jbig 2.1 0
jinja2 2.9.6 py27_0
jpeg 9b 0
libffi 3.0.13 3 asmeurer
libgcc 4.8.5 1 asmeurer
libgfortran 3.0.0 1
libiconv 1.14 0
libpng 1.6.27 0
libtiff 4.0.6 3
libtool 2.4.2 0 asmeurer
libxml2 2.9.4 0
macs2 2.1.0 0 bioconda
markupsafe 0.23 py27_2
matplotlib 1.5.1 np110py27_0
metaseq 0.5.6 py27_0 bioconda
mkl 11.3.3 0
mysql 5.5.24 0
ncurses 5.9 5 asmeurer
nose 1.3.7 py27_1
numpy 1.10.2 py27_0
openblas 0.2.14 4
openssl 1.0.2k 1
pandas 0.18.0 np110py27_0
pango 1.36.8 3 asmeurer
pcre 8.39 1
perl-threaded 5.22.0 10 bioconda
picard 1.126 4 bioconda
pigz 2.3 0
pip 9.0.1 py27_1
pixman 0.34.0 0
preseq 2.0.2 gsl1.16_0 bioconda
pybedtools 0.6.9 py27_0 bcbio
pycairo 1.10.0 py27_0
pyfaidx 0.4.7.1 py27_0 bioconda
pyparsing 2.1.4 py27_0
pyqt 4.10.4 py27_0 asmeurer
pysam 0.8.2.1 py27_0 bcbio
python 2.7.13 0
python-dateutil 2.2 py27_0 asmeurer
python-levenshtein 0.12.0 py27_1 bioconda
pytz 2017.2 py27_0
pyyaml 3.12 py27_0
qt 4.8.5 0 asmeurer
r 3.2.2 0 asmeurer
r-base 3.2.2 0 asmeurer
r-bitops 1.0_6 r3.2.2_1 asmeurer
r-boot 1.3_17 r3.2.2_0 asmeurer
r-catools 1.17.1 r3.2.2_2 asmeurer
r-class 7.3_14 r3.2.2_0 asmeurer
r-cluster 2.0.3 r3.2.2_0 asmeurer
r-codetools 0.2_14 r3.2.2_0 asmeurer
r-foreign 0.8_66 r3.2.2_0 asmeurer
r-futile.logger 1.4.1 r3.2.2_0 bioconda
r-futile.options 1.0.0 r3.2.2_0 bioconda
r-kernsmooth 2.23_15 r3.2.2_0 asmeurer
r-lambda.r 1.1.7 r3.2.2_0 bioconda
r-lattice 0.20_33 r3.2.2_0 asmeurer
r-mass 7.3_44 r3.2.2_0 asmeurer
r-matrix 1.2_2 r3.2.2_0 asmeurer
r-mgcv 1.8_7 r3.2.2_0 asmeurer
r-nlme 3.1_122 r3.2.2_0 asmeurer
r-nnet 7.3_11 r3.2.2_0 asmeurer
r-recommended 3.2.2 r3.2.2_0 asmeurer
r-rpart 4.1_10 r3.2.2_0 asmeurer
r-snow 0.4_1 r3.2.2_0 bioconda
r-snowfall 1.84_6.1 r3.2.2_0 bioconda
r-spatial 7.3_11 r3.2.2_0 asmeurer
r-spp 1.13 r3.2.2_0 bioconda
r-survival 2.38_3 r3.2.2_0 asmeurer
readline 6.2 2
sambamba 0.6.5 0 bioconda
samtools 1.2 2 bioconda
scikit-learn 0.17.1 np110py27_2
scipy 0.17.0 np110py27_4
setuptools 27.2.0 py27_0
simplejson 3.10.0 py27_0
sip 4.15.5 py27_0 asmeurer
six 1.10.0 py27_0
sqlite 3.13.0 0
system 5.8 2
tk 8.5.18 0
trim-galore 0.4.1 0 bioconda
ucsc-bedclip 332 0 bioconda
ucsc-bedgraphtobigwig 323 0 daler
ucsc-bedtobigbed 323 0 daler
ucsc-bigwigaverageoverbed 332 0 bioconda
ucsc-bigwiginfo 332 0 bioconda
ucsc-fetchchromsizes 323 0 daler
ucsc-twobittofa 332 0 bioconda
ucsc-wigtobigwig 323 0 daler
wheel 0.29.0 py27_0
xz 5.2.2 1
yaml 0.1.6 0
zlib 1.2.8
PS: our cluster server has experienced several updates since my last successful run. I do not know what caused the problem here.
Thank you, Chun
Hi, could you post the tail end (or all) of the BDS log file? (Its name should end in *.log.) It looks like a Unicode error when the ATAQC module is writing out the HTML output, but it would help to know exactly which function in the module has the problem. It's likely due to the server update, but it would be good to fix this for anyone else who might run into it on similar server environments.
Hi. Thank you for the quick reply.
Here are the last 20 lines of bds.log.
$ tail -n 20 bds.log
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1454, in main
raw_peak_summ, raw_peak_dist = get_region_size_metrics(PEAKS)
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 752, in get_region_size_metrics
ax.set_title('Peak width distribution for {0}'.format(filename))
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 172, in set_title
title.set_text(label)
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/matplotlib/text.py", line 1206, in set_text
self._text = '%s' % (s,)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30: ordinal not in range(128)
Fatal error: /mnt/isilon/sfgi/programs/atac_dnase_pipelines/atac.bds, line 1612, pos 2. Task/s failed.
atac.bds, line 82 : main()
atac.bds, line 85 : void main() { // atac pipeline starts here
atac.bds, line 109 : ataqc()
atac.bds, line 1601 : void ataqc() {
atac.bds, line 1612 : wait
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Thanks for the log - that was helpful! The failure happens while the plot title is being built from the filename, so my guess is that there is a nonstandard (non-ASCII) character in your output prefix, which causes the ASCII/Unicode issue. This is something that our code should handle (rather than the user), and I will put in a fix for it, but it will take a few days for me to get to it properly. In the meantime, you are welcome to try changing your input prefix to see if that resolves your problem; otherwise, stay tuned on this issue for the fix to come through. Thanks for bringing it to our attention!
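As an illustration of the "change your prefix" workaround, here is a minimal sketch of how a prefix could be checked for non-ASCII characters and reduced to an ASCII-only form. The helper names `non_ascii_positions` and `ascii_safe_prefix` and the sample prefix are hypothetical and not part of the pipeline.

```python
import unicodedata


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def ascii_safe_prefix(prefix):
    """Return an ASCII-only version of prefix (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus a
    # combining mark, so the base letter survives the next step.
    decomposed = unicodedata.normalize("NFKD", prefix)
    # Drop everything that is still outside ASCII (the combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical prefix containing U+00EF (i with diaeresis).
prefix = u"Na\u00efve_rep1"
offenders = non_ascii_positions(prefix)
safe = ascii_safe_prefix(prefix)
```

Characters that have no ASCII base letter are simply dropped by this approach, so the result should be checked by eye before renaming any files.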
Thank you! I think you are right about the nonstandard character, since the library name is Naïve rather than naive.
But now I have a new issue, which happens with all the libraries I am working on. The error is ValueError: all the input array dimensions except for the concatenation axis must match exactly
It looks like "_nx.concatenate()" in one of the Python files caused the problem. I cannot figure out why I did not see this error in my earlier successful runs. Do you have any way to fix this problem too?
Here is the more detailed standard error from run_atacseq.py
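For reference, byte 0xc3 is the first byte of the UTF-8 encoding of "ï" (0xc3 0xaf), which is consistent with the traceback. The sketch below reproduces the decode failure on a made-up filename and shows one possible way to avoid it, by decoding byte strings to text before formatting the title. The helper `to_text` and the filename are assumptions for illustration, not the fix that was applied to run_ataqc.py.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged.

    Hypothetical helper, assuming filenames are UTF-8 encoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Made-up filename, stored as UTF-8 bytes the way a Python 2 str would hold it.
raw_name = u"Na\u00efve_peaks.bed".encode("utf-8")

# Decoding with the ASCII codec fails on the first byte of the accented letter.
try:
    raw_name.decode("ascii")
    position, failing_byte = None, None
except UnicodeDecodeError as err:
    position = err.start
    failing_byte = raw_name[err.start:err.start + 1]

# Decoding explicitly first lets the title be built without an implicit
# ASCII decode.
title = u"Peak width distribution for {0}".format(to_text(raw_name))
```

In the real run the reported position was 30 rather than 2, presumably because the title was built from a longer path.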
--------------------Stderr--------------------
Picked up _JAVA_OPTIONS: -Xms256M -Xmx45G -XX:ParallelGCThreads=1 -Djava.io.tmpdir=/mnt/isilon/sfgi/suc1/tmp
ERROR 2018-04-24 10:24:41 ProcessExecutor Warning messages:
ERROR 2018-04-24 10:24:41 ProcessExecutor 1: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2018-04-24 10:24:41 ProcessExecutor 2: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2018-04-24 10:24:41 ProcessExecutor 3: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2018-04-24 10:24:41 ProcessExecutor 4: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2018-04-24 10:24:41 ProcessExecutor 5: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2018-04-24 10:24:41 ProcessExecutor 6: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2018-04-24 10:24:41 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
Picked up _JAVA_OPTIONS: -Xms256M -Xmx45G -XX:ParallelGCThreads=1 -Djava.io.tmpdir=/mnt/isilon/sfgi/suc1/tmp
[bam_sort_core] merging from 20 files...
[bam_sort_core] merging from 17 files...
Picked up _JAVA_OPTIONS: -Xms256M -Xmx45G -XX:ParallelGCThreads=1 -Djava.io.tmpdir=/mnt/isilon/sfgi/suc1/tmp
Picked up _JAVA_OPTIONS: -Xms256M -Xmx45G -XX:ParallelGCThreads=1 -Djava.io.tmpdir=/mnt/isilon/sfgi/suc1/tmp
processing chromosomes
Traceback (most recent call last):
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1460, in main
ROADMAP_META, OUTPUT_PREFIX)
File "/mnt/isilon/sfgi/programs/atac_dnase_pipelines/ataqc/run_ataqc.py", line 905, in compare_to_roadmap
sample_mean0_col)
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/scipy/stats/stats.py", line 3310, in spearmanr
rs = np.corrcoef(ar, br, rowvar=axisout)
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2145, in corrcoef
c = cov(x, y, rowvar)
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2024, in cov
X = np.vstack((X, y))
File "/mnt/isilon/sfgi/programs/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
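One plausible reading of this traceback is that the two columns handed to spearmanr have different lengths, since the failure occurs when the two arrays are stacked. A minimal sketch of a guard that would surface this earlier with a clearer message follows; the function name `check_paired_columns` and the sample values are hypothetical, and this is not taken from run_ataqc.py.

```python
def check_paired_columns(sample, reference):
    """Raise a descriptive error if two columns cannot be paired up.

    Hypothetical guard to call before computing a rank correlation.
    Returns the common length when the columns match.
    """
    n_sample, n_reference = len(sample), len(reference)
    if n_sample != n_reference:
        raise ValueError(
            "cannot correlate columns of different lengths: "
            "sample has {0} values, reference has {1}".format(
                n_sample, n_reference
            )
        )
    return n_sample


# Made-up values: three sample signals against four reference signals.
sample_signal = [0.5, 1.2, 3.4]
reference_signal = [0.4, 1.0, 3.0, 2.2]

try:
    check_paired_columns(sample_signal, reference_signal)
    message = None
except ValueError as err:
    message = str(err)
```

If the lengths do differ, comparing the row counts of the sample signal file and the reference table would be a natural next step.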
Interesting - it could be related to a different output format from a different version of the UCSC tools. Can you provide the top of the *signal file in the qc folder, if you have it?