kent
`Response is missing required header Content-Length: for url `
Describe the bug
Unable to import bigwig files stored on a remote server.
Originally posted in the rtracklayer GitHub repo; rtracklayer developer @sanchit-saini directed me to repost it here.
To Reproduce
viewpoint <- 166169213
gr.span <- GenomicRanges::GRanges(
  seqnames = "chr6",
  ranges = IRanges::IRanges(
    start = viewpoint - 1000000,
    end = viewpoint + 1000000
  )
)
gr <- rtracklayer::import("https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig")
Expected behavior
bigWigs (either the entire file, or a queried subset) can be imported from either local or remote sources.
Session info
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 rjson_0.2.21 deldir_1.0-6
[4] ellipsis_0.3.2 rprojroot_2.0.3 biovizBase_1.44.0
[7] htmlTable_2.4.1 XVector_0.36.0 GenomicRanges_1.48.0
[10] base64enc_0.1-3 dichromat_2.0-0.1 rstudioapi_0.13
[13] remotes_2.4.2 bit64_4.0.5 AnnotationDbi_1.58.0
[16] fansi_1.0.3 xml2_1.3.3 codetools_0.2-18
[19] splines_4.2.0 ggbio_1.44.1 cachem_1.0.6
[22] knitr_1.39 Formula_1.2-4 Rsamtools_2.12.0
[25] cluster_2.1.3 dbplyr_2.2.1 png_0.1-7
[28] graph_1.74.0 BiocManager_1.30.18 compiler_4.2.0
[31] httr_1.4.3 backports_1.4.1 assertthat_0.2.1
[34] Matrix_1.4-1 fastmap_1.1.0 lazyeval_0.2.2
[37] cli_3.3.0 htmltools_0.5.3 prettyunits_1.1.1
[40] tools_4.2.0 gtable_0.3.0 glue_1.6.2
[43] GenomeInfoDbData_1.2.8 reshape2_1.4.4 dplyr_1.0.9
[46] rappdirs_0.3.3 Rcpp_1.0.9 Biobase_2.56.0
[49] vctrs_0.4.1 Biostrings_2.64.0 rtracklayer_1.57.0
[52] xfun_0.31 stringr_1.4.0 lifecycle_1.0.1
[55] restfulr_0.0.15 ensembldb_2.20.2 XML_3.99-0.10
[58] zlibbioc_1.42.0 scales_1.2.0 BSgenome_1.64.0
[61] VariantAnnotation_1.42.1 hms_1.1.1 MatrixGenerics_1.8.1
[64] ProtGenerics_1.28.0 RBGL_1.72.0 parallel_4.2.0
[67] SummarizedExperiment_1.26.1 AnnotationFilter_1.20.0 RColorBrewer_1.1-3
[70] yaml_2.3.5 curl_4.3.2 memoise_2.0.1
[73] gridExtra_2.3 ggplot2_3.3.6 biomaRt_2.52.0
[76] rpart_4.1.16 reshape_0.8.9 latticeExtra_0.6-30
[79] stringi_1.7.8 RSQLite_2.2.15 S4Vectors_0.34.0
[82] BiocIO_1.6.0 checkmate_2.1.0 GenomicFeatures_1.48.3
[85] BiocGenerics_0.42.0 filelock_1.0.2 BiocParallel_1.30.3
[88] GenomeInfoDb_1.32.2 rlang_1.0.4 pkgconfig_2.0.3
[91] matrixStats_0.62.0 bitops_1.0-7 lattice_0.20-45
[94] purrr_0.3.4 GenomicAlignments_1.32.1 htmlwidgets_1.5.4
[97] bit_4.0.4 tidyselect_1.1.2 here_1.0.1
[100] GGally_2.1.2 plyr_1.8.7 magrittr_2.0.3
[103] R6_2.5.1 IRanges_2.30.0 generics_0.1.3
[106] Hmisc_4.7-0 DelayedArray_0.22.0 DBI_1.1.3
[109] pillar_1.8.0 foreign_0.8-82 survival_3.3-1
[112] KEGGREST_1.36.3 RCurl_1.98-1.8 nnet_7.3-17
[115] tibble_3.1.8 crayon_1.5.1 interp_1.1-3
[118] utf8_1.2.2 OrganismDbi_1.38.1 BiocFileCache_2.4.0
[121] jpeg_0.1-9 progress_1.2.2 grid_4.2.0
[124] data.table_1.14.2 blob_1.2.3 digest_0.6.29
[127] stats4_4.2.0 munsell_0.5.0
Many thanks in advance, Brian
This is the relevant error message:
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(ranges) :
Response is missing required header Content-Length: for url https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig
It seems hashFindValUpperCase couldn't find the hashed key for Content-Length, and the call fails because of it:
https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L598
https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L624-L628
So I investigated further to see whether the value of Content-Length was fetched and hashed:
https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L550
https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/net.c#L1488-L1496
https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/net.c#L1435-L1475
I used printf to log the values just before the call to hashAdd, and there does seem to be an issue: Content-Length never showed up. I'm not sure which component is responsible, the parsing (netUrlHeadExt) or the fetching (lineFileAttach).
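To make the suspected failure mode concrete, here is a minimal, self-contained C sketch of storing response headers under upper-cased names and looking one up. It does not use any kent code: `parse_headers`, `find_header`, and the fixed-size table are hypothetical stand-ins for what netUrlHeadExt, hashAdd and hashFindValUpperCase are described as doing above, and the real implementation may differ in detail.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 32
#define MAX_NAME 64
#define MAX_VALUE 256

/* Hypothetical header table; a stand-in for a real hash, not kent's struct hash. */
struct header { char name[MAX_NAME]; char value[MAX_VALUE]; };
struct header_table { struct header items[MAX_HEADERS]; int count; };

/* Copy at most n bytes of src into dst, upper-casing them and NUL-terminating. */
static void upper_copy(char *dst, size_t dst_size, const char *src, size_t n)
{
    size_t i;
    if (n >= dst_size)
        n = dst_size - 1;
    for (i = 0; i < n; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[n] = '\0';
}

/* Parse a CRLF-separated header block; returns the number of headers stored.
 * Lines without a colon (such as the status line) are skipped. */
int parse_headers(const char *raw, struct header_table *t)
{
    const char *line = raw;
    t->count = 0;
    while (*line != '\0' && t->count < MAX_HEADERS) {
        const char *eol = strstr(line, "\r\n");
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *colon = memchr(line, ':', len);
        if (len == 0)
            break; /* a blank line ends the header block */
        if (colon != NULL) {
            struct header *h = &t->items[t->count];
            const char *v = colon + 1;
            size_t vlen;
            upper_copy(h->name, sizeof h->name, line, (size_t)(colon - line));
            while (v < line + len && (*v == ' ' || *v == '\t'))
                v++; /* skip whitespace after the colon */
            vlen = (size_t)(line + len - v);
            if (vlen >= sizeof h->value)
                vlen = sizeof h->value - 1;
            memcpy(h->value, v, vlen);
            h->value[vlen] = '\0';
            t->count++;
        }
        line = eol ? eol + 2 : line + len;
    }
    return t->count;
}

/* Case-insensitive lookup; returns NULL when the header was never stored. */
const char *find_header(const struct header_table *t, const char *name)
{
    char key[MAX_NAME];
    int i;
    upper_copy(key, sizeof key, name, strlen(name));
    for (i = 0; i < t->count; i++)
        if (strcmp(t->items[i].name, key) == 0)
            return t->items[i].value;
    return NULL;
}
```

With a table like this, a NULL result from `find_header(&t, "Content-Length")` means either that the server's response never contained the header or that the parser dropped it. Logging each name as it is stored, as done with printf above, is one way to tell the two cases apart.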
I'm happy to collaborate, but for that I need help from someone who is more familiar with the nitty-gritty details of the codebase.
Not quite sure who the primary maintainer is currently, but tagging the developers with the most overall contributions for assistance: @NullModel @galt @katerose @JimKent
Good Evening Brian: I'm trying to follow what it is you are describing here. It looks like you are running an R script that uses rtracklayer code. I am not aware of how the rtracklayer code uses the kent source; I assume it is somehow built into rtracklayer. So my question would be: which version of the kent source is being used? When I try the ordinary kent command line tools on the URL you display here, everything appears to work just fine. The error you describe appears to be in the network connection business. From the same system you are working from, do the ordinary kent command line tools work with the URL? For example, bigWigInfo should return:
bigWigInfo "https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig"
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 38,394,145
primaryIndexSize: 226,392
zoomLevels: 10
chromCount: 198
basesCovered: 520,652,859
mean: 1.871263
min: 0.050000
max: 574.520020
std: 7.786024
Thanks @NullModel for the pointers and for clarifying that this issue is not related to kent. At some point, rtracklayer diverged somewhat from the kent source. I will look into the networking module in rtracklayer.
Hi @sanchit-saini, wow, this is interesting. We had no idea that you're calling our C functions directly from R here, for the bigWig data import: https://github.com/lawremi/rtracklayer/blob/master/R/bigWig.R#L240 (the only thing I can't figure out is which function exactly you're calling; it's in the string/macro "bwgFile_query", but I can't find the value).
If we changed a function header and that broke something, let us know. We can give you a stable function call, e.g. bwQuery_stable, and can promise to never change the header of that. But we haven't ever touched the headers, as far as I can see from the git history, so maybe this has to do with something related to rtracklayer?
The only change was
-struct bbiFile *bigWigFileOpenAlias(char *fileName, struct hash *aliasHash);
-/* Open up big wig file. Free this up with bbiFileClose. Use aliasHash if non-NULL */
+struct bbiFile *bigWigFileOpenAlias(char *fileName, aliasFunc aliasFunc);
+/* Open up big wig file. Free this up with bbiFileClose. Use aliasFunc if non-NULL */
in commit b622d147b7dbac52dbf3ba26928cd18e02d42bd8 but this is not a function you use (as far as I can tell)
@braneyboo I think we've had this issue before, that whenever we change something in bigBed/bigWig, rtracklayer runs into trouble. I wonder if we can think about a system to reduce this in the future... not sure what. Maybe make separate, stable function calls that we never touch and that are marked as such in the code?
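The stable-call idea could be sketched roughly as follows: a thin wrapper with a frozen signature that delegates to an internal function whose parameters are free to change. Every name here (`bw_handle`, `alias_fn`, `bw_open_internal`, `bw_open_stable`) is hypothetical and exists only for illustration; none of them is a kent or rtracklayer function.

```c
#include <stddef.h>

/* Hypothetical handle type, standing in for a library's file object. */
struct bw_handle { const char *path; int use_alias; };

/* Hypothetical callback type that a later release added to the internal API. */
typedef const char *(*alias_fn)(const char *name);

/* Internal function: its signature may change between releases. */
static int bw_open_internal(struct bw_handle *h, const char *path, alias_fn alias)
{
    if (h == NULL || path == NULL || path[0] == '\0')
        return -1; /* reject missing arguments and empty paths */
    h->path = path;
    h->use_alias = (alias != NULL);
    return 0;
}

/* Stable entry point: this signature is never changed. When the internal
 * function gains a parameter, only this body is updated to pass a default. */
int bw_open_stable(struct bw_handle *h, const char *path)
{
    return bw_open_internal(h, path, NULL);
}
```

External callers would link only against `bw_open_stable`, so a change such as the switch from `aliasHash` to `aliasFunc` stays invisible to them. The cost is one extra function to maintain for each exported call.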
Hi @maximilianh ,
> the only thing that I can't figure out is which function you're calling exactly, it's in the string/macro "bwgFile_query" but I can't find the value.
The call goes to this function, which uses kent library functions: https://github.com/lawremi/rtracklayer/blob/05017acd2b4abc934b7e8c9873bdedcc099875aa/src/bigWig.c#L226
The kent source that rtracklayer relies on resides at: https://github.com/lawremi/rtracklayer/tree/master/src/ucsc
> If we changed a function header and that broke something, let us know. We can give you a stable function call, e.g. bwQuery_stable, and can promise to never change the header of that. But we haven't ever touched the headers, as far as I can see from the git history, so maybe this has to do with something related to rtracklayer?
I don't think it is related to the modification of function headers. I'm still not sure what's causing this issue and need some time to investigate it.
Apart from that, I like the idea of having stable functions. It would be great, and we could do it at some point. First, we have to work out the list of functions that rtracklayer actually uses.
Thanks! Yes, this confirms that this is your call: struct bbiFile * file = bigWigFileOpen((char *)CHAR(asChar(r_filename)));
And we didn't change bigWigFileOpen. We made a single change 4-5 years ago to bigWigFileOpen and that broke another package. The engineer who made the most recent change confirmed that he will never touch these function signatures. I guess only you can find out what exactly broke here. We definitely made a small change, but not to any of the function headers that you're calling.
Good Morning Brian:
The file appears to be completely accessible from here:
curl -I 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig'
HTTP/1.1 200 OK
Date: Wed, 10 Aug 2022 14:43:53 GMT
Last-Modified: Fri, 24 Jun 2022 11:55:23 GMT
ETag: "385a3e4-5e2303eccc61b"
Accept-Ranges: bytes
Content-Length: 59089892
Connection: close
Server: Data Science Institute ICL
Strict-Transport-Security: max-age=15552001; includeSubDomains; preload;
bigWigInfo 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig'
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 38,394,145
primaryIndexSize: 226,392
zoomLevels: 10
chromCount: 198
basesCovered: 520,652,859
mean: 1.871263
min: 0.050000
max: 574.520020
std: 7.786024
time bigWigToBedGraph 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig' A549_H3K27ac_R1_tag.ucsc.bedGraph
real 0m4.773s
head A549_H3K27ac_R1_tag.ucsc.bedGraph
chr1 134884 135092 0.47
chr1 135092 135228 1.43
chr1 191097 191541 0.14
chr1 191541 191677 0.97
chr1 267953 268089 2.14
chr1 526646 526782 1.43
chr1 629822 629834 2.67
You would need to find out exactly what error rtracklayer is seeing when it tries to access the file. See whether you can run the kent commands on the file from your access location.
--Hiram
@sanchit-saini have you had any luck with this?