kent

`Response is missing required header Content-Length: for url `

Open bschilder opened this issue 1 year ago • 11 comments

Describe the bug

Unable to import bigWig files stored on a remote server. I originally posted this in the rtracklayer GitHub repo and was asked by rtracklayer developer @sanchit-saini to repost it here.

To Reproduce

viewpoint <- 166169213 
gr.span <- GenomicRanges::GRanges(
    seqnames = "chr6",
    ranges = IRanges::IRanges(
        start = viewpoint - 1000000,
        end = viewpoint + 1000000
    )
)
 
gr <- rtracklayer::import("https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig") 
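The reprex builds gr.span but never passes it to import(), so as written it requests the whole file. To compare the R query with the kent command-line tools on the same +/- 1 Mb window, the GRanges coordinates (1-based, closed) have to be converted to the 0-based, half-open convention that kent uses. A minimal POSIX shell sketch: kent_window is a hypothetical helper, and the usage comment assumes that bigWigToBedGraph's -chrom/-start/-end options take 0-based coordinates like the rest of kent.

```shell
# Hypothetical helper: convert a GRanges-style window (1-based, closed)
# centred on a viewpoint into the 0-based, half-open START END pair used by
# kent tools such as bigWigToBedGraph.
# Usage: kent_window VIEWPOINT FLANK   -> prints "START END"
kent_window() {
    viewpoint=$1
    flank=$2
    start1=$((viewpoint - flank))   # same value as start(gr.span) in R
    end1=$((viewpoint + flank))     # same value as end(gr.span) in R
    # 1-based closed [start1, end1] == 0-based half-open [start1 - 1, end1)
    echo "$((start1 - 1)) $end1"
}

# Example (not executed here):
#   set -- $(kent_window 166169213 1000000)
#   bigWigToBedGraph -chrom=chr6 -start="$1" -end="$2" "$url" window.bedGraph
```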

Expected behavior

bigWig files (either the entire file or a queried subset) can be imported from both local and remote sources.

Session info

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            rjson_0.2.21                deldir_1.0-6               
  [4] ellipsis_0.3.2              rprojroot_2.0.3             biovizBase_1.44.0          
  [7] htmlTable_2.4.1             XVector_0.36.0              GenomicRanges_1.48.0       
 [10] base64enc_0.1-3             dichromat_2.0-0.1           rstudioapi_0.13            
 [13] remotes_2.4.2               bit64_4.0.5                 AnnotationDbi_1.58.0       
 [16] fansi_1.0.3                 xml2_1.3.3                  codetools_0.2-18           
 [19] splines_4.2.0               ggbio_1.44.1                cachem_1.0.6               
 [22] knitr_1.39                  Formula_1.2-4               Rsamtools_2.12.0           
 [25] cluster_2.1.3               dbplyr_2.2.1                png_0.1-7                  
 [28] graph_1.74.0                BiocManager_1.30.18         compiler_4.2.0             
 [31] httr_1.4.3                  backports_1.4.1             assertthat_0.2.1           
 [34] Matrix_1.4-1                fastmap_1.1.0               lazyeval_0.2.2             
 [37] cli_3.3.0                   htmltools_0.5.3             prettyunits_1.1.1          
 [40] tools_4.2.0                 gtable_0.3.0                glue_1.6.2                 
 [43] GenomeInfoDbData_1.2.8      reshape2_1.4.4              dplyr_1.0.9                
 [46] rappdirs_0.3.3              Rcpp_1.0.9                  Biobase_2.56.0             
 [49] vctrs_0.4.1                 Biostrings_2.64.0           rtracklayer_1.57.0         
 [52] xfun_0.31                   stringr_1.4.0               lifecycle_1.0.1            
 [55] restfulr_0.0.15             ensembldb_2.20.2            XML_3.99-0.10              
 [58] zlibbioc_1.42.0             scales_1.2.0                BSgenome_1.64.0            
 [61] VariantAnnotation_1.42.1    hms_1.1.1                   MatrixGenerics_1.8.1       
 [64] ProtGenerics_1.28.0         RBGL_1.72.0                 parallel_4.2.0             
 [67] SummarizedExperiment_1.26.1 AnnotationFilter_1.20.0     RColorBrewer_1.1-3         
 [70] yaml_2.3.5                  curl_4.3.2                  memoise_2.0.1              
 [73] gridExtra_2.3               ggplot2_3.3.6               biomaRt_2.52.0             
 [76] rpart_4.1.16                reshape_0.8.9               latticeExtra_0.6-30        
 [79] stringi_1.7.8               RSQLite_2.2.15              S4Vectors_0.34.0           
 [82] BiocIO_1.6.0                checkmate_2.1.0             GenomicFeatures_1.48.3     
 [85] BiocGenerics_0.42.0         filelock_1.0.2              BiocParallel_1.30.3        
 [88] GenomeInfoDb_1.32.2         rlang_1.0.4                 pkgconfig_2.0.3            
 [91] matrixStats_0.62.0          bitops_1.0-7                lattice_0.20-45            
 [94] purrr_0.3.4                 GenomicAlignments_1.32.1    htmlwidgets_1.5.4          
 [97] bit_4.0.4                   tidyselect_1.1.2            here_1.0.1                 
[100] GGally_2.1.2                plyr_1.8.7                  magrittr_2.0.3             
[103] R6_2.5.1                    IRanges_2.30.0              generics_0.1.3             
[106] Hmisc_4.7-0                 DelayedArray_0.22.0         DBI_1.1.3                  
[109] pillar_1.8.0                foreign_0.8-82              survival_3.3-1             
[112] KEGGREST_1.36.3             RCurl_1.98-1.8              nnet_7.3-17                
[115] tibble_3.1.8                crayon_1.5.1                interp_1.1-3               
[118] utf8_1.2.2                  OrganismDbi_1.38.1          BiocFileCache_2.4.0        
[121] jpeg_0.1-9                  progress_1.2.2              grid_4.2.0                 
[124] data.table_1.14.2           blob_1.2.3                  digest_0.6.29              
[127] stats4_4.2.0                munsell_0.5.0  

Many thanks in advance, Brian

bschilder, Aug 10 '22

This is the relevant error message:

Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(ranges) :
  Response is missing required header Content-Length: for url https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig

It seems that hashFindValUpperCase can't find the hashed key for Content-Length, and the lookup fails because of that: https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L598

https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L624-L628

So I dug further to check whether the value of Content-Length was actually fetched and hashed: https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/udc.c#L550

https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/net.c#L1488-L1496

https://github.com/ucscGenomeBrowser/kent/blob/c850f4d308195cb4f8a8e7c9c3c7f2eb5199f382/src/lib/net.c#L1435-L1475

I used printf to log the values just before the call to hashAdd, and something seems to be wrong there: Content-Length never showed up. However, I'm not sure which component is responsible for the issue: the parsing (netUrlHeadExt) or the fetching (lineFileAttach).
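Since the warning says the response lacks Content-Length, a useful first step is to check what the failing machine itself receives from the server. HTTP header names are case-insensitive, so the check below matches the header in any letter case. A minimal sketch: content_length is a hypothetical helper, and note that curl -I sends a HEAD request, which is not necessarily identical to the request udc makes.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print the
# Content-Length value. Header names are case-insensitive in HTTP, so
# "content-length:" is matched in any case. If several header blocks are
# present (e.g. after redirects), the last value wins.
# Exits with status 1 if no Content-Length header is found.
content_length() {
    tr -d '\r' | awk '
        tolower($1) == "content-length:" { value = $2; found = 1 }
        END {
            if (!found) exit 1
            print value
        }'
}

# Usage, on the machine where rtracklayer fails (not executed here):
#   curl -sI 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig' | content_length
```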

I'm happy to collaborate, but for that I need help from someone who's more familiar with the nitty-gritty details of the codebase.

sanchit-saini, Aug 24 '22

I'm not quite sure who the primary maintainer is currently, so I'm tagging the developers with the most overall contributions for assistance: @NullModel @galt @katerose @JimKent

bschilder, Aug 25 '22

Good evening, Brian. I'm trying to follow what you are describing here. It looks like you are running an R script that uses rtracklayer code. I am not aware of how rtracklayer uses the kent source; I'm assuming it is somehow built into rtracklayer. My question would be: which version of the kent source is being used? When I try the ordinary kent command-line tools on the URL you show here, everything appears to work fine. The error you describe appears to be in the network connection layer. From the same system you are working on, do the ordinary kent command-line tools work with the URL? For example, bigWigInfo should return:

bigWigInfo "https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig"

version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 38,394,145
primaryIndexSize: 226,392
zoomLevels: 10
chromCount: 198
basesCovered: 520,652,859
mean: 1.871263
min: 0.050000
max: 574.520020
std: 7.786024
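To answer this from the machine where rtracklayer fails, it helps to compare individual fields against the output above (for example chromCount: 198). A minimal sketch: bwinfo_field is a hypothetical helper that only understands the "name: value" lines bigWigInfo prints.

```shell
# Hypothetical helper: read bigWigInfo output on stdin and print the value
# of one "name: value" field, e.g. chromCount or basesCovered.
# Exits with status 1 if the field is not present.
bwinfo_field() {
    awk -v key="$1" '
        {
            name = $1
            sub(/:$/, "", name)            # "chromCount:" -> "chromCount"
            if (name == key) {
                sub(/^[^:]*:[ \t]*/, "")   # drop the "name: " prefix
                print
                found = 1
            }
        }
        END { if (!found) exit 1 }'
}

# Usage (not executed here):
#   bigWigInfo 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig' | bwinfo_field chromCount
```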

NullModel, Aug 26 '22

Thanks @NullModel for the pointers and for clarifying that this issue is not related to kent. At some point, rtracklayer's copy of the kent source diverged somewhat from upstream. I will look into the networking module in rtracklayer.

sanchit-saini, Aug 26 '22

Hi @sanchit-saini, wow, this is interesting. We had no idea that you're calling our C functions from R directly, here, for the bigWig data import: https://github.com/lawremi/rtracklayer/blob/master/R/bigWig.R#L240 (the only thing that I can't figure out is which function you're calling exactly, it's in the string/macro "bwgFile_query" but I can't find the value.)

If we changed a function header and that broke something, let us know. We can give you a stable function call, e.g. bwQuery_stable, and can promise to never change the header of that. But we haven't ever touched the headers, as far as I can see from the git history, so maybe this has to do with something related to rtracklayer?

maximilianh, Aug 29 '22

The only change was this one:

-struct bbiFile *bigWigFileOpenAlias(char *fileName, struct hash *aliasHash);
-/* Open up big wig file. Free this up with bbiFileClose. Use aliasHash if non-NULL */
+struct bbiFile *bigWigFileOpenAlias(char *fileName, aliasFunc aliasFunc);
+/* Open up big wig file. Free this up with bbiFileClose. Use aliasFunc if non-NULL */

in commit b622d147b7dbac52dbf3ba26928cd18e02d42bd8, but as far as I can tell, this is not a function you use.

maximilianh, Aug 29 '22

@braneyboo I think we've had this issue before: whenever we change something in bigBed/bigWig, rtracklayer runs into trouble. I wonder if we can come up with a system to reduce this in the future, though I'm not sure what. Maybe separate, stable function calls that we never touch and that are marked as such in the code?

maximilianh, Aug 29 '22

Hi @maximilianh,

> the only thing that I can't figure out is which function you're calling exactly, it's in the string/macro "bwgFile_query" but I can't find the value.

We're calling this function, which uses kent library functions: https://github.com/lawremi/rtracklayer/blob/05017acd2b4abc934b7e8c9873bdedcc099875aa/src/bigWig.c#L226

The kent source that rtracklayer relies on resides at: https://github.com/lawremi/rtracklayer/tree/master/src/ucsc

> If we changed a function header and that broke something, let us know. We can give you a stable function call, e.g. bwQuery_stable, and can promise to never change the header of that. But we haven't ever touched the headers, as far as I can see from the git history, so maybe this has to do with something related to rtracklayer?

I don't think it is related to the modification of function headers. I'm still not sure what's causing this issue and need some time to investigate it.

Apart from that, I like the idea of having stable functions; it would be great, and we could do it at some point. First, we'd have to put together a list of the kent functions rtracklayer uses.

sanchit-saini, Aug 30 '22

Thanks! Yes, this confirms that this is your call: struct bbiFile * file = bigWigFileOpen((char *)CHAR(asChar(r_filename)));

And we didn't change bigWigFileOpen. We only changed it once, 4-5 years ago, and that change broke another package. The engineer who made the most recent change confirmed that he will never touch these function signatures. I guess only you can find out what exactly broke here. We did make a small change, but not to any of the function headers that you're calling.

maximilianh, Aug 30 '22

Good Morning Brian:

The file appears to be completely accessible from here:

curl -I 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig'

HTTP/1.1 200 OK
Date: Wed, 10 Aug 2022 14:43:53 GMT
Last-Modified: Fri, 24 Jun 2022 11:55:23 GMT
ETag: "385a3e4-5e2303eccc61b"
Accept-Ranges: bytes
Content-Length: 59089892
Connection: close
Server: Data Science Institute ICL
Strict-Transport-Security: max-age=15552001; includeSubDomains; preload;
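The Accept-Ranges: bytes header matters because kent's udc layer reads remote files in byte ranges rather than downloading them whole. A complementary check from the failing machine is to request only the first four bytes and confirm they are the bigWig signature 0x888FFC26 (bytes 26 fc 8f 88 when the file was written on a little-endian machine, 88 8f fc 26 if byte-swapped). A minimal sketch: is_bigwig_magic is a hypothetical helper.

```shell
# Hypothetical helper: read the first 4 bytes on stdin and report whether
# they match the bigWig signature 0x888FFC26 in either byte order.
# Returns 1 (with a message on stderr) if they do not.
is_bigwig_magic() {
    hex=$(head -c 4 | od -An -tx1 | tr -d ' \n')
    case $hex in
        26fc8f88) echo "bigWig (little-endian)" ;;
        888ffc26) echo "bigWig (big-endian)" ;;
        *) echo "not a bigWig signature: $hex" >&2; return 1 ;;
    esac
}

# Usage (not executed here):
#   curl -s -r 0-3 "$url" | is_bigwig_magic
# To confirm the server actually honours the range, the status should be 206:
#   curl -s -r 0-3 -o /dev/null -w '%{http_code}\n' "$url"
```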

bigWigInfo 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig'

version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 38,394,145
primaryIndexSize: 226,392
zoomLevels: 10
chromCount: 198
basesCovered: 520,652,859
mean: 1.871263
min: 0.050000
max: 574.520020
std: 7.786024

time bigWigToBedGraph 'https://nottlab.dsi.ic.ac.uk/RMY_cancer//cancer/hg38/A549_H3K27ac_R1_tag.ucsc.bigWig' A549_H3K27ac_R1_tag.ucsc.bedGraph

real 0m4.773s

head A549_H3K27ac_R1_tag.ucsc.bedGraph

chr1 134884 135092 0.47
chr1 135092 135228 1.43
chr1 191097 191541 0.14
chr1 191541 191677 0.97
chr1 267953 268089 2.14
chr1 526646 526782 1.43
chr1 629822 629834 2.67
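As a cross-check that the converted bedGraph agrees with the file header, the number of covered bases and the base-weighted mean can be recomputed and compared with the bigWigInfo output (basesCovered 520,652,859, mean 1.871263). A minimal awk sketch, assuming bigWigInfo's mean is the coverage-weighted mean over covered bases; small differences in the mean are expected because bedGraph values are rounded when printed. bedgraph_summary is a hypothetical helper.

```shell
# Hypothetical helper: summarise a 4-column bedGraph (chrom, start, end, value)
# read from stdin: total covered bases and the base-weighted mean value.
# Header lines (track, browser, #) are skipped.
bedgraph_summary() {
    awk '
        NF >= 4 && $1 !~ /^(track|browser|#)/ {
            n = $3 - $2            # interval length in bases (0-based, half-open)
            bases += n
            sum += $4 * n          # weight each value by its interval length
        }
        END {
            if (bases == 0) exit 1
            printf "basesCovered: %.0f\nmean: %f\n", bases, sum / bases
        }'
}

# Usage (not executed here):
#   bedgraph_summary < A549_H3K27ac_R1_tag.ucsc.bedGraph
```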

You would need to find out what error exactly rtracklayer is seeing when it tries to access the file. See if you can run the kent commands on the file from your access location.

--Hiram

NullModel, Oct 11 '22

@sanchit-saini have you had any luck with this?

bschilder, Oct 24 '22