GENOVA
Request for test data
Hello,
Thank you for developing this great package. I installed it using the commands below.
install.packages("remotes")
remotes::install_github("robinweide/GENOVA")
I am wondering where I can find the data described in the vignette.
As a test, I tried to run GENOVA on your data (e.g., WT_10000_iced.matrix, WT_10000_abs.bed, CTCF_WT_motifs.bed, etc.), but I could only find 'hg19_cytobandAcen.bed' under data/. I also searched GSE95015 and GSE160490, but I could not find files with the same names.
PS. When I run my own data, I get the error messages below. I am using HiC-Pro output (ICE normalization). I would therefore like to check whether GENOVA runs well with your data on my system, and I would also like to see the format of your data.
chr_mat <- chromosome_matrix(PEF_1Mb)
visualise(chr_mat)
Error in seq.default(.limits[1], .limits[2], length.out = guide$nbin) : 'from' must be a finite number
RCP_out = RCP(list(PEF_40kb, IVF_40kb), bedlist = list("CTCF" = CTCF), chromsToUse = 'chr2')
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 1531167683 rows; more than 68603004 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
CS_out = compartment_score(list(PEF_40kb, IVF_40kb),
bed = H3K27acPeaks)
visualise(CS_out, chr = "chr2")
Error in [.data.table(idx, , CJ(V1 = V4, V2 = V4), by = list(chr = V1)) :
negative length vectors are not allowed
Hello there,
GitHub is a nice place to store code, but not to store and host multiple gigabytes of version-controlled data.
The data used for the vignette is from a subseries of GSE95015, namely GSE95014. You can download the valid pairs files (these were mapped against hg19 with HiC-Pro v2.7.7), but they would have to be processed into Hi-C matrices at different resolutions.
The data for the Hap1_WT_10kb object in the vignette comes from GSE95014_Hap1.validPairs.txt.gz, the Hap1_WAPL_10kb object comes from a merge of GSE95014_WaplKO_1.14.validPairs.txt.gz and GSE95014_WaplKO_3.3.validPairs.txt.gz and the Hap1_SCC4_10kb data comes from GSE95014_SCC4KO.validPairs.txt.gz. The other objects in the vignette stem from the same data aggregated at different resolutions.
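To illustrate what "aggregated at different resolutions" means, here is a minimal base R sketch, not the actual pipeline used for the vignette. The positions are made up and `bin_pairs()` is a hypothetical helper; in practice HiC-Pro's own matrix-building step would do this on the real valid pairs files.

```r
# Toy valid pairs on one chromosome: two read positions per pair (made-up numbers).
pairs <- data.frame(
  pos1 = c(  1200,  15800, 152000, 149999, 310500),
  pos2 = c(160500, 160000, 298000, 450000, 449000)
)

# Hypothetical helper: count contacts per pair of bins at a given resolution.
bin_pairs <- function(pairs, resolution) {
  # 0-based bin number of each read end
  b1 <- pairs$pos1 %/% resolution
  b2 <- pairs$pos2 %/% resolution
  # Store every contact in the upper triangle (bin1 <= bin2)
  binned <- data.frame(bin1 = pmin(b1, b2), bin2 = pmax(b1, b2), count = 1)
  # Sum the contacts that fall into the same pair of bins
  aggregate(count ~ bin1 + bin2, data = binned, FUN = sum)
}

res_150kb <- bin_pairs(pairs, 150000)
res_1Mb   <- bin_pairs(pairs, 1000000)
print(res_150kb)
print(res_1Mb)
```

The same pairs give more, sparser bins at 150 kb and fewer, denser bins at 1 Mb, while the total number of contacts stays the same.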
A small sample of preprocessed data for the wildtype sample is available with the get_test_data() function. The important components are the MAT and IDX data.tables.
library(GENOVA)
exp <- get_test_data("150k", download = TRUE)
head(exp$MAT)
#> V1 V2 V3
#> 1: 18619 18619 20734.3955
#> 2: 18619 18620 8296.8487
#> 3: 18619 18621 956.5607
#> 4: 18619 18622 124.3649
#> 5: 18619 18623 337.0586
#> 6: 18619 18624 154.3675
head(exp$IDX)
#> V1 V2 V3 V4
#> 1: chr21 0 150000 18557
#> 2: chr21 150000 300000 18558
#> 3: chr21 300000 450000 18559
#> 4: chr21 450000 600000 18560
#> 5: chr21 600000 750000 18561
#> 6: chr21 750000 900000 18562
Created on 2021-07-03 by the reprex package (v1.0.0)
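For comparing your own files against this layout, a few sanity checks can be sketched in base R. This is only a sketch under assumptions: the toy tables below mimic the MAT and IDX columns shown above, `check_pair()` is a hypothetical helper and not part of GENOVA, and passing these checks does not guarantee that GENOVA will accept the data.

```r
# IDX: one row per bin (chromosome, start, end, bin index).
# MAT: one row per contact (bin index 1, bin index 2, value).
# Toy values in the same layout as the output above.
idx <- data.frame(
  V1 = "chr21",
  V2 = c(0, 150000, 300000),
  V3 = c(150000, 300000, 450000),
  V4 = c(18557, 18558, 18559)
)
mat <- data.frame(
  V1 = c(18557, 18557, 18558),
  V2 = c(18557, 18558, 18559),
  V3 = c(20734.3955, 8296.8487, 956.5607)
)

# Hypothetical helper: returns a named logical vector, one entry per check.
check_pair <- function(mat, idx) {
  c(
    # every bin index used in MAT is defined in IDX
    indices_known   = all(mat$V1 %in% idx$V4) && all(mat$V2 %in% idx$V4),
    # no bin index is defined twice in IDX
    indices_unique  = !anyDuplicated(idx$V4),
    # no NA, NaN or infinite contact values
    values_finite   = all(is.finite(mat$V3)),
    # no pair of bins is listed twice in MAT
    no_dup_contacts = !anyDuplicated(mat[, c("V1", "V2")])
  )
}

checks <- check_pair(mat, idx)
print(checks)
```

Any entry that comes back FALSE points at a mismatch between the matrix and the bin table that is worth looking at before loading the data.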
I'm not yet familiar with the bugs you're reporting. It would be great if we could reproduce these bugs with the test data, so that we can debug them. Can I ask what version of HiC-Pro you are using?
Best, Teun
Dear Teun,
Thank you for your reply. I will try to download and process those data. My data have the same MAT and IDX format, as shown below.
1 1 45468.228423
1 2 10115.775160
1 3 1177.746885
1 4 318.809705
1 5 386.754051
chr1 0 1000000 1
chr1 1000000 2000000 2
chr1 2000000 3000000 3
chr1 3000000 4000000 4
chr1 4000000 5000000 5
I am using HiC-Pro 3.0.0.
My data actually came from GSE153450. After I ran HiC-Pro, I obtained matrices whose bin indices started from 0, and I could not run these matrices with GENOVA (these matrix issues are described in https://github.com/nservant/HiC-Pro/issues/416).
So I ran ICE normalization on the raw data (e.g., PEF_rep1_10000.matrix) using the following command, and I obtained matrices starting from 1 (the --base 0 option actually produced a matrix starting from 1). GENOVA accepted these matrices. I also merged two replicates into one matrix, and GENOVA accepted this as well.
ice -r PEF_rep1_10000.iced.matrix --max_iter 100 --filter_low_counts_perc 0.02 --filter_high_counts_perc 0 --eps 0.1 --base 0 PEF_rep1_10000.matrix
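As an alternative way to look at the 0-based versus 1-based issue, the renumbering itself can be sketched in base R. This is only an illustration under assumptions: the tables are made-up toy values, `to_one_based()` is a hypothetical helper, and whether GENOVA accepts input renumbered this way has not been confirmed in this thread.

```r
# Toy HiC-Pro-style tables with 0-based bin indices (made-up values).
bed <- data.frame(
  chr   = "chr1",
  start = c(0, 1e6, 2e6),
  end   = c(1e6, 2e6, 3e6),
  index = 0:2
)
mat <- data.frame(
  bin1  = c(0, 0, 1),
  bin2  = c(0, 1, 2),
  value = c(45.5, 10.1, 1.2)
)

# Hypothetical helper: shift both tables by the same offset so that the
# lowest bin index becomes 1. Already 1-based input is left unchanged.
to_one_based <- function(mat, bed) {
  offset <- 1L - min(bed$index)
  bed$index <- bed$index + offset
  mat$bin1  <- mat$bin1 + offset
  mat$bin2  <- mat$bin2 + offset
  list(mat = mat, bed = bed)
}

fixed <- to_one_based(mat, bed)
print(fixed$bed)
print(fixed$mat)
```

The important point is that the matrix and the bed file are shifted together; shifting only one of them would make every contact point at the wrong bin.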
Using those data, I could create experiment objects (e.g., "PEF_40kb" for the merged one). I could run some GENOVA commands, but not others (e.g., compartment_matrixplot).
== Below are example error messages ==
compartment_score did not work for PEF_40kb and IVF_40kb, and compartment_matrixplot produced the error message below. On the other hand, compartment_score worked for PEF_1Mb and IVF_1Mb and I could run compartment_matrixplot, but the plot had just one color.
CS_out = compartment_score(list(PEF_40kb, IVF_40kb),
bed = H3K27acPeaks)
visualise(CS_out, chr = "chr2")
Error in [.data.table(idx, , CJ(V1 = V4, V2 = V4), by = list(chr = V1)) :
negative length vectors are not allowed
In addition: Warning messages:
1: In newwidths > 2 * widths :
longer object length is not a multiple of shorter object length
2: In compartment_score(list(PEF_40kb, IVF_40kb), bed = H3K27acPeaks) :
Centromeres of experiments are probably incompatible.
compartment_matrixplot(
exp1 = PEF_40kb, exp2 = IVF_40kb,
CS_discovery = CS_out,
chrom = "chr2", arm = "p",
colour_lim = c(0, 15)
)
Error: Sample names in the IS_discovery do not match contacts object
Thanks again! Jinsoo
Maybe it would be better to upload the test data to Zenodo or OSF (https://osf.io/)?