ABC-Enhancer-Gene-Prediction

Running predict.py: VCobserved doesn't exist, but the program keeps looking for it

Status: Open. blyzyrdynn opened this issue 3 years ago • 4 comments

Hello, great program (if I could get it to work)! I am trying to run predict.py after running the Hi-C steps for Juicebox data (juicebox_dump.py, compute_powerlaw_fit_from_hic.py). The step that builds neighborhoods and all earlier steps ran smoothly.

predict.py keeps looking for VCobserved.gz files even when I set --hic_type juicebox. Editing hic.py to force allow_vc to False didn't help. I have pasted the parameter text file below, along with a Pastebin link to the full run and its error messages. How can I make the program proceed when the VCobserved files are absent?

Any help is appreciated. Thanks!

enhancers abc/peaks_macs2/Neighborhoods/EnhancerList.txt
genes abc/peaks_macs2/Neighborhoods/GeneList.txt
outdir abc/Predictions/
window 5000000
score_column ABC.Score
threshold 0.02
cellType U2OS
chrom_sizes genomes/hg38/hg38/hg38.chrom.sizes
HiCdir abc/hic/
hic_resolution 5000
tss_hic_contribution 100
hic_pseudocount_distance 1000000.0
hic_type juicebox
hic_is_doubly_stochastic False
scale_hic_using_powerlaw True
hic_gamma 0.87
hic_gamma_reference 0.87
run_all_genes False
expression_cutoff 1
promoter_activity_quantile_cutoff 0.4
make_all_putative True
use_hdf5 False
tss_slop 500
chromosomes all
include_chrY False

https://pastebin.com/xQV9U8hE
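If the loader only falls back to VCobserved.gz when it cannot use the KR file (an assumption here, not confirmed in this thread), a quick pre-flight check is to scan the Hi-C directory for missing or near-empty KR files before running predict.py. The sketch below is hypothetical: it assumes the layout HiCdir/chrN/chrN.KRobserved.gz (and chrN.VCobserved.gz), and the 100-byte "effectively empty" cutoff is an arbitrary assumption, not a value taken from ABC.

```python
import os

# Hypothetical cutoff: a .gz file this small almost certainly holds no contacts.
MIN_GZ_BYTES = 100


def file_status(path):
    """Return 'missing', 'tiny' or 'ok' for a single file."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) <= MIN_GZ_BYTES:
        return "tiny"
    return "ok"


def scan_hic_dir(hic_dir, chromosomes):
    """Report the KR and VC observed-file status for each chromosome.

    Assumes (hypothetically) the layout <hic_dir>/<chrom>/<chrom>.<NORM>observed.gz.
    """
    report = {}
    for chrom in chromosomes:
        report[chrom] = {
            norm: file_status(os.path.join(hic_dir, chrom, f"{chrom}.{norm}observed.gz"))
            for norm in ("KR", "VC")
        }
    return report
```

For example, scan_hic_dir("abc/hic/", ["chr9"]) would return {"chr9": {"KR": "tiny", "VC": "missing"}} if chr9's KR file were near-empty and no VC file had been dumped, which would point directly at the chromosome to re-dump.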

blyzyrdynn, Jun 02 '21 01:06

Could you change the default of allow_vc to False in this function's signature and see if that helps?

https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/blob/b6f23f3118216d721bd4abb19fec21e6fa38fdd5/src/hic.py#L6

If so, maybe we can make the logic in this function less strict.
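One way a "less strict" version could behave is sketched below as a hypothetical standalone helper, not ABC's actual code: try the KR files first, consider VC files only when allow_vc is True, and when nothing usable exists, fail with a message naming every path that was checked. The file naming scheme and the size cutoff are assumptions.

```python
import os

MIN_GZ_BYTES = 100  # hypothetical "effectively empty" cutoff for .gz files


def usable(path):
    """True if the file exists and is larger than the empty-file cutoff."""
    return os.path.exists(path) and os.path.getsize(path) > MIN_GZ_BYTES


def resolve_hic_files(chrom, hic_dir, allow_vc=True):
    """Return (observed_path, norm_path, norm_name) for one chromosome.

    Hypothetical layout: <hic_dir>/<chrom>/<chrom>.{KR,VC}{observed,norm}.gz
    Only the observed file is checked here; the norm path is returned as-is.
    """
    norms = ["KR", "VC"] if allow_vc else ["KR"]
    tried = []
    for norm in norms:
        observed = os.path.join(hic_dir, chrom, f"{chrom}.{norm}observed.gz")
        norm_vec = os.path.join(hic_dir, chrom, f"{chrom}.{norm}norm.gz")
        tried.append(observed)
        if usable(observed):
            return observed, norm_vec, norm
    # Nothing usable: report every path tried instead of only the last one.
    raise FileNotFoundError(
        f"No usable Hi-C file for {chrom}; checked: " + ", ".join(tried)
    )
```

With allow_vc=False, a missing or near-empty KR file produces an error that names the KR file itself rather than a message about VCobserved.gz, which makes a single broken chromosome dump easier to spot.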

thouis, Jun 02 '21 14:06

Hello Thouis, thanks for getting back to me. I tried editing hic.py and got the same error. I also get an error when I run without Hi-C data (a KeyError on ABC.Score, traceback below). Do you have any other suggestions for editing the code? Perhaps there are individual commands I could run instead of the script?

Traceback (most recent call last):
  File "abc/ABC-Enhancer-Gene-Prediction/src/predict.py", line 154, in <module>
    main()
  File "abc/ABC-Enhancer-Gene-Prediction/src/predict.py", line 123, in main
    all_positive = all_putative.iloc[np.logical_and.reduce((all_putative.TargetGeneIsExpressed, all_putative[args.score_column] > args.threshold, ~(all_putative['class'] == "promoter"))),:]
  File "anaconda3/envs/abcenv/lib/python3.7/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "anaconda3/envs/abcenv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ABC.Score'

blyzyrdynn, Jun 03 '21 21:06

I think that without Hi-C data you'll need to switch to the power-law score by passing --score_column powerlaw.Score.
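The KeyError arises because predict.py indexes all_putative[args.score_column] when that column was never created. A small guard like the hypothetical helper below (not part of ABC) checks the requested column against the table's columns before filtering and, if it is absent, either substitutes a listed fallback such as powerlaw.Score or raises an error listing the score-like columns that do exist. It accepts any iterable of column names, including a pandas DataFrame's .columns.

```python
def resolve_score_column(columns, requested, fallbacks=("powerlaw.Score",)):
    """Return the score column to threshold on, or raise an informative KeyError.

    columns:   iterable of column names (e.g. DataFrame.columns).
    requested: the value passed as --score_column.
    fallbacks: hypothetical preference order used when `requested` is absent.
    """
    cols = list(columns)
    if requested in cols:
        return requested
    for alt in fallbacks:
        if alt in cols:
            # Announce the substitution so the change of score is not silent.
            print(f"Column {requested!r} not found; using {alt!r} instead")
            return alt
    score_like = [c for c in cols if c.endswith(".Score")]
    raise KeyError(
        f"{requested!r} not in table; score-like columns present: {score_like}"
    )
```

Because a fallback changes which score the threshold is applied to, the helper prints the substitution; passing fallbacks=() turns it into a pure check that only raises.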

thouis, Jun 03 '21 21:06

Hello Thouis, thanks for the help. The suggestions didn't work for me, but I did manage to get the program running with the consensus average Hi-C data files. I suspect there was a problem with my user-supplied chr9 contacts, since the KR file was extremely small. I still get fewer than 2 enhancers per gene on average, so I plan to see whether some normalization can fix this. Still, the package itself seems to be working.
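To put a figure like "fewer than 2 enhancers per gene" on firm footing, one can count predicted non-promoter links per target gene straight from the predictions table. The sketch below uses only the standard library and assumes a tab-separated file with TargetGene, class and score columns; the column names and the 0.02 default (matching the threshold in the parameter file above) are assumptions to check against your actual output. Unlike predict.py's filter, it does not look at TargetGeneIsExpressed.

```python
import csv
from collections import Counter


def enhancers_per_gene(path, score_column="ABC.Score", threshold=0.02):
    """Return (mean links per gene, number of genes with at least one link).

    Assumes (hypothetically) a TSV with columns TargetGene, class and
    score_column. Promoter elements are skipped, mirroring the
    class != "promoter" condition in predict.py.
    """
    counts = Counter()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["class"] == "promoter":
                continue
            if float(row[score_column]) > threshold:
                counts[row["TargetGene"]] += 1
    if not counts:
        return 0.0, 0
    return sum(counts.values()) / len(counts), len(counts)
```

Note the design choice: genes with no link above the threshold never enter the counter, so the mean is over genes with at least one predicted enhancer. Dividing by the total number of expressed genes instead would give a lower figure.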

blyzyrdynn, Jun 07 '21 21:06

We've revamped the codebase. Please check out https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/tree/main and reopen this issue if the problem persists.

atancoder, Dec 07 '23 23:12