
"RuntimeWarning: divide by zero encountered in log" in akita_data_read.py

Open · BatyrM opened this issue 2 years ago · 1 comment

  1. While following the https://github.com/calico/basenji/blob/master/manuscripts/akita/tutorial.ipynb notebook to preprocess data with akita_data.py, I got the warning "RuntimeWarning: divide by zero encountered in log" from line 204 of akita_data_read.py (https://github.com/calico/basenji/blob/master/bin/akita_data_read.py#:~:text=seq_hic_obsexp%20%3D%20np.log(seq_hic_obsexp)). Can I safely ignore this warning, or is something going wrong? I am using the 5 Hi-C files listed in https://github.com/calico/basenji/blob/master/manuscripts/akita/data/targets.txt, and I removed the --sample argument as recommended in the notebook.
  2. After removing the --sample argument, my sequences.bed file contains >19K lines (coordinates), whereas the provided https://github.com/calico/basenji/blob/master/manuscripts/akita/data/sequences.bed file has >7K. Is that expected, or was some sample argument used to reduce the >19K down to >7K?

BatyrM, Jun 13 '22 20:06

Hi, yes, those warnings are OK. They occur in cooltools, and the NaNs are handled further down in the script.
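The warning itself is easy to reproduce with plain NumPy: `np.log` of a zero entry returns `-inf` and emits the same RuntimeWarning. The sketch below is illustrative only. `log_with_missing` is a hypothetical helper, not code from akita_data_read.py or cooltools, and the toy matrix is made up. It shows one way to silence just the divide-by-zero warning with `np.errstate` while treating the resulting `-inf` values as missing, alongside other NaN bins.

```python
import warnings
import numpy as np

def log_with_missing(matrix):
    """Natural log that maps zeros (-inf) and NaNs to NaN without a divide-by-zero warning.

    Hypothetical helper for illustration; not taken from akita_data_read.py.
    """
    with np.errstate(divide="ignore"):  # suppress only the divide-by-zero warning from log(0)
        out = np.log(np.asarray(matrix, dtype=float))
    out[~np.isfinite(out)] = np.nan  # treat -inf (from zero bins) like other missing bins
    return out

# Toy observed/expected matrix (made-up values): zeros stand in for empty bins, NaNs for masked bins.
obs_exp = np.array([[1.0, 0.5, 0.0],
                    [0.5, 2.0, np.nan],
                    [0.0, np.nan, 1.0]])

# Plain np.log reproduces the warning from the issue and yields -inf for the zero entries.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    raw_log = np.log(obs_exp)

log_obs_exp = log_with_missing(obs_exp)
```

How the real script treats the `-inf` values (masking, clipping, or interpolation) is not spelled out here, so check akita_data_read.py itself before relying on any particular behaviour.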

It is a bit strange that you ended up with so many more sequences. Can you visualize the two BED files in a genome browser and see how they differ?
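As a text-only complement to the genome-browser comparison, a short script can summarize where the two BED files diverge. This is a minimal sketch under stated assumptions: `parse_bed` and `compare_bed` are hypothetical helpers, the files are assumed to be whitespace-delimited BED with any extra columns after `end` (for example a split label in column 4, if present), and intervals are matched only on exact `(chrom, start, end)` coordinates.

```python
from collections import Counter

def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, *extra) tuples, skipping blank, comment, and header lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2]), *fields[3:]))
    return intervals

def compare_bed(intervals_a, intervals_b):
    """Summarize two interval sets: totals, counts per chromosome and per column-4 label, exact-coordinate overlap."""
    coords_a = {iv[:3] for iv in intervals_a}
    coords_b = {iv[:3] for iv in intervals_b}
    return {
        "n_a": len(intervals_a),
        "n_b": len(intervals_b),
        "chrom_a": Counter(iv[0] for iv in intervals_a),
        "chrom_b": Counter(iv[0] for iv in intervals_b),
        "label_a": Counter(iv[3] for iv in intervals_a if len(iv) > 3),
        "label_b": Counter(iv[3] for iv in intervals_b if len(iv) > 3),
        "shared": len(coords_a & coords_b),
        "only_a": len(coords_a - coords_b),
        "only_b": len(coords_b - coords_a),
    }

if __name__ == "__main__":
    # Made-up toy intervals; replace with open("...") handles for the two real files.
    mine = ["chr1\t0\t100\ttrain\n", "chr1\t50\t150\ttrain\n", "chr2\t0\t100\tvalid\n"]
    ref = ["chr1\t0\t100\ttrain\n"]
    print(compare_bed(parse_bed(mine), parse_bed(ref)))
```

Because `parse_bed` accepts any iterable of lines, an open file handle can be passed directly. Large per-chromosome differences, or a large `only_a` count with few shared intervals, would point to a difference in how the sequences were tiled or filtered rather than in sampling alone.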

davek44, Jun 26 '22 00:06