basenji
Reproduce Enformer's input sequence split
I would like to regenerate the input sequences for Enformer/Basenji2 using basenji_data.py. I am running the following command:
python basenji_data.py -g hg38.gaps.bed -u umap_k36_t10_l32_hg38.bed -b hg38.blacklist.rep.bed -l 131072 -crop_bp 8192 -break_t 786432 -s 65599 -t .1 -v .1 -w 128 -o data/input_mseqs -p 8 targets.txt
However, the sequences I get differ from those in the sequences.bed file stored here.
Can you please confirm if I am using the right options to generate the same sequence split?
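To make the observed differences concrete, a comparison like the one below may help. It is only a minimal sketch, assuming both files are tab-separated BED files with four columns (chromosome, start, end, and a split label such as train/valid/test). The function names `read_bed` and `compare_splits` are hypothetical helpers, not part of basenji.

```python
import csv
from collections import defaultdict


def read_bed(lines):
    """Parse BED-like rows into {split_label: set of (chrom, start, end)}.

    Assumes tab-separated columns: chrom, start, end, label.
    Rows without a fourth column are grouped under "unlabeled".
    """
    splits = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = row[0], int(row[1]), int(row[2])
        label = row[3] if len(row) > 3 else "unlabeled"
        splits[label].add((chrom, start, end))
    return splits


def compare_splits(a, b):
    """Count intervals shared by, or unique to, each file per split label."""
    report = {}
    for label in sorted(set(a) | set(b)):
        sa, sb = a.get(label, set()), b.get(label, set())
        report[label] = {
            "shared": len(sa & sb),
            "only_a": len(sa - sb),
            "only_b": len(sb - sa),
        }
    return report


# Small in-memory demo; with real files, pass open file handles instead,
# e.g. read_bed(open("data/input_mseqs/sequences.bed")).
mine = ["chr1\t0\t131072\ttrain", "chr1\t131072\t262144\tvalid"]
reference = ["chr1\t0\t131072\ttrain", "chr1\t131072\t262144\ttest"]
for label, counts in compare_splits(read_bed(mine), read_bed(reference)).items():
    print(label, counts)
```

This only reports exact coordinate matches. If the intervals are shifted by a constant offset rather than reassigned to a different split, every interval would show up as unique to one file, which itself hints at a stride or cropping difference rather than a seed difference.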
Hi Sara, can you say a little more about your goal? It'll influence how I can best help. It'd be a little tricky for me to track down the exact parameters, and basenji_data.py has changed over the years. Is it OK if the recipe is equivalent in quality, but different due to minor tweaks and random number seeds?