FactorNet
KeyError - Missing Chromosome name
Hi,
I have been trying to use FactorNet on Drosophila data but to no avail. I have been getting the following error:
python FactorNet-master/train.py -i Training/TrainingScores_chr3R.bed -vi Validation/Validation_Peaks_chr2R.bed -k 128 -r 128 -d 256 -oc Training_output
Multi-task training
output directory (Training_output) already exists so it will be clobbered
Loading genome
Traceback (most recent call last):
  File "FactorNet-master/train.py", line 301, in <module>
    main()
  File "FactorNet-master/train.py", line 190, in main
    genome = utils.load_genome()
  File "/home/pm16057/FactorNet/FactorNet-master/utils.py", line 349, in load_genome
    onehot_chroms = parmap.map(get_onehot_chrom, chroms)
  File "/usr/lib/python2.7/site-packages/parmap/parmap.py", line 304, in map
    return _map_or_starmap(function, iterable, args, kwargs, "map")
  File "/usr/lib/python2.7/site-packages/parmap/parmap.py", line 248, in _map_or_starmap
    output = result.get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
KeyError: 'chr2RHet'
From my understanding, the problem is due to a chromosome name missing from some Python dictionary.
I was wondering whether some dependency does not support chromosome names from other species, or whether there is a way around this issue that I have not been able to find.
I have changed files in the resources folder and made changes to utils.py, but to no avail.
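The KeyError is raised while `load_genome` maps `get_onehot_chrom` over the chromosome list, which is consistent with Patrick's reading: a name such as `chr2RHet` is requested but is not a key in the genome lookup. Assuming that is the cause (this has not been checked against FactorNet's own `utils.py`), a quick diagnostic is to compare the chromosome names being requested with the sequence names actually present in the genome FASTA. The helper names below are hypothetical, plain Python 3:

```python
def read_fasta_names(lines):
    """Return sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed empty headers such as a bare ">"
                names.append(parts[0])
    return names


def missing_chroms(expected, available):
    """Return requested chromosome names that are absent from the genome (these would raise KeyError)."""
    available = set(available)
    return [name for name in expected if name not in available]


if __name__ == "__main__":
    # Toy in-memory FASTA; a real genome file handle works the same way.
    fasta_lines = [">chr2L", "ACGT", ">chr2R", "GGCC", ">chr3R", "TTAA"]
    requested = ["chr2L", "chr2R", "chr2RHet", "chr3R"]
    print(missing_chroms(requested, read_fasta_names(fasta_lines)))  # ['chr2RHet']
```

On a real genome, pass an open file handle instead, e.g. `with open(path) as fh: names = read_fasta_names(fh)`, where `path` is your FASTA file. Any names reported as missing would need to be either added to the FASTA or dropped from the list of chromosomes the script tries to load.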
Thank you!
Kind Regards
Patrick
Hi Patrick,
Thank you for your interest in FactorNet. Unfortunately, I have left academia and am no longer supporting this package. However, I have been developing new packages, GenomeLoader and PillowNet, which serve as successors to FactorNet. Unlike FactorNet, which uses a CNN, PillowNet uses a U-Net, which in my opinion is a better way of modeling sequence data. I haven't really been updating the documentation for the GitHub repositories, since I've been busy getting them working on my new job's platform (I work for DNAnexus, btw; our cloud platform makes desktops and HPCs for research almost obsolete, and GPUs are so easy with it). The repositories already install pretty easily on our platform, and they should install just as easily on a desktop or a properly maintained HPC cluster. Let me know if you'd like to discuss getting them working for you. The repositories are found here:
https://github.com/daquang/GenomeLoader https://github.com/daquang/PillowNet
-Daniel
Hi,
It's a real pity that FactorNet is no longer maintained.
Will there be a paper/extensive README for PillowNet?
As we also struggle quite a bit with FactorNet, it might be worthwhile to try out PillowNet.
Would you suspect that both tools should yield similar results given the same input data?
Thanks and best, Johann
EDIT: About DNAnexus: I suspect it is not freely available, is it? So its use in academia might be rather limited?
I actually wrote PillowNet during my postdoc and released it under the MIT license, so it is freely available. I expect the results are not too different from FactorNet's, and hopefully they'll actually be better.
Originally PillowNet was going to have its own paper, but I left academia too early. It was used in my last paper, and you should cite that paper: https://www.sciencedirect.com/science/article/pii/S2212877819309573
I should definitely update PillowNet with a proper README, especially if there's a demand for it.
Thanks for the replies! We'll have a look at it. Quick question: FactorNet, as far as I know, uses pure DNA sequence in addition to other signal information. Is this also done in PillowNet (or in the paper you pointed to)?
cheers
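For readers new to this kind of model: a common way to feed DNA sequence together with signal tracks is to one-hot encode the sequence (four channels, A/C/G/T) and append each signal as an extra per-position channel. The sketch below illustrates only that generic idea; it is not taken from FactorNet or PillowNet, and the function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """One-hot encode DNA: one row per position, columns A, C, G, T; N or any other character gives all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def sequence_plus_signal(seq, *signals):
    """Append one column per signal track to the one-hot rows; every track must have len(seq) values."""
    for track in signals:
        if len(track) != len(seq):
            raise ValueError("signal track length must match sequence length")
    return [row + [track[i] for track in signals] for i, row in enumerate(one_hot(seq))]


if __name__ == "__main__":
    # Four positions, one hypothetical signal track (e.g. per-base coverage).
    for row in sequence_plus_signal("ACGN", [0.5, 1.0, 0.0, 2.0]):
        print(row)
```

Each row then has 4 + (number of tracks) channels, which is the shape a convolutional or U-Net model would consume position by position.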