sim3C
References without cut-sites should still produce spurious read-pairs and not simply be excluded from the simulation.
I've been having some trouble simulating HiC reads, and after an hour of troubleshooting I think I've identified the issue.
This is the command I've been running, and the error I've been running into.
sim3C --dist uniform -n 10000 -l 150 -e Sau3AI -m hic --profile-name ${genome}_simhic_profile.tsv $genome.fasta ${genome}_simhic.fastq
ERROR | 2020-08-25 02:09:05,237 | main | 'Seq' object has no attribute 'id'
Traceback (most recent call last):
File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/command_line.py", line 213, in main
args.num_pairs, args.method, args.read_length, **kw_args)
File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/simulator.py", line 307, in __init__
create_cids=create_cids, linear=linear)
File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/community.py", line 507, in __init__
random_state, create_cids, linear))
File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/community.py", line 82, in __init__
self.sites = CutSites(enzyme, seq.seq, self.random_state, linear=linear)
File "/home/mustafa/.local/lib/python2.7/site-packages/sim3C/site_analysis.py", line 63, in __init__
raise NoCutSitesException(template_seq.id, str(enzyme))
AttributeError: 'Seq' object has no attribute 'id'
I believe the problem is that template_seq does not have an id attribute. Using type() on template_seq identifies it as a Bio.Seq.Seq object.
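The traceback shows community.py passing the bare sequence (seq.seq) into CutSites, while site_analysis.py then reads template_seq.id to build the NoCutSitesException. A bare sequence carries no identifier, so the error path itself fails. A minimal sketch of one way around this, assuming the caller passes the record's identifier explicitly alongside the raw sequence. `find_cut_sites` and its `seq_id` parameter are hypothetical and are not sim3C's actual API. The sketch uses plain strings so it runs without Biopython, and it avoids f-strings so it also works on the Python 2.7 used here.

```python
class NoCutSitesException(Exception):
    """Raised when a sequence contains no recognition site for the enzyme."""

    def __init__(self, seq_id, enzyme_name):
        msg = "{} contains no {} cut-sites".format(seq_id, enzyme_name)
        super(NoCutSitesException, self).__init__(msg)
        self.seq_id = seq_id
        self.enzyme_name = enzyme_name


def find_cut_sites(seq_id, seq, site, enzyme_name):
    """Return 0-based start positions of `site` within `seq`.

    Hypothetical signature: the identifier is passed in explicitly so the
    error can name the offending sequence, even though `seq` has no id.
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    if not positions:
        # Report the record's identifier rather than reading it off the sequence.
        raise NoCutSitesException(seq_id, enzyme_name)
    return positions


# Example: a sequence with no GATC (the Sau3AI site) names itself in the error.
try:
    find_cut_sites("contig_7", "AAAATTTTCCCC", "GATC", "Sau3AI")
except NoCutSitesException as exc:
    print(exc)  # contig_7 contains no Sau3AI cut-sites
```

Keeping the identifier as a separate argument means the site-finding code only needs a plain string, whichever sequence type the caller happens to hold.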
I've removed the sequences that were causing the issue and am now able to run the program, but this bug meant I was not able to easily identify which sequences did not have cut sites.
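Until the error reports the sequence name, a workaround is to pre-screen the FASTA for sequences lacking the enzyme's recognition site before running sim3C. A minimal sketch, assuming Sau3AI's recognition site GATC and a plain FASTA file. `read_fasta` and `sequences_without_sites` are hypothetical helpers, not part of sim3C. For circular sequences the first few bases are appended to the end so a site spanning the origin is still found.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA text handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The identifier is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def sequences_without_sites(handle, site="GATC", linear=True):
    """Return the ids of sequences containing no occurrence of `site`."""
    missing = []
    for seq_id, seq in read_fasta(handle):
        seq = seq.upper()
        if not linear:
            # Wrap around the origin so a site split across the ends is detected.
            seq += seq[:len(site) - 1]
        if site not in seq:
            missing.append(seq_id)
    return missing


example = io.StringIO(">chrom1 chromosome\nAAGATCTT\n>plasmid1\nAAAATTTT\n")
print(sequences_without_sites(example))  # ['plasmid1']
```

On a real file this would be used as `with open("genome.fasta") as fh: print(sequences_without_sites(fh))`.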
Hi Mustafa,
Without looking, this sounds suspiciously like a change in the Biopython API. There have been two attributes which contain the same value, Bio.Seq.name and Bio.Seq.id. It might be that .id has finally been dropped. Just a guess for now. If my suspicion is correct, a fix with some version pinning should avoid this.
On that note, could you provide the output of pip freeze?
Well, ignore what I just said; it seems this is entirely a bug in sim3C.
Hello Matthew,
I hope this will be fixed soon! Sim3C is very useful and has been quite easy to use.
The output of pip freeze, in case you still need it, is:
biopython==1.76
BUSCO==3.1.0
certifi==2019.11.28
enum34==1.1.10
funcsigs==1.0.2
iced==0.4.2
intervaltree==3.0.2
llvmlite==0.31.0
numba==0.47.0
numpy==1.16.6
PyYAML==5.3.1
scipy==1.2.3
sim3C @ git+https://github.com/cerebis/sim3C@43e2ccfabf55f9ddb84754e9b29b8791d4bd34c0
singledispatch==3.4.0.3
six==1.15.0
sortedcontainers==2.2.2
tqdm==4.45.0
I have committed a fix to handle this issue (9830b3c0b0a4f50e90922c3cbf061dbb076d72a6).
Unfortunately, this is perhaps not the logic you were hoping to see. Reference sequences which do not contain a cut-site will be ignored in the simulation, and if a cell contains only that replicon, it too will be ignored.
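A minimal sketch of the filtering behaviour described above, under an assumed community layout: a dict mapping each cell name to a dict of `{replicon_id: sequence}`. This is not the code from commit 9830b3c0b0a4f50e90922c3cbf061dbb076d72a6. `prune_community` and the data layout are hypothetical, and GATC (the Sau3AI site) is used as the default.

```python
def prune_community(cells, site="GATC"):
    """Drop replicons lacking `site`, then drop cells left with no replicons.

    Returns (kept_cells, dropped_replicon_ids, dropped_cell_names).
    """
    kept_cells = {}
    dropped_replicons = []
    dropped_cells = []
    for cell_name, replicons in cells.items():
        kept = {}
        for rep_id, seq in replicons.items():
            if site in seq.upper():
                kept[rep_id] = seq
            else:
                # Replicons without a cut-site are excluded from the simulation.
                dropped_replicons.append(rep_id)
        if kept:
            kept_cells[cell_name] = kept
        else:
            # A cell whose only replicons lacked cut-sites is excluded as well.
            dropped_cells.append(cell_name)
    return kept_cells, dropped_replicons, dropped_cells


community = {
    "cellA": {"chrA": "AAGATCTT", "plasA": "AAAATTTT"},
    "cellB": {"chrB": "CCCCGGGG"},
}
kept, reps, cells_out = prune_community(community)
print(kept)       # {'cellA': {'chrA': 'AAGATCTT'}}
print(cells_out)  # ['cellB']
```

Returning the dropped identifiers, rather than discarding them silently, is one way to let users see which sequences were excluded.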
Regarding how sim3C simulates Hi-C reads: a sequence which contains no cut-sites cannot produce read-pairs from proximity ligation. It would, however, still be capable of producing spurious read-pairs (noise). I will leave this issue open, but modify the title to reflect that this should be addressed in the future.
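The future behaviour could keep such replicons in the community and restrict them to noise. A minimal sketch of that per-pair decision, assuming a hypothetical `spurious_rate` (the fraction of pairs from a cut-site-bearing replicon that are spurious). `choose_pair_type` is illustrative only and is not sim3C's actual read-pair model or command-line option.

```python
import random


def choose_pair_type(n_cut_sites, spurious_rate, rng):
    """Pick the kind of read-pair to generate from one replicon.

    Replicons with no cut-sites can only yield spurious (noise) pairs.
    Others yield proximity-ligation pairs, except for a fraction
    `spurious_rate` of spurious pairs.
    """
    if n_cut_sites == 0:
        return "spurious"
    return "spurious" if rng.random() < spurious_rate else "ligation"


rng = random.Random(42)
counts = {"spurious": 0, "ligation": 0}
for _ in range(1000):
    # A replicon with zero cut-sites still contributes pairs, but only as noise.
    counts[choose_pair_type(0, 0.05, rng)] += 1
print(counts)  # {'spurious': 1000, 'ligation': 0}
```

With this rule a cut-site-free replicon still receives its share of read-pairs instead of being dropped, which matches the intent of the revised issue title.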