hgraph2graph
Getting error while generating vocabulary
Hello Wengong!
Thanks for the great work!
I am trying to generate the vocabulary from your dataset (../data/polymers/all.txt); however, I am getting the error below and cannot figure it out. I eventually wrapped the call in a try/except, but the same error occurs many times over the whole run. I would appreciate any help.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "[...]\hgraph2graph-master\hgraph2graph-master\generation\get_vocab.py", line 12, in process
hmol = MolGraph(s)
File "[...]\hgraph2graph-master\hgraph2graph-master\generation\poly_hgraph\mol_graph.py", line 29, in __init__
self.clusters, self.atom_cls = self.pool_clusters()
File "[...]\hgraph2graph-master\hgraph2graph-master\generation\poly_hgraph\mol_graph.py", line 87, in pool_clusters
**if fsmiles not in MolGraph.FRAGMENTS: continue**
TypeError: argument of type 'NoneType' is not iterable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "[...]/hgraph2graph-master/generation/get_vocab.py", line 62, in <module>
vocab_list = pool.map(process, batches) # getting error here TypeError: argument of type 'NoneType' is not iterable
File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 644, in get
raise self._value
TypeError: argument of type 'NoneType' is not iterable
Hi,
It seems that MolGraph.FRAGMENTS is not initialized. In https://github.com/wengong-jin/hgraph2graph/blob/3249f93c6e72a3cdfb0c0a71939b0f071dfe7456/generation/get_vocab.py#L50, the load_fragment function will set MolGraph.FRAGMENTS to a list of fragments collected from your training data.
This is strange, because as long as load_fragment is called (get_vocab.py line 50), MolGraph.FRAGMENTS cannot be None (at worst it is an empty list). I think the error happened before the load_fragment function was called. You can try printing the fragments variable at line 49 to see whether that line gets executed.
I was getting the same error in the generation/preprocessing.py file. I debugged the code step by step and found that when the program calls partial(tensorize, mol_batches), MolGraph.tensorize sees FRAGMENTS initialized to None and never calls load_fragments before reaching pool_clusters(), which leads to this "NoneType is not iterable" error. Please correct me if I am wrong. Thank you in advance.
@nikhilmittal444 This is an issue with Pool on Windows: MolGraph.FRAGMENTS is not accessible in functions called through Pool. I removed the multiprocessing and can now generate the vocabulary without any error. However, I only get 2273 lines, compared with 2288 lines in the provided vocab. @wengong-jin I am still going through the code to see whether there is any randomness. Do you think this is normal?
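For anyone who wants to keep multiprocessing: on Windows, worker processes are started as fresh interpreters, so a class attribute assigned in the parent script is not carried over to them. A minimal sketch of one way around this, using Pool's initializer argument, is below. The MolGraph class here is only a stand-in, and init_worker, process and run are hypothetical names, not code from the repository.

```python
import multiprocessing as mp


class MolGraph:
    # Stand-in for the real class; only the class-level attribute matters here.
    FRAGMENTS = None

    @staticmethod
    def load_fragments(fragments):
        MolGraph.FRAGMENTS = set(fragments)


def init_worker(fragments):
    # Runs once inside every worker process, so the class attribute is set
    # there even when workers start as fresh interpreters (Windows default).
    MolGraph.load_fragments(fragments)


def process(batch):
    # Hypothetical worker: keep only the strings that are known fragments.
    return [s for s in batch if s in MolGraph.FRAGMENTS]


def run(batches, fragments, ncpu=2):
    # initializer/initargs hand the fragment list to each worker at start-up.
    with mp.Pool(ncpu, initializer=init_worker, initargs=(fragments,)) as pool:
        return pool.map(process, batches)
```

On Windows, run(...) should be called from under an `if __name__ == "__main__":` guard, since each worker re-imports the main script.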
I stored the result of load_fragments in a new variable and passed it as an input argument both to the tensorize function and to the MolGraph object in __init__() (as self.new_variable). This gave me 2288 lines, matching the provided vocab, and it also resolved the "NoneType is not iterable" error on MolGraph.FRAGMENTS.
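The idea of passing the fragments explicitly, instead of relying on a class attribute, can be sketched roughly as follows. This is only an illustration under assumed names (process, make_worker); the real tensorize and MolGraph signatures in the repository are different.

```python
from functools import partial


def process(batch, fragments):
    # The fragments travel with every call, so no class-level state is needed
    # and worker processes cannot see an uninitialized value.
    return [s for s in batch if s in fragments]


def make_worker(fragments):
    # frozenset gives fast membership tests and can be pickled for Pool.map.
    return partial(process, fragments=frozenset(fragments))
```

The callable returned by make_worker(fragments) can then be handed to pool.map in place of the original function.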
Hi guys,
this issue can be solved simply by replacing None with an empty list (see #15).
@wengong-jin, please review whether this change is safe.
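One thing to check when reviewing: a membership test against an empty list no longer raises, but it also never matches. The small sketch below shows the three cases in plain Python; keep_fragment and safe_check are hypothetical helpers that only mirror the shape of the failing check, not the repository code.

```python
def keep_fragment(fsmiles, fragments):
    # Same kind of membership test as `fsmiles not in MolGraph.FRAGMENTS`.
    return fsmiles in fragments


def safe_check(fsmiles, fragments):
    # Report the TypeError as a string instead of letting it propagate.
    try:
        return keep_fragment(fsmiles, fragments)
    except TypeError as err:
        return f"TypeError: {err}"


print(safe_check("C", None))   # None: the membership test raises TypeError
print(safe_check("C", []))     # empty list: no error, but nothing matches
print(safe_check("C", ["C"]))  # loaded list: the fragment is found
```

So if the fragments are never loaded in the worker, an empty default would hide the error while every fragment is still treated as unknown.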
Hi gurus,
please, I need your help. I am trying to run get_vocab.py on my small dataset (around 100 entries), but I keep getting the error shown below:
Is there a way to work around this? The error points to mol_graph.py, line 82:
"assert n - m <= 1 #must be connected"