Dealing with custom types defined in C

Open UnitedMarsupials-zz opened this issue 7 years ago • 6 comments

I'm trying to use dispy to parallelize work with objects defined in C. The program will implement serialization of these objects (a link to an example would be most appreciated).

While trying to test this, I get an error -- presumably, because the serialization is not yet implemented. What I am seeing, however, is the secondary error -- from somewhere inside Python-3.6's inspect.py (function named getfile) complaining: TypeError('{!r} is a built-in class'.format(object)).

I'm guessing dispy is trying to be helpful, but it needs to catch these exceptions so that they don't hide the original one.

UnitedMarsupials-zz, Sep 05 '18 04:09

Is it possible to send me a small example that I can run?

pgiri, Sep 05 '18 04:09

Yes - you can use the file-object as an example:

def meow(o):
	return o

def processed(status, node, job):
	if status == dispy.DispyJob.Finished:
		print("%s" % job.result)
	elif status == dispy.DispyJob.Terminated:
		print("%s" % (job.exception))
	return

if __name__ == '__main__':
	import dispy
	f = open('/dev/null', 'r')

	cluster = dispy.JobCluster(
		meow,
		cluster_status = processed,
		depends = [f]
	)
	cluster.print_status()
	cluster.wait()
	cluster.print_status()

The errors I'm getting are quite unhelpful:

2018-09-05 10:16:21 pycos - version 4.8.1 with epoll I/O notifier
2018-09-05 10:16:21 dispy - dispy client version: 4.9.1
2018-09-05 10:16:21 dispy - Storing fault recovery information in "_dispy_20180905101621"
Traceback (most recent call last):
  File "d.py", line 35, in <module>
    depends = [f]
  File "/prod/pfe/local/lib/python3.6/site-packages/dispy/__init__.py", line 2547, in __init__
    lines = inspect.getsourcelines(dep)[0]
  File "/prod/pfe/local/lib/python3.6/inspect.py", line 955, in getsourcelines
    lines, lnum = findsource(object)
  File "/prod/pfe/local/lib/python3.6/inspect.py", line 768, in findsource
    file = getsourcefile(object)
  File "/prod/pfe/local/lib/python3.6/inspect.py", line 684, in getsourcefile
    filename = getfile(object)
  File "/prod/pfe/local/lib/python3.6/inspect.py", line 654, in getfile
    raise TypeError('{!r} is a built-in class'.format(object))
TypeError: <module 'io' (built-in)> is a built-in class
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/prod/pfe/local/lib/python3.6/site-packages/dispy/__init__.py", line 2803, in shutdown
    self.close()
  File "/prod/pfe/local/lib/python3.6/site-packages/dispy/__init__.py", line 2790, in close
    if self._compute:
AttributeError: 'JobCluster' object has no attribute '_compute'

After I added the module's name to my types (following this advice), the first error is gone; instead, I am getting a complaint from inspect's findsource function saying OSError: source code not available.

Followed by the cryptic error about the _compute attribute...

UnitedMarsupials-zz, Sep 05 '18 14:09

The issue seems to be the assumption that any object with a __class__ attribute can be used to get the source for that class (see lines 2539 and 2546 in __init__.py). This can be handled in one of two ways: either check for the __module__ attribute, which I think is needed to get the source, or use try/except with inspect.getsourcelines and issue an appropriate warning. I will commit a fix in a couple of days (I have been working on a rather large patch and maintaining two branches; I am hoping to commit the other one soon so I don't have to apply patches to both branches and two Python versions!).
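A minimal sketch of the second option, for illustration only. It is not the actual dispy patch, and the helper name source_or_none is hypothetical. It relies only on the fact that inspect.getsourcelines raises TypeError for built-in objects and OSError when the source cannot be found, which are the two errors reported in this thread:

```python
import inspect
import io
import warnings


def source_or_none(dep):
    """Return the source lines of dep, or None if no Python source exists.

    Hypothetical helper: a failed lookup becomes a warning instead of an
    exception, so it cannot mask an earlier, more relevant error.
    """
    try:
        return inspect.getsourcelines(dep)[0]
    except (TypeError, OSError) as exc:
        # TypeError: built-in (C-defined) class, function, or instance.
        # OSError: the source file could not be located or read.
        warnings.warn("no source available for %r: %s" % (dep, exc))
        return None


if __name__ == "__main__":
    # io.StringIO is implemented in C, so the lookup fails gracefully.
    print(source_or_none(io.StringIO))
```

A caller could then skip dependencies for which the helper returns None, or report them in a clearer message.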

pgiri, Sep 07 '18 00:09

use try/except with inspect.getsourcelines and issue appropriate warning

Yes, this is the bigger point: attempts to be more helpful should not make the reported error less helpful when they fail.

That said, an example of sending out a native custom type may be in order :-) In my case, I added a todict method to my class, and a constructor that can recreate the object from a dictionary -- so now, instead of trying to pass around objects of a native type, I'm pushing their dictionary representations. The dictionaries are converted back into native types on each node, and the native types are then passed to the proprietary library for the actual computations.

It seems to work, but I'm only learning and it would've been nice to have some kind of "best practices" tutorial for such a case...
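The dictionary round trip described above might be sketched as follows. Everything here is an assumption made for illustration: Sample is a pure-Python stand-in for the native type, and compute stands in for the function that would run on each node. Only the todict and fromdict names come from this thread.

```python
class Sample:
    """Hypothetical stand-in for a type that is really defined in C."""

    def __init__(self, label, values):
        self.label = label
        self.values = list(values)

    def todict(self):
        # Keep only plain Python data, which serializes without extra work.
        return {"label": self.label, "values": list(self.values)}

    @classmethod
    def fromdict(cls, data):
        # Alternate constructor: rebuild the object from its dictionary form.
        return cls(data["label"], data["values"])


def compute(payload):
    """Rebuild the object from the dictionary, then do the real work."""
    sample = Sample.fromdict(payload)
    return sum(sample.values)


if __name__ == "__main__":
    payload = Sample("demo", [1, 2, 3]).todict()  # done on the client
    print(compute(payload))                       # done on a node
```

The client only ever sends the dictionary, so the serializer never sees the native type itself.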

UnitedMarsupials-zz, Sep 07 '18 02:09

In my case, I added a todict method to my class, and a constructor that can recreate the object from a dictionary

Is it possible to add __getstate__ and __setstate__ methods to the class? If so, that is all that is needed to serialize and deserialize. If __getstate__ can return a dictionary, then __setstate__ is not required (unless special processing is needed). See, for example, class _DispyJob_ in __init__.py whose __getstate__ returns a dictionary with only necessary attributes (this class also defines __setstate__, although not required). If this works, then this is Python's serialization approach (look for __getstate__ in dispy).
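For illustration, a small sketch of this approach using the standard pickle module. The Handle class is hypothetical, a pure-Python stand-in for a type defined in C, and the lock stands in for a native resource that cannot be pickled:

```python
import pickle
import threading


class Handle:
    """Hypothetical stand-in for a native type with unpicklable internals."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Return only the attributes needed to rebuild the object.
        return {"name": self.name, "values": self.values}

    def __setstate__(self, state):
        # Special processing: restore the data, then recreate the resource
        # that was deliberately left out of the pickled state.
        self.name = state["name"]
        self.values = state["values"]
        self._lock = threading.Lock()


if __name__ == "__main__":
    clone = pickle.loads(pickle.dumps(Handle("demo", [1, 2, 3])))
    print(clone.name, clone.values)
```

Here __setstate__ is needed only because the lock must be recreated; without that requirement, the dictionary returned by __getstate__ would simply become the attributes of the new instance.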

pgiri, Sep 07 '18 03:09

Is it possible to add __getstate__ and __setstate__ methods to the class?

Aha! Thanks for the pointer. Yes, certainly -- once I turned my existing todict and fromdict into __getstate__ and __setstate__ respectively, things became much nicer. And, oh, the speed so far seems linear (!) compared to using OpenMP within one machine.

I have other questions, but will use StackOverflow -- I see there is a dispy tag there already...

UnitedMarsupials-zz, Sep 07 '18 18:09