
Make sure PyWPS objects are serializable

Open huard opened this issue 2 years ago • 5 comments

Description

Parallelisation libraries such as dask pass work from the scheduler to the workers by serializing objects, sending them over the network, and deserializing them on the other side. It seems that some PyWPS objects are not serializable. The issues I've found so far are:

  • Inputs and outputs in the Process objects include weak references (weakref in the IO handlers).
  • WPSRequest has an EncodedFile object that is not picklable.

I propose to start by writing tests that try to pickle PyWPS objects, submit a PR, and pursue the discussion over there.
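
As a starting point, such a test could be built around a small round-trip helper. The sketch below does not import PyWPS: `Handler` and `Owner` are hypothetical stand-ins that only reproduce the weak-reference pattern described above, and `is_picklable` is an illustrative helper name, not an existing PyWPS function.

```python
import pickle
import weakref


class Handler:
    """Hypothetical stand-in for an IO handler; not a PyWPS class."""

    def __init__(self, owner):
        # Weak reference back to the owning object, as described in the issue.
        self._owner = weakref.ref(owner)


class Owner:
    """Hypothetical stand-in for an input/output object that owns a handler."""

    def __init__(self):
        self.handler = Handler(self)


def is_picklable(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable({"a": [1, 2, 3]}))  # plain data round-trips
    print(is_picklable(Owner()))           # fails because of the weakref
```

In an actual test suite the same helper would be applied to real PyWPS objects, for example one test per class, so that each unpicklable attribute shows up as a separate failure.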

Environment

  • operating system: Ubuntu
  • Python version: 3.9.7
  • PyWPS version: 4.5.2
  • source/distribution
      • [x] git clone
      • [ ] Debian
      • [ ] PyPI
      • [ ] zip/tar.gz
      • [ ] other (please specify):
  • web server
      • [ ] Apache/mod_wsgi
      • [ ] CGI
      • [ ] other (please specify):

Steps to Reproduce

Additional Information

huard, May 13 '22 15:05

While investigating this, I realized that Process.json returns a dict, while WPSRequest.json returns a string. The former has a from_json method, while in the latter, json is a property with getter and setter methods.

Is this something that should be uniform across the code?
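
To make the difference concrete, here is a minimal sketch of the two conventions. `DictStyle` and `StringStyle` are hypothetical stand-ins written for illustration, not the PyWPS implementations, and their attributes (`identifier`, `operation`) are assumptions.

```python
import json


class DictStyle:
    """Hypothetical: `json` returns a dict, `from_json` rebuilds the object."""

    def __init__(self, identifier):
        self.identifier = identifier

    @property
    def json(self):
        return {"identifier": self.identifier}

    @classmethod
    def from_json(cls, value):
        return cls(value["identifier"])


class StringStyle:
    """Hypothetical: `json` is a read/write property holding a JSON string."""

    def __init__(self, operation=None):
        self.operation = operation

    @property
    def json(self):
        return json.dumps({"operation": self.operation})

    @json.setter
    def json(self, value):
        self.operation = json.loads(value)["operation"]


if __name__ == "__main__":
    # Callers must know which convention applies before using `.json`.
    print(type(DictStyle("hello").json).__name__)
    print(type(StringStyle("execute").json).__name__)
```

With the first style a caller has to run `json.dumps` on the result, with the second `json.loads`, so generic serialization code cannot treat the two objects the same way.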

huard, May 17 '22 17:05

Another, more serious issue is that Process._run_process, the method that actually runs the process handler, triggers Process.launch_next_process, which in turn runs Service.prepare_process_for_execution. Individual processes therefore need a reference to the overall service, which complicates the serialization of Processes. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?
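
One option that avoids a large refactoring, shown here only as a sketch, is to drop the service reference when pickling and re-attach it on the receiving side. `Service` and `Process` below are hypothetical stand-ins, not the PyWPS classes; the lock is just an example of state that cannot be pickled, and `send_to_worker` is an illustrative helper name.

```python
import pickle
import threading


class Service:
    """Hypothetical stand-in for the service; holds unpicklable state."""

    def __init__(self):
        self.lock = threading.Lock()  # locks cannot be pickled


class Process:
    """Hypothetical stand-in for a process with a back-reference to its service."""

    def __init__(self, identifier, service=None):
        self.identifier = identifier
        self.service = service

    def __getstate__(self):
        # Copy the instance state and drop the service reference before pickling.
        state = self.__dict__.copy()
        state["service"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


def send_to_worker(process, service):
    """Simulate shipping a process to a worker, then re-attach a service there."""
    clone = pickle.loads(pickle.dumps(process))
    clone.service = service
    return clone
```

This only moves the problem: the worker still needs a service of its own before anything like launch_next_process can run, so it may help with dask-style dispatch but does not remove the coupling itself.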

huard, May 18 '22 15:05

Hello huard,

> While investigating this, I realized that Process.json returns a dict, while WPSRequest.json returns a string. The former has a from_json method, while in the latter, json is a property with getter and setter methods.
>
> Is this something that should be uniform across the code?

I also noticed that the json properties behave differently across the code, and I addressed the issue in some of my refactoring, such as [1].

I think this should be fixed.

[1] https://github.com/gschwind/PyWPS/commit/db2738732e25787fd02f6e921671752a93d7c866

gschwind, Oct 09 '23 12:10

Hello huard,

> Another, more serious issue is that Process._run_process, the method that actually runs the process handler, triggers Process.launch_next_process, which in turn runs Service.prepare_process_for_execution. Individual processes therefore need a reference to the overall service, which complicates the serialization of Processes. I'm not sure I can solve this one without falling into a refactoring nightmare. Ideas?

I agree that this is quite an issue, but refactoring it is very difficult at the moment.

Best regards.

gschwind, Oct 09 '23 12:10

Hello,

Moreover, the JSON serialization is used in different contexts with very different meanings: it may end up as JSON output for a JSON request, it may be used within XML templates, or it may be used to serialize data to the database.

We should clarify the intent of the JSON serialization and stick to it.

gschwind, Oct 09 '23 12:10