WPS icon indicating copy to clipboard operation
WPS copied to clipboard

Cannot clean up work directory after process

Open nuest opened this issue 10 years ago • 12 comments

There is the following issue with WPS4R, but imho this can also be the case for other processing backends: After the process run, I want to delete the working directory. But if the client has requested files as reference, then the WPS keeps a lock on these files to store them as output data.

Because the Java process keeps a lock on the files I cannot properly clean up after myself. Do we need an extension of IAlgorithm to "clean up" after processes?

The file is released after line https://github.com/52North/WPS/blob/96209e516be9a2a2a443f51a6e5f31692b47b37d/52n-wps-server/src/main/java/org/n52/wps/server/response/ExecuteResponseBuilder.java#L327 which is called via https://github.com/52North/WPS/blob/96209e516be9a2a2a443f51a6e5f31692b47b37d/52n-wps-server/src/main/java/org/n52/wps/server/request/ExecuteRequest.java#L694

Imho we could add a call to IAlgorithm.afterRun() in the finally call here https://github.com/52North/WPS/blob/96209e516be9a2a2a443f51a6e5f31692b47b37d/52n-wps-server/src/main/java/org/n52/wps/server/request/ExecuteRequest.java#L740

Alternatively, the algorithm instance could start a watching thread to delete the directory, but I don't really like that...

Or am I missing something? @bpross-52n @matthias-mueller

nuest avatar May 15 '14 10:05 nuest

I had a similar issue with the moving code module. (I guess I put the delete method in the destructor then.) A clean solution would be to provide a result folder in the WPS framework where the data is moved to from the temporary workspaces.

matthias-mueller avatar May 15 '14 10:05 matthias-mueller

Yes, a specific output folder could also work, but this could introduce considerable overhead when large files must be copied.

Which destructor do you mean exactly?

nuest avatar May 15 '14 10:05 nuest

protected void finalize(){}

The overhead could be reduced to almost zero if the file is not copied but moved. Shoud take no more than a few milliseconds if its the same disc.

matthias-mueller avatar May 15 '14 11:05 matthias-mueller

Ok... I'll try the finalize - though I might have actually just have refactored my code to make this harder, since I allow to cache the algorithm class instances.

Moving is a good point. Then I must still communicate that "output directory" to the Algorithm somehow, right? Would also mean that the IAlgorithm-API is extended, or not? Should this go into AbstractAlgorithm ?

And regarding the speed of the moving - I guess that depends if you stream it through Java, right? Only Java 1.7's Files.move() would use the operating system operation and be really fast, a workaround could be to use .renameTo() until we are finally using 1.7... (see http://openbook.galileocomputing.de/java7/1507_05_003.html#dodtp27aa8d91-3ce4-47dd-a277-5538dbaaa2a6)

nuest avatar May 15 '14 11:05 nuest

We could also move that logic to the GenericFileGenerator and let him move the outputs to a server-wide result folder.

(What prevents us from moving to Java 7 ASAP?)

matthias-mueller avatar May 15 '14 12:05 matthias-mueller

Nothing keeps us from moving to Java 7. In fact, I am working on migrating the GSoC 2013 config-api to the master and therefore I had to switch to Java 7. Not sure when the branch will be ready to merge though.

bpross-52n avatar May 15 '14 13:05 bpross-52n

Ok - so Java 1.7 is somewhat set, I opened #89 to discuss anything related to that.

There already is a "server wide result folder", the output file database. I'm not sure moving this to the generic file generator solves the issue, but I don't see how that would work yet.

Are there any arguments agains a "afterRun" method in IAlgorithm (besides changing the API?) ?

nuest avatar May 15 '14 14:05 nuest

That logic should remain in the output database. It just might make sense that file-based outputs (which are handled by the GenericFileGenerator) are directly forwarded to it. Hence my suggestion. Not sure what an after run method should comprise. In essence, the algorithm still has to take care of its own resources, right? So it might as well exit as clean as possible. One thing we need to think about about is what happens to the occupied resources when the Algorithm class suddenly crashes. (In the case of a crash both the finalize() and the afterRun() are possibly affected an my thus not be able to free all resources.) The best thing would be to have some kind of global resource monitor in the framework to which algorithms could send information about occupied resources. Then this supervisor could keep track of all the persistent garbage. Besides a global resource monitoring we also get the ability to persist the resource consumption database and do a clean up after a severe crash at the next WPS startup. Phew, quite a long text!

matthias-mueller avatar May 15 '14 17:05 matthias-mueller

Crashes aside: The algorithm is not aware that there is an "output database". Should we change that? In my case the algorithm cannot clean up after itself (it is aware of it's own resources) because the resources are blocked by the WPS so that it can store them in the database.

Maybe we should formulate the algorithm lifecycle from the perspective of the WPS to get further here? Right now the lifecycle is... "new" - "running" - "done". Is that enough? Do you agree this would help?

nuest avatar May 16 '14 10:05 nuest

So the issue are the shared resources: There are resources that are owned by the WPS and there are resources that are owned by the algorithm. Resources owned by the algorithm cannot be easily transferred to the WPS if the algorithm communicates with a proprietary backend (ArcGIS, Grass, R, ...). Giving the Algorithm awareness of an output database would make algorithm design messy.

We have not yet discussed the Repository layer, which mediates between the framework and the algorithm. We could put a dispose method in there: IAlgorithmRepository.dispose(String algoInstanceID)

matthias-mueller avatar May 16 '14 11:05 matthias-mueller

The class ExecutionContext was introduced to facilitate this kind of work, it's created before an execution in the ExecutionRequest.response() and cleaned up in the finally clause. The idea was to generate resources like temp directories, etc. that would be deleted at the end of the execution.

tkunicki avatar May 17 '14 04:05 tkunicki

ExecutionContext looks just like what I need and should use - thanks for the pointer!

The question that I will look at (soon...) is now how I can extend or change things in the context from WPS4R, because the work directory handling is actually not simple right now. Ideally the process could "register" resources/directories in the context that it is executed in that it cannot clean up itself.

nuest avatar May 19 '14 12:05 nuest