ganga
Developer documentation
I'm trying to migrate some custom-written applications from Ganga 6.0.17 (!) to Ganga 8.5.0. I'm having problems working out what needs to go into the application prepare() method, introduced between the two versions. Is there any documentation, or could some be written, to help with this?
I've had a look at the documentation linked from https://ganga.readthedocs.io/en/latest/, but haven't found anything relevant. I've also looked at the code and comments in IPrepareApp.py and Executable.py, but I'm struggling to understand the logic :(
Thanks!
Hello,
I don't think there is any documentation aimed at developers, but maybe there should be, or at least the docstrings should be made clearer.
As far as I can recall, not much changed with regard to application preparation between Ganga 6 and Ganga 8, so hopefully the changes you need to make are minimal.
The major changes were to the directory structure (everything moved down one level inside a new ganga folder, i.e. GangaCore -> ganga/GangaCore, so this requires a modification to any imports) and the upgrade to Python 3.
The prepare() method gets called at submission. It gathers together everything needed to make the application run. This includes copying the input files into the sharedir and turning them into a sandbox that can be shipped to the worker node. The point of the sharedir is that it includes everything you would want if you copied the job into a new job (a copied job has the same sharedir, so it would reuse those files), and it keeps a copy of those files to replicate the application state at the time of submission. If your application needs compiling or configuring, this also happens in prepare().
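As a rough illustration of the copying step described above, the sketch below stages input files into a per-application shared directory using only the Python standard library. It is not Ganga's implementation: the function name stage_to_sharedir and the directory layout are hypothetical, chosen only to show why a staged copy preserves the application state at preparation time.

```python
import shutil
import tempfile
from pathlib import Path


def stage_to_sharedir(input_files, shared_root):
    """Copy input files into a fresh shared directory and return its path.

    Hypothetical helper: it illustrates the idea only, not Ganga's code.
    """
    shared_root = Path(shared_root)
    shared_root.mkdir(parents=True, exist_ok=True)
    # One unique directory per prepared application, so later edits to the
    # original files do not change what was staged.
    sharedir = Path(tempfile.mkdtemp(prefix="conf-", dir=str(shared_root)))
    for src in input_files:
        shutil.copy2(src, sharedir / Path(src).name)
    return sharedir


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    script = work / "run.sh"
    script.write_text("echo hello\n")
    staged = stage_to_sharedir([script], work / "shared")
    print(sorted(p.name for p in staged.iterdir()))
```

Because the staged copy is independent of the original file, editing run.sh afterwards leaves the copy in the shared directory untouched, which is the property that lets a copied job reuse the same prepared state.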
Feel free to ask any specific questions here and we can maybe add the explanations to the documentation.
Feel free to share your application with us as well. We can then have a look at what is done and make comments. You may want to look at the PrimeFactorizer example as well. In fact the whole GangaTutorial package is a good example of implementing a plugin for Ganga. The example below starts Ganga with the tutorial package enabled and you can see that the PrimeFactorizer application is available.
[egedelt] ~/programming/ganga % ganga -o '[Configuration]RUNTIME_PATH=GangaTutorial'
*** Welcome to Ganga ***
Version: 8.4.5 - DEV
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
INFO reading config file /home/egede/.gangarc
[15:23:16]
Ganga In [1]: plugins()['applications']
Ganga Out [1]: ['Executable', 'Root', 'Notebook', 'PrimeFactorizer']
[15:23:19]
Ganga In [2]:
Thanks for the replies, and apologies for my slow follow-up.
Checking the code, I see that application preparation was indeed already implemented in Ganga 6.0.17, but a change since then is that the Job class expects application classes to define a prepare() method and an is_prepared property.
The simplest recipe that I've found for migrating to Ganga 8.5.0 an application that inherits from IApplication and worked in Ganga 6.0.17 is:
- Change base class from IApplication to IPrepareApp.
- Add to schema items is_prepared and hash.
- Make prepare() and unprepare() methods of base class available by adding class property:
  _exportmethods = ['prepare', 'unprepare']
After that the application works as previously.
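The three steps above can be sketched as a class skeleton. So that it runs without Ganga installed, the snippet uses a stand-in IPrepareApp class and a plain dict in place of a real Ganga schema; both are hypothetical simplifications, and the method signatures and bodies are placeholders rather than Ganga's actual ones. Only the base class name, the two schema item names, and the _exportmethods line come from the recipe.

```python
class IPrepareApp:
    """Stand-in for Ganga's IPrepareApp base class (hypothetical, simplified)."""

    def prepare(self):
        # The real method creates and fills a shared directory; this
        # placeholder only records that preparation happened.
        self.is_prepared = "placeholder-sharedir"
        self.hash = "placeholder-hash"

    def unprepare(self):
        # Forget the prepared state again.
        self.is_prepared = None
        self.hash = None


class MyApp(IPrepareApp):  # step 1: base class is IPrepareApp, not IApplication
    # Step 2: the schema gains 'is_prepared' and 'hash'. A plain dict of
    # default values stands in for a Ganga schema object ('exe' is a
    # made-up example attribute).
    _schema = {"exe": "echo", "is_prepared": None, "hash": None}

    # Step 3: expose the inherited methods to users of the application.
    _exportmethods = ["prepare", "unprepare"]

    def __init__(self):
        for name, default in self._schema.items():
            setattr(self, name, default)


if __name__ == "__main__":
    app = MyApp()
    app.prepare()
    print(app.is_prepared is not None)
    app.unprepare()
    print(app.is_prepared is None)
```

In a real plugin the base class and schema types would be imported from Ganga itself, so the stand-in class and the dict would be removed.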
In terms of documentation, there's some explanation of the purpose of the prepare() method in Job.prepare() and in Executable.prepare(), but it would be great to have some developer-oriented information in IPrepareApp(), along the lines of the excellent documentation provided in IApplication.
Some specific questions that I'd have are:
- What are the use cases where the prepare() method is useful? Is it mainly relevant to distributed systems, or can it also bring advantages when running on a cluster with a shared file system?
- How are files copied to the shared directory accessed at run time? Are they copied to each subjob's work directory on the worker node, or is there a different mechanism?
- What's the purpose of the hash property?
Thanks, Karl.
Hi Karl,
Good suggestion to expand the documentation in IPrepareApp. Let me try to answer your specific questions.
What are the use cases where the prepare() method is useful? Is it mainly relevant to distributed systems, or can it also bring advantages when running on a cluster with a shared file system?
- Reduce the number of times that the same files are uploaded for distributed systems.
- Save time if, for example, a time-consuming compilation is done as part of the prepare step to produce a set of shared libraries.
When you copy a job, the shared area will be kept. So you can copy a job but still use the same (already prepared) application object, this time maybe with a different set of data files.
How are files copied to the shared directory accessed at run time? Are they copied to each subjob's work directory on the worker node, or is there a different mechanism?
This depends on the runtime handler. In general they are copied when the job starts. For the Dirac backend the files are uploaded to cloud storage and then picked up from there when the jobs run.
What's the purpose of the hash property?
There is a hash calculated over the content in the shared directory. This is then checked at submission time to make sure that no changes have been made to the shared area since it was prepared. It is really just a safety check to protect against something unexpected happening.
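To make the idea of such a check concrete, here is a minimal sketch that hashes the contents of a directory and compares the result with a stored value. It is not Ganga's code: the function names, the use of SHA-256, and the choice to include relative file names in the digest are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_directory(directory):
    """Return a hex digest over the relative names and contents of all files."""
    root = Path(directory)
    digest = hashlib.sha256()
    # Sort the paths so the result does not depend on listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def is_unchanged(directory, stored_hash):
    """True if the directory still matches the hash taken at prepare time."""
    return hash_directory(directory) == stored_hash


if __name__ == "__main__":
    shared = Path(tempfile.mkdtemp())
    (shared / "run.sh").write_text("echo hello\n")
    stored = hash_directory(shared)       # taken when preparing
    print(is_unchanged(shared, stored))   # nothing modified yet
    (shared / "run.sh").write_text("echo tampered\n")
    print(is_unchanged(shared, stored))   # content changed after preparing
```

A submission step could refuse to proceed, or ask for the application to be prepared again, whenever the second kind of result is seen.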
Ulrik.