Potential PSyclone projects
This Issue is a place for us to capture ideas for potential PSyclone-related projects. The idea is that they be self-contained, fairly short-term pieces of work that might be suitable for a student to take on.
- A PSyIR-aware plugin for e.g. emacs or vim
- A GUI that allows a user to manipulate PSyIR (or even generate it from scratch), cf. Scratch/Blockly
VERY rough brainstorming only - just some key ideas; many of them need thorough fleshing out (and some may even turn out to be stupid in the first place :) )
- PSyData Applications (some of which I am interested in)
- in-situ visualisation for LFRic (probably done by Wolfgang) and GOcean/NEMO?
- Kernel extraction for NEMO(?)
- Sensitivity analysis: use kernel extraction to determine how small changes in a parameter affect the result (to help detect what might cause a numerical instability)
- Other profiling wrappers (Score-P, CrayPat, ...)
- Can PSyclone be combined with Python? This is based on a suggestion I had from R&D - the idea was compared to the way machine-learning libraries work: you specify something in Python, but the actual maths is done on GPUs etc. (this idea needs a lot of work to define exactly what is required)
- The idea is that the main program and a lot of the non-performance-critical code are written in Python, while the performance-critical code is handled by PSyclone+Fortran.
- Python would need to be able to call the infrastructure library (e.g. to create fields) and the invoke subroutines
- maybe kernels could be called in Fortran to allow for easier prototyping??? Yes, that opens a whole world of pain I guess - but it might be doable with a Python PSyIR backend???
- Comparable debugging: couple two versions of one application ('orig' and 'new') and automatically compare the results after each kernel. This could be useful e.g. to verify an optimised or threaded version of a kernel, or the use of a different compiler. After each kernel, statistics could be collected about the result differences in that kernel. Then the 'orig' results would be transferred to the 'new' kernel and used from there on (instead of the results computed in 'new' itself), i.e. each kernel would be called with identical input parameters.
- At the end we might be able to deduce statistics about which functions might be wrong (result differences outside a certain epsilon)
- which functions contribute how much to result differences
- identify kernels that introduce a dependency on the domain decomposition used?
- Can PSyclone be used to tackle the 'correctness' problem? I.e. to determine whether results obtained with a new/different compiler are still 'correct'? Maybe PSyclone could create kernels that determine the interval in which the output values are expected to lie?
- A simple approach could be to take an input parameter I and call the kernel with I*(1-x) and then with I*(1+x). This can be done for each input parameter I1, ..., In, resulting in 2^n kernel invocations in total - which might tell us in which range the output values are expected to lie (see the sketch after this list). If the kernel compiled with the new/different compiler, called with the unmodified I1, ..., In, then produces values outside the range determined above, chances are there is a code or compiler bug??
- Creating unit tests for kernels? That might be reasonably easy with PSyKE(?)
- Kernel verification: e.g. determine that field accesses are within the specified stencil (either by PSyclone-time analysis, or at runtime if required(??)), that read-only fields are not modified, that no saved variables are used, ...
- Correctness proofs of PSyclone transformations? Can we show that our transformations do not introduce any bugs?
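A minimal Python sketch of the range-check idea from the 'correctness' bullet above. The kernel is treated as an opaque callable on numpy arrays (in practice it would presumably be an extracted kernel driven via a PSyKE-style driver); the perturbation factor and function names are invented for illustration:

```python
from itertools import product
import numpy as np

def expected_range(kernel, inputs, rel=1e-3):
    """Run 'kernel' with every +/-rel perturbation of its inputs and
    return the per-element min/max of the outputs seen (2**n invocations
    for n inputs).  'kernel' is any callable taking numpy arrays."""
    lows, highs = None, None
    for signs in product((-1.0, 1.0), repeat=len(inputs)):
        perturbed = [x * (1.0 + s * rel) for x, s in zip(inputs, signs)]
        out = np.asarray(kernel(*perturbed))
        lows = out if lows is None else np.minimum(lows, out)
        highs = out if highs is None else np.maximum(highs, out)
    return lows, highs

def within_range(kernel_under_test, inputs, lows, highs):
    """True if the unperturbed result of the new build lies inside the
    range established with the reference build."""
    out = np.asarray(kernel_under_test(*inputs))
    return bool(np.all((out >= lows) & (out <= highs)))
```

Under these assumptions, a 'truth' build would establish (lows, highs) once per kernel, and the build made with the new/different compiler would then be checked against that range.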
I'm thinking about @hiker's point 3. above - "Comparable debugging". This would also be useful when porting a code to e.g. GPU as one often needs to work out at which point in a run the results diverge. In theory, the PGI compiler (well, NVIDIA now) can do this at run-time by running a kernel on both CPU and GPU. However, I found that that made things even more unstable than they were already. Some compiler-independent functionality would be nice. We could do a 'truth' run which computed and saved (to disk) a checksum for each kernel executed. We'd then have a 'timeline' of checksums in a file. We'd then be able to repeat the run with the second executable (as long as the two runs used an identical series of kernels), read in the checksums, and compare them as we go until we find a divergence.
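A minimal sketch of that 'timeline of checksums' idea, assuming a simple text file with one kernel-name/checksum pair per kernel call (the file format, class and function names are invented; in a real tool this would sit behind the PSyData API):

```python
import numpy as np

def checksum(field):
    # Simple sum-based checksum; a real tool might hash the raw bytes
    # or use bit-reproducible summation instead.
    return float(np.sum(np.asarray(field, dtype=np.float64)))

class ChecksumTimeline:
    """'record' mode: write one "kernel-name checksum" line per kernel call.
    'compare' mode: read the reference file and report the first divergence."""

    def __init__(self, path, mode):
        self.mode = mode
        self.fh = open(path, "w" if mode == "record" else "r")

    def kernel_done(self, name, field, tol=0.0):
        if self.mode == "record":
            self.fh.write(f"{name} {checksum(field):.17g}\n")
            return True
        ref_name, ref_sum = self.fh.readline().split()
        ok = (ref_name == name) and abs(float(ref_sum) - checksum(field)) <= tol
        if not ok:
            print(f"First divergence at kernel '{name}'")
        return ok

    def close(self):
        self.fh.close()
```

As noted above, both runs must execute an identical sequence of kernels for a line-by-line comparison like this to make sense.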
In fact, this timeline of kernels is something that has also been discussed as a requirement for assisting in building an adjoint of a model (required for Data Assimilation).
I am actually thinking of running the two apps at the same time and communicating the results at runtime (either using MPI dynamic process functionality (which might not be supported on some platforms?), or sockets ... not trivial on some supercomputers, or ... something else ... like write-to-file, sigh ... but at least that would only be once per kernel). The problem with writing to disk (especially when doing full tests) is that the amount of space required can be quite prohibitive (though it is of course possible, e.g. for only a few kernels at a time).
Yes, I agree. It would be nice to run side-by-side, but the complexity of doing that is going to be substantial. I was thinking of recording only the kernel name and a checksum value, so not very much data. We would still have to be careful with long runs though.
Ah yes, good point, that would work of course. Though direct communication means that, once a difference is detected, we can send the 'correct' result to be used on the 'incorrect' side :P (for the record, I already implemented this kind of library years ago ... but I couldn't open-source it at the time). Hmm - as additional potential communication methods: a DB, Kafka??
Assuming we are not going to get the side-by-side version working for a while, an alternative to the checksum-all-invokes option would be to gradually narrow down the region: e.g. do per-timestep checks first, then major sections second, then individual invokes third, then kernels with full data comparisons (see the sketch below). Of course, that would require us to know about the structure of the code somehow, probably via some infrastructure calls, which would allow us to selectively switch on checksums.
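A sketch of what the narrowing-down loop could look like, assuming a hypothetical run_model(level, region) helper that reruns the code with checksumming enabled only at the given granularity and reports the first mismatching unit:

```python
def narrow_down(run_model, levels=("timestep", "section", "invoke", "kernel")):
    """Rerun the model once per level of granularity, checksumming only at
    that level and only inside the region that failed at the previous
    (coarser) level.  'run_model(level, region)' is assumed to return the
    name of the first mismatching unit at that level, or None."""
    region = None
    for level in levels:
        failed = run_model(level=level, region=region)
        if failed is None:
            return None          # no divergence at this granularity
        print(f"Divergence first appears inside {level} '{failed}'")
        region = failed
    return region                # the finest-level culprit (a kernel)
```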
I have been thinking of some kind of ... tracing library. E.g. runtime call-graph creation, which could then be used to visualise data dependencies: if you see that the data going into kernel 'a' is wrong, you could 'somehow' use the call-graph data to see where that data was written before ... quite vague obviously :)
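A toy illustration of such a tracing library (the names and the read/write bookkeeping are invented; a real version would hook into the PSyData callbacks and could key on addresses rather than names):

```python
class WriteTracer:
    """Toy version of the tracing idea: remember, for every field, which
    kernel wrote it last, so a suspicious kernel input can be traced back
    to its producer.  Keys could equally be variable addresses."""

    def __init__(self):
        self.last_writer = {}

    def kernel_call(self, kernel_name, reads, writes):
        # Record the producer of every input before updating the writers.
        producers = {name: self.last_writer.get(name, "<initialisation>")
                     for name in reads}
        for name in writes:
            self.last_writer[name] = kernel_name
        return producers

    def who_wrote(self, field_name):
        return self.last_writer.get(field_name, "<initialisation>")

# Hypothetical usage with made-up kernel and field names:
tracer = WriteTracer()
tracer.kernel_call("continuity_code", reads=["u", "h"], writes=["h_new"])
tracer.kernel_call("momentum_code", reads=["h_new"], writes=["u"])
print(tracer.who_wrote("h_new"))   # -> continuity_code
```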
As mentioned in the telco last night: I might be able to get a student to work on PSyclone for around 10 weeks. Given that it might require some initial training before this student can become productive, here are a few short projects from which the student could pick one or more. Note that these suggestions assume that kernel extraction is working (since we are getting quite close, that seems like a valid assumption). For quite a few of these we already have tickets.
- Improve the output of the driver so that it produces statistics (atm it prints full fields when they are not bitwise identical, which is only somewhat useful). Better would be: X kernels are within +-1%, Y kernels within +-5%, or so (see the sketch after this list).
- Support more than one input file for a kernel in the driver
- Support MPI for the driver (i.e. each process writes its own files, which would then need the previous item in order to sequentially run a kernel with all input files).
- Create a testing environment that takes X drivers (automatically created) and X (or n*X if we have n outputs per kernel) data files, and provides a summary output at the end: "The following N kernels have bitwise identical results, ..., the following M kernels have differences of less than 0.1, ..."
- I have a dummy 'tracing' library implemented: if you see a variable F having wrong results as input (or output) to a kernel, you can look up where this variable was written before. The current implementation is based on the address of the variable/field. Making this nice and shiny could be a useful tool.
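As an illustration of the statistics output mentioned in the first item of this list, here is a rough Python sketch that bins kernels by their maximum relative difference rather than printing full fields (the band edges and the results data structure are arbitrary choices; the reference/computed pairs would presumably come from the extracted data files and the driver rerun):

```python
import numpy as np

def summarise(results):
    """'results' maps kernel name -> (reference, computed) arrays.  Print
    how many kernels fall into each relative-difference band instead of
    dumping the full fields.  Band edges are arbitrary."""
    bands = [(1e-3, "within +-0.1%"), (1e-2, "within +-1%"),
             (5e-2, "within +-5%"), (np.inf, "worse than +-5%")]
    counts = {label: 0 for _, label in bands}
    for name, (ref, new) in results.items():
        ref = np.asarray(ref, dtype=np.float64)
        new = np.asarray(new, dtype=np.float64)
        # Maximum element-wise relative difference, guarding against zeros.
        rel = np.max(np.abs(new - ref) / np.maximum(np.abs(ref), 1e-30))
        for upper, label in bands:
            if rel <= upper:
                counts[label] += 1
                break
        else:
            counts[bands[-1][1]] += 1   # NaN differences count as worst
    for label, n in counts.items():
        print(f"{n} kernel(s) {label}")
```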
A lightweight debugging/profiling version of PSyData would be good - #2186. i.e. an automatic version of the most reliable and available debugging method aka WRITE(*,*).
Thinking about @hiker's original point 5 above: "Creating unit tests for kernels? That might be reasonably easy with PSyKE(?)"
In the adjoint test-harness generation we construct an algorithm that initialises the necessary fields and then uses them to run a kernel (via an invoke). This is currently flagging some issues in the original kernels (as opposed to the adjointed ones). It would therefore be good if we could generate such tests for all kernels (and build and run them in full-debug). This has the advantage that it is testing what the kernel metadata says the implementation expects. (If we use PSyKE, we are testing the kernel against inputs that the model is providing, which might not match what the metadata claims is expected.)
We could also combine this test harness generation with the existing PSyclone functionality to generate code that checks that fields that are specified as READ only aren't modified by a kernel.
> A lightweight debugging/profiling version of PSyData would be good - #2186. i.e. an automatic version of the most reliable and available debugging method aka WRITE(*,*).
Just printing the values of all input and output parameters is indeed a trivial exercise. It will just need a Fortran library to be linked in (with the kernel extraction transformation ... we might need to add the ability to specify if we want to include values used from other modules)
> Thinking about @hiker's original point 5: "Creating unit tests for kernels? That might be reasonably easy with PSyKE(?)" In the adjoint test-harness generation we construct an algorithm that initialises the necessary fields and then uses them to run a kernel (via an invoke). This is currently flagging some issues in the original kernels (as opposed to the adjointed ones). It would therefore be good if we could generate such tests for all kernels (and build and run them in full-debug). This has the advantage that it is testing what the kernel metadata says the implementation expects. (If we use PSyKE, we are testing the kernel against inputs that the model is providing, which might not match what the metadata claims is expected.)
Can you provide some additional details (maybe in the telco)? We can likely do some static checking for GOcean. I'm not sure about LFRic, due to its indirect addressing (and we can't assume that the contents of these index arrays are correct?).
> We could also combine this test harness generation with the existing PSyclone functionality to generate code that checks that fields that are specified as READ only aren't modified by a kernel.
Yes, checksumming code is already available in the PSyData libraries.
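For reference, a hedged sketch of how such a READ-only check could look at the Python level (the checksum helper and the way the kernel is wrapped are illustrative only, not the actual PSyData interface):

```python
import numpy as np

def checksum(field):
    # Same simple sum-based checksum as in the earlier sketches.
    return float(np.sum(np.asarray(field, dtype=np.float64)))

def call_with_readonly_check(kernel, args, read_only):
    """Call 'kernel(**args)' and complain if any argument declared as
    READ-only in the metadata has a different checksum afterwards."""
    kname = getattr(kernel, "__name__", str(kernel))
    before = {name: checksum(args[name]) for name in read_only}
    result = kernel(**args)
    for name, ref in before.items():
        if checksum(args[name]) != ref:
            raise RuntimeError(
                f"READ-only field '{name}' was modified by kernel '{kname}'")
    return result
```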