Contributing Faddeeva (complex erfcx)
We would be interested in having the Faddeeva function available (erfcx for complex arguments) and would like to implement and then contribute it (as we will have a GSoC student).
So the questions are:
- Is there already any work done in this direction?
- Is such a contribution to TFP possible? Any specific ideas/requirements?
- We have `tfp.math.erfcx`, but that's meant for real arguments and probably not much code could be shared there.
- Yep, that sounds good. We have code for special functions in https://cs.opensource.google/tensorflow/probability/+/master:tensorflow_probability/python/math/, so it could go there. There are uses for Faddeeva for some standard-normal-adjacent integrals, so I think that such a special function could be in TFP. We generally haven't explicitly supported complex dtypes, but that should be fine.
Great, thanks!
- Yes, I've noticed; that brings up the question of whether we should extend its functionality. My thought is a separate (private) implementation and a "dispatch" to either the current implementation if the dtype is float or to the new one if it is complex.
- Thanks, yes, that's where I would have put it. Maybe it's worth adding it to `special`, as SciPy does?
- Yeah, separate and dispatching sounds fine. You can use https://cs.opensource.google/tensorflow/probability/+/master:tensorflow_probability/python/internal/dtype_util.py;l=145 to determine the complex dtype (this is mainly to be backend independent, so that this will work in JAX).
- Adding to our `special.py` file is fine. We don't have a `tfp.math.special` subspace, so adding directly to `tfp.math` sounds fine (or you can just reuse the `tfp.math.erfcx` symbol, and then dispatch to the complex and floating-point impls).
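A minimal sketch of what such a dispatch could look like (the helper names `_erfcx_real` and `_erfcx_complex` are hypothetical placeholders; in TFP proper, the `dtype_util` helper linked above would replace the plain `dtype.is_complex` check to stay backend independent):

```python
import tensorflow as tf
import tensorflow_probability as tfp


def _erfcx_real(x):
  # Existing real-valued implementation.
  return tfp.math.erfcx(x)


def _erfcx_complex(z):
  # Placeholder for the new complex (Faddeeva-based) implementation.
  raise NotImplementedError('complex erfcx is not implemented yet')


def erfcx(x, name=None):
  """Dispatches to the real or complex implementation based on the dtype."""
  with tf.name_scope(name or 'erfcx'):
    x = tf.convert_to_tensor(x)
    if x.dtype.is_complex:  # complex64 / complex128
      return _erfcx_complex(x)
    return _erfcx_real(x)
```

Reusing the existing `tfp.math.erfcx` symbol, as suggested above, would just mean putting this check inside the current function.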
Great, we will do that; I'll keep you posted (it will take a while, as the work will be done by a summer student).
Any updates on this?
Related: https://github.com/tensorflow/tensorflow/issues/35617
Indeed, we are currently working with a GSoC student on the implementation and are close to a first fully functioning version. I expect a PR in a few weeks and a finalized implementation in ~2 months.
Since you're interested in it, I'll make sure to post the PR and would be glad about feedback!
Thanks!
Hey, so the GSoC is finished and @aman027 has implemented two versions that can be found here. However, we found that the performance of the function is inferior to SciPy: for larger data and on GPU it's a bit faster, but on CPU it's always clearly slower.
While this is still an impressive result, given that the SciPy implementation is top speed, we think that the best course of action is maybe to actually add a kernel to TF, if that gets accepted (it's not possible to contribute a kernel to TFP, I guess?).
Also, the current implementation has no (working) gradient.
What would you suggest as the way forward?
Hi,
Are there a set of benchmarks to run against? I'd be interested in seeing what sort of data you were running against / how the benchmarking is done.
I can take a look and see if there are low-hanging fruit in code changes that could lead to speed-ups. From a cursory glance, one thing you might want to try is using `@tf.function(autograph=False, jit_compile=True)` to enable JIT compilation (this will compile with XLA), which should improve performance after the first run.
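For illustration, a minimal sketch of that pattern, using the existing real-valued `tfp.math.erfcx` as a stand-in for the complex Faddeeva implementation under discussion:

```python
import tensorflow as tf
import tensorflow_probability as tfp


@tf.function(autograph=False, jit_compile=True)  # compile the traced graph with XLA
def erfcx_jit(x):
  return tfp.math.erfcx(x)


x = tf.constant([0.5, 1.0, 2.0], dtype=tf.float64)
erfcx_jit(x)  # first call traces and compiles
erfcx_jit(x)  # later calls with the same shape/dtype reuse the compiled program
```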
I think after taking a look at the code to see if there are some easy ways to optimize, we can decide whether this should be a TF C++ kernel vs. in TFP (Adding a Kernel to TF may require more work since it's often preferred to have a C++ impl which means a TF and an XLA version. Adding to TFP would mean you get to check in python code and it would be available in JAX too!)
Re: Gradient: I believe the Faddeeva derivative can be implicitly defined in terms of the Faddeeva function so that should hopefully be pretty easy to add on top.
> Hi, Are there a set of benchmarks to run against? I'd be interested in seeing what sort of data you were running against / how the benchmarking is done.
We tested it out for different numbers; the runtime is quite different for smaller versus larger inputs. It is somewhat summarised in Aman's repository.
> I can take a look and see if there are low-hanging fruit in code changes that could lead to speed ups. From a cursory glance, one thing you might want to try is using `@tf.function(autograph=False, jit_compile=True)` to enable JIT compiling (this will compile with XLA) which should improve performance after the first run.
That would be great, many thanks! XLA could help (I am unsure at this point whether we tried it), but we would need to be able to run it without XLA anyway, as we may have undefined shapes.
> I think after taking a look at the code to see if there are some easy ways to optimize, we can decide whether this should be a TF C++ kernel vs. in TFP (Adding a Kernel to TF may require more work since it's often preferred to have a C++ impl which means a TF and an XLA version. Adding to TFP would mean you get to check in python code and it would be available in JAX too!)
Yes, I agree, that was also the main reason to put a lot of effort into this: to make it available in JAX as well.
> Re: Gradient: I believe the Faddeeva derivative can be implicitly defined in terms of the Faddeeva function so that should hopefully be pretty easy to add on top.
Yes, I think so too (we just have not done it yet explicitly)
Even if the implementation is not top performance, I think it still makes sense to add it now, and then later on tackle performance improvements.
Okay, sure! We have two implementations, one that is less precise than the other but simpler (and I guess faster). Let's see what @aman027 says about it.
@jonas-eschle Does one of the two implementations have a gradient? I think the gradient is pretty crucial (if I didn't want a gradient I'd just call out to SciPy or something).
No, neither yet AFAIK (yes, that is a bit the point, but that should indeed not be too difficult).
Indeed the gradient is given by the same Faddeeva function:
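(the standard identity, writing $w(z)$ for the Faddeeva function)

$$\frac{\mathrm{d}w}{\mathrm{d}z} = \frac{2i}{\sqrt{\pi}} - 2z\,w(z)$$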
Hopefully it's not difficult to add.
I want to work on this issue, but I got a bit confused because a part of it has already been implemented. I need a bit more info about where the gradient is to be added.
So the non-complex version has been implemented already, and for the complex one we have a few implementations here: https://github.com/Aman027/GSOC-21-Report (they are all fine copyright-wise). There are also timings etc. on the website. The original author, Aman Verma, wanted to actually contribute it and should be around in a few weeks, but currently we're indeed missing the gradient.
So what needs to be done is to use one of the implementations (probably the simplest one), create a gradient function for it and then we can contribute both parts together.
Do you want to implement the gradient and see if it works (say, write some tests)?
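For what it's worth, a minimal sketch of how the gradient could be wired up with `tf.custom_gradient`. Here `scipy.special.wofz` is used only as a runnable stand-in for the actual TF implementation from the report, and the function names are placeholders:

```python
import numpy as np
import tensorflow as tf
from scipy import special


def _faddeeva_impl(z):
  # Stand-in for one of the TF implementations in the GSoC report.
  return tf.numpy_function(
      lambda a: special.wofz(a).astype(np.complex128), [z], tf.complex128)


@tf.custom_gradient
def faddeeva(z):
  z = tf.convert_to_tensor(z, dtype=tf.complex128)
  w = _faddeeva_impl(z)

  def grad(upstream):
    # dw/dz = 2i/sqrt(pi) - 2 z w(z); TensorFlow's convention for holomorphic
    # functions is to chain with the conjugate of the analytic derivative.
    dw_dz = (tf.constant(2j / np.sqrt(np.pi), tf.complex128)
             - tf.constant(2.0, tf.complex128) * z * w)
    return upstream * tf.math.conj(dw_dz)

  return w, grad


z = tf.constant([0.3 + 0.4j], dtype=tf.complex128)
with tf.GradientTape() as tape:
  tape.watch(z)
  y = tf.math.real(faddeeva(z))
print(tape.gradient(y, z))
```

With the actual TF implementation plugged in, only `_faddeeva_impl` would change; the gradient wiring stays the same.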
Okay, I'm checking out the implementations. I'll try to implement the gradient for https://github.com/Aman027/GSOC-21-Report/blob/main/TensorFlow%20Implementations/Faddeeva_cpp_implementation.py, I guess that's the simplest one.
Actually no, I would suggest to use the CUDA one: https://github.com/Aman027/GSOC-21-Report/blob/main/TensorFlow%20Implementations/Faddeeva_cuda_implementation.py
This one is slightly less precise but is more performant (see also the benchmarks in the readme)
(It's not in CUDA, it is just inspired by it)
Okay, sure. I'll work on the cuda one then.
Hello @jonas-eschle, is this issue still open? If yes, can I attempt to implement the gradient of any one of the Faddeeva implementations as mentioned above?
Hi @shantam-8, we haven't had a lot of progress, so this is still open. However, we managed to implement a simplified version that has lower accuracy (on the order of 1e-8 instead of 1e-16, by @ikrommyd). If you want to implement the gradient on the above, more precise version, that would be welcome I think. @aritropc, did you implement a gradient?