Reorganize distribution.py
The code in distribution.py has multiple functions for the same physical distribution. This could probably use an object oriented interface!
In fact, it could be a good idea to look at how scipy.stats does their probability distribution functions and basically subclass that interface. That, however, is worth a lot more of bonus points, so it's more of an optional thing for later. Or it could be that this saves a lot of planning the user interface, so it might be the easy way to do this. I'm not sure myself.
To be clear, we have:
- 1D Maxwellian speed distribution
- 1D Maxwellian velocity distribution
- 3D Maxwellian speed distribution
- 3D Maxwellian velocity distribution
and then the same for kappa distributions. How do you want to organize this?
One way would be to combine functions with the same dimension and then add a method flag to select between speed or velocity distribution. I think combining function of different dimensionality will be a bit trickier. Might need a tuple or array interface for the velocity arguments.
I think it'd make sense to have something like
class Distribution: # possibly an abstract base class
# define some basic methods like Sample1DSpeed, Sample1DVelocity, Sample3DSpeed, Sample3DVelocity
class MaxwellianDistribution(Distribution):
def __init__(...)
#redefine methods if necessary
class KappaDistribution(Distribution):
# same thing
I'm up for discussing the dimensional issue, there's probably a better solution in there.
Is there a way to use classes such that you don't have to go through the "constructor"? Otherwise this just seems like extra hoops for a user to jump through.
Function interfaces are nice because if you have a decent IDE then it'll pop-up and tell you what are the positional and optional arguments. It also means the arguments are explicit.
That being said, maybe we can make a function called maxwellian_distribution() that does something like this:
def maxwellian_distribution(temperature, velocities, drift_velocities, method="velocity"):
if method == "velocity":
if len(velocities) == 1:
_maxwellian_distribution_1D(temperature, velocity[0], drift_velocities[0])
elif len(velocities) == 3:
_maxwellian_distribution_3D(temperature, velocity[0], velocity[1], velocity[2], etc.)
else:
raise Exception(f"No method for {len(velocities)} dimensional distribution. Try 1D or 3D.")
elif method == "speed":
"Do same stuff as above"
else:
raise Exception(f"No such method: {method} found.")
So we just hide the functions that we currently have and put them under this neat interface.
I think from #452 it is clear that the distribution functions follow a pattern with respect to dimensionality. So if we can sort out this business with drift and the normalization constants on speed distributions I think we make a single class or function for accessing these distribution functions.
Is this issue still current? If so, would you like to redefine scope of work? As I see you did not came up to agreement :smile_cat:
Sure, any improvements here are welcome!
As for the scope of work, I guess that still has to be decided.
H! First and foremost, I'm pretty new to this library, but eager to contribute, so I apologize in advance for the mistakes I'm likely to commit. Hope this issue is still up for grabs, since it seems like a nice design problem to start.
Looking at the distribution.py, these are the possible signatures:
def Maxwellian_1D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"): # should include velocity in name
def Maxwellian_velocity_2D(vx, vy, T, particle="e", vx_drift=0, vy_drift=0, vTh=np.nan, units="units")
def Maxwellian_velocity_3D(vx, vy, vz, T, particle='e' vx_drift=0, vy_drift=0, vz_drift=0, vTh=np.nan, units="units")
def Maxwellian_speed_1D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):
def Maxwellian_speed_2D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):
def Maxwellian_speed_3D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):
def kappa_velocity_1D(v, T, kappa, particle="e", v_drift=0, vTh=np.nan, units="units"):
def kappa_velocity_3D(vx, vy, vz, T, particle='e' vx_drift=0, vy_drift=0, vz_drift=0, vTh=np.nan, units="units")
To make the code a bit more friendly towards extending it (like adding 2D kappa velocity or 1D kappa speed), One way of doing so would be to refactoring by the Builder design principle such that creating a distribution that is defined thrgough dimension, type, and property:
dist = DistributionFactory.createDistribution(type='maxwellian', dimension=1, property='speed')
dist(v, T, particle="e", v_drift=0, vTh=np.nan, units="units")
The DistributionFactory is a class that would produce the appropriate distribution. This would allow for easier extension of different types of distributions, make code a bit more decoupled and less duplicated.
Looking at the methods, the code could be reused more by unifying conditioning on units, more specifically, these two below could be unified
if units == "units":
# unit checks and conversions
# checking velocity units
vx = vx.to(u.m / u.s)
vy = vy.to(u.m / u.s)
# catching case where drift velocities have default values, they
# need to be assigned units
vx_drift = _v_drift_units(vx_drift)
vy_drift = _v_drift_units(vy_drift)
# convert temperature to Kelvins
T = T.to(u.K, equivalencies=u.temperature_energy())
if np.isnan(vTh):
# get thermal velocity and thermal velocity squared
vTh = parameters.thermal_speed(T, particle=particle, method="most_probable")
elif not np.isnan(vTh):
# check units of thermal velocity
vTh = vTh.to(u.m / u.s)
elif np.isnan(vTh) and units == "unitless":
# assuming unitless temperature is in Kelvins
vTh = parameters.thermal_speed(
T * u.K, particle=particle, method="most_probable"
).si.value
if units == "units":
# unit checks and conversions
# checking velocity units
vx = vx.to(u.m / u.s)
vy = vy.to(u.m / u.s)
vz = vz.to(u.m / u.s)
# catching case where drift velocities have default values, they
# need to be assigned units
vx_drift = _v_drift_units(vx_drift)
vy_drift = _v_drift_units(vy_drift)
vz_drift = _v_drift_units(vz_drift)
# convert temperature to Kelvins
T = T.to(u.K, equivalencies=u.temperature_energy())
if np.isnan(vTh):
# get thermal velocity and thermal velocity squared
vTh = parameters.thermal_speed(T, particle=particle, method="most_probable")
elif not np.isnan(vTh):
# check units of thermal velocity
vTh = vTh.to(u.m / u.s)
elif np.isnan(vTh) and units == "unitless":
# assuming unitless temperature is in Kelvins
vTh = parameters.thermal_speed(
T * u.K, particle=particle, method="most_probable"
).si.value
Some choices depend whether you want to keep being backward compatible. This could be done with this solution, but would look something like (not pretty):
def DistributionFactory():
def __init__ (self, distribution_instance, *args, **kwargs):
if distribution_instance.__class__ == Maxwellian_1D:
dim = Dimension(1)
property = 'velocity'
distribution_type = 'Maxwellian'
elif:
....
dist = createDistribution(type=distribution_type, dimension=dim, property=property)
return dist(args, kwargs)
Please let me know your initial thoughts, I can create a WIR PR to make things a bit more concrete.
That looks great @FilipBolt! I'd love to see it as a PR :)
One thing about units... I think we had the right idea, but wrong implementation back then. The issue with units here is that all the math inside the function is done with quantities, which is always going to be resource intensive, while we're not really gaining anything inside these functions by using them. It would probably make sense to:
- get rid of the
unitsargument - enforce units at argument input (via validate_quantities ). This would be slightly API breaking, but in a good way.
- first grab the
.sivalues, do all the math on those - then cast the outputs to default
.siQuantities at output
Does that sound reasonable to you?
Thanks very much @StanczakDominik , I finally looked into this.
I believe I understand what you mean with getting rid of the units argument, and it seems sensible. In short, the idea is to decorate methods using something like @validate_quantities(T=u.K, validations_on_return=u.s / u.m) that will enforce inputs and outputs.
I do have some questions though:
-
What kind of validation would be appropriate for
v(vx, ...) ? I am not sure how to ensure a unit is convertible to m/s. I tried the code below to find an equivalence class using :from astropy import units as u; (u.m / u.s).find_equivalent_units(), but nothing was found. -
Can you think of a way to simply replace this test that verifies the norm and does integration on a range? The problem is that the
quadmethod calls for floats as integral bounds, whereas we're making our distribution work on units (and enforcing that input explicitly).
def test_norm(self):
"""
Tests whether distribution function is normalized, and integrates to 1.
"""
# converting vTh to unitless
vTh = self.vTh.si.value
# setting up integration from -10*vTh to 10*vTh, which is close to Inf
infApprox = 10 * vTh
# integrating, this should be close to 1
integ = spint.quad(
Maxwellian_1D,
-infApprox,
infApprox,
args=(self.T_e, self.particle, 0, vTh),
epsabs=1e0,
epsrel=1e0,
)
# value returned from quad is (integral, error), we just need
# the 1st
integVal = integ[0]
exceptStr = (
"Integral of distribution function should be 1 " f"and not {integVal}."
)
assert np.isclose(integVal, 1, rtol=1e-3, atol=0.0), exceptStr
Again, apologies if these are too rudimentary and obvious.
Hey, no worries! Checking that the input/output is convertible to m/s can definitely be handled by the validate quantities decorator I linked in the last post. I can post an example later if you'd like. Digging into the other part in a sec...
... and we're hitting incompatibilities between scipy and unit packages again... We could wrap our distribution function into a helper function that adds units to input and drops them from the output, then pass that helper to quad. It's not pretty nor elegant, but I think it'd work.
Great, thanks for the quick responses. I'll do the integration via a wrapper. And I believe I've found a way to validate v.
I've decided to do this in two PRs. The first one should enforce units (just posted in https://github.com/PlasmaPy/PlasmaPy/pull/816), and the second one should do the refactoring.
Please take a look when you have the time.
Hey, is this issue still open? I'm interested if that's the case
Yes, go right ahead! There is no active development on this at the moment. Please look through the discussions we had on this topic in PR #816 and let us know if you have any questions.
Hi, seems like this is a very old Issues, what is the status, can I still look at this?