PlasmaPy Reorganize distribution.py

The code in distribution.py has multiple functions for the same physical distribution. This could probably use an object oriented interface!

In fact, it could be a good idea to look at how scipy.stats does their probability distribution functions and basically subclass that interface. That, however, is worth a lot more of bonus points, so it's more of an optional thing for later. Or it could be that this saves a lot of planning the user interface, so it might be the easy way to do this. I'm not sure myself.

Apr 20 '18 19:04 StanczakDominik

To be clear, we have:

1D Maxwellian speed distribution
1D Maxwellian velocity distribution
3D Maxwellian speed distribution
3D Maxwellian velocity distribution

and then the same for kappa distributions. How do you want to organize this?

One way would be to combine functions with the same dimension and then add a method flag to select between speed or velocity distribution. I think combining function of different dimensionality will be a bit trickier. Might need a tuple or array interface for the velocity arguments.

Apr 20 '18 19:04 lemmatum

I think it'd make sense to have something like

class Distribution: # possibly an abstract base class
# define some basic methods like Sample1DSpeed, Sample1DVelocity, Sample3DSpeed, Sample3DVelocity

class MaxwellianDistribution(Distribution):
def __init__(...)
#redefine methods if necessary

class KappaDistribution(Distribution):
# same thing

I'm up for discussing the dimensional issue, there's probably a better solution in there.

Apr 20 '18 19:04 StanczakDominik

Is there a way to use classes such that you don't have to go through the "constructor"? Otherwise this just seems like extra hoops for a user to jump through.

Function interfaces are nice because if you have a decent IDE then it'll pop-up and tell you what are the positional and optional arguments. It also means the arguments are explicit.

That being said, maybe we can make a function called maxwellian_distribution() that does something like this:

def maxwellian_distribution(temperature, velocities, drift_velocities, method="velocity"):
    if method == "velocity":
        if len(velocities) == 1:
            _maxwellian_distribution_1D(temperature, velocity[0], drift_velocities[0])
        elif len(velocities) == 3:
            _maxwellian_distribution_3D(temperature, velocity[0], velocity[1], velocity[2], etc.)
        else:
            raise Exception(f"No method for {len(velocities)} dimensional distribution. Try 1D or 3D.")
    elif method == "speed":
            "Do same stuff as above"
    else:
            raise Exception(f"No such method: {method} found.")

So we just hide the functions that we currently have and put them under this neat interface.

Apr 20 '18 22:04 lemmatum

I think from #452 it is clear that the distribution functions follow a pattern with respect to dimensionality. So if we can sort out this business with drift and the normalization constants on speed distributions I think we make a single class or function for accessing these distribution functions.

May 11 '18 01:05 lemmatum

Is this issue still current? If so, would you like to redefine scope of work? As I see you did not came up to agreement :smile_cat:

Jan 17 '20 20:01 jakub-polak

Sure, any improvements here are welcome!

As for the scope of work, I guess that still has to be decided.

Jan 18 '20 09:01 StanczakDominik

H! First and foremost, I'm pretty new to this library, but eager to contribute, so I apologize in advance for the mistakes I'm likely to commit. Hope this issue is still up for grabs, since it seems like a nice design problem to start.

Looking at the distribution.py, these are the possible signatures:

def Maxwellian_1D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"): # should include velocity in name

def Maxwellian_velocity_2D(vx, vy, T, particle="e", vx_drift=0, vy_drift=0, vTh=np.nan, units="units")

def Maxwellian_velocity_3D(vx, vy, vz, T, particle='e' vx_drift=0, vy_drift=0, vz_drift=0, vTh=np.nan, units="units")

def Maxwellian_speed_1D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):

def Maxwellian_speed_2D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):

def Maxwellian_speed_3D(v, T, particle="e", v_drift=0, vTh=np.nan, units="units"):

def kappa_velocity_1D(v, T, kappa, particle="e", v_drift=0, vTh=np.nan, units="units"):

def kappa_velocity_3D(vx, vy, vz, T, particle='e' vx_drift=0, vy_drift=0, vz_drift=0, vTh=np.nan, units="units")

To make the code a bit more friendly towards extending it (like adding 2D kappa velocity or 1D kappa speed), One way of doing so would be to refactoring by the Builder design principle such that creating a distribution that is defined thrgough dimension, type, and property:

dist = DistributionFactory.createDistribution(type='maxwellian', dimension=1, property='speed')
dist(v, T, particle="e", v_drift=0, vTh=np.nan, units="units")

The DistributionFactory is a class that would produce the appropriate distribution. This would allow for easier extension of different types of distributions, make code a bit more decoupled and less duplicated.

Looking at the methods, the code could be reused more by unifying conditioning on units, more specifically, these two below could be unified

    if units == "units":
        # unit checks and conversions
        # checking velocity units
        vx = vx.to(u.m / u.s)
        vy = vy.to(u.m / u.s)
        # catching case where drift velocities have default values, they
        # need to be assigned units
        vx_drift = _v_drift_units(vx_drift)
        vy_drift = _v_drift_units(vy_drift)
        # convert temperature to Kelvins
        T = T.to(u.K, equivalencies=u.temperature_energy())
        if np.isnan(vTh):
            # get thermal velocity and thermal velocity squared
            vTh = parameters.thermal_speed(T, particle=particle, method="most_probable")
        elif not np.isnan(vTh):
            # check units of thermal velocity
            vTh = vTh.to(u.m / u.s)
    elif np.isnan(vTh) and units == "unitless":
        # assuming unitless temperature is in Kelvins
        vTh = parameters.thermal_speed(
            T * u.K, particle=particle, method="most_probable"
        ).si.value

    if units == "units":
        # unit checks and conversions
        # checking velocity units
        vx = vx.to(u.m / u.s)
        vy = vy.to(u.m / u.s)
        vz = vz.to(u.m / u.s)
        # catching case where drift velocities have default values, they
        # need to be assigned units
        vx_drift = _v_drift_units(vx_drift)
        vy_drift = _v_drift_units(vy_drift)
        vz_drift = _v_drift_units(vz_drift)
        # convert temperature to Kelvins
        T = T.to(u.K, equivalencies=u.temperature_energy())
        if np.isnan(vTh):
            # get thermal velocity and thermal velocity squared
            vTh = parameters.thermal_speed(T, particle=particle, method="most_probable")
        elif not np.isnan(vTh):
            # check units of thermal velocity
            vTh = vTh.to(u.m / u.s)
    elif np.isnan(vTh) and units == "unitless":
        # assuming unitless temperature is in Kelvins
        vTh = parameters.thermal_speed(
            T * u.K, particle=particle, method="most_probable"
        ).si.value

Some choices depend whether you want to keep being backward compatible. This could be done with this solution, but would look something like (not pretty):

def DistributionFactory():

def __init__ (self, distribution_instance, *args, **kwargs):
	if distribution_instance.__class__ == Maxwellian_1D:
            dim = Dimension(1)
            property = 'velocity'
	    distribution_type = 'Maxwellian'
	elif:
	     ....
       dist = createDistribution(type=distribution_type, dimension=dim, property=property)
	return dist(args, kwargs)

Please let me know your initial thoughts, I can create a WIR PR to make things a bit more concrete.

May 12 '20 09:05 FilipBolt

That looks great @FilipBolt! I'd love to see it as a PR :)

One thing about units... I think we had the right idea, but wrong implementation back then. The issue with units here is that all the math inside the function is done with quantities, which is always going to be resource intensive, while we're not really gaining anything inside these functions by using them. It would probably make sense to:

get rid of the units argument
enforce units at argument input (via validate_quantities ). This would be slightly API breaking, but in a good way.
first grab the .si values, do all the math on those
then cast the outputs to default .si Quantities at output

Does that sound reasonable to you?

May 12 '20 12:05 StanczakDominik

Thanks very much @StanczakDominik , I finally looked into this.

I believe I understand what you mean with getting rid of the units argument, and it seems sensible. In short, the idea is to decorate methods using something like @validate_quantities(T=u.K, validations_on_return=u.s / u.m) that will enforce inputs and outputs.

I do have some questions though:

What kind of validation would be appropriate for v (vx, ...) ? I am not sure how to ensure a unit is convertible to m/s. I tried the code below to find an equivalence class using : from astropy import units as u; (u.m / u.s).find_equivalent_units(), but nothing was found.
Can you think of a way to simply replace this test that verifies the norm and does integration on a range? The problem is that the quad method calls for floats as integral bounds, whereas we're making our distribution work on units (and enforcing that input explicitly).

def test_norm(self):
    """
    Tests whether distribution function is normalized, and integrates to 1.
    """
    # converting vTh to unitless
    vTh = self.vTh.si.value
    # setting up integration from -10*vTh to 10*vTh, which is close to Inf
    infApprox = 10 * vTh
    # integrating, this should be close to 1
    integ = spint.quad(
        Maxwellian_1D,
        -infApprox,
        infApprox,
        args=(self.T_e, self.particle, 0, vTh),
        epsabs=1e0,
        epsrel=1e0,
    )
    # value returned from quad is (integral, error), we just need
    # the 1st
    integVal = integ[0]
    exceptStr = (
        "Integral of distribution function should be 1 " f"and not {integVal}."
    )
    assert np.isclose(integVal, 1, rtol=1e-3, atol=0.0), exceptStr

Again, apologies if these are too rudimentary and obvious.

May 25 '20 18:05 FilipBolt

Hey, no worries! Checking that the input/output is convertible to m/s can definitely be handled by the validate quantities decorator I linked in the last post. I can post an example later if you'd like. Digging into the other part in a sec...

May 25 '20 18:05 StanczakDominik

... and we're hitting incompatibilities between scipy and unit packages again... We could wrap our distribution function into a helper function that adds units to input and drops them from the output, then pass that helper to quad. It's not pretty nor elegant, but I think it'd work.

May 25 '20 18:05 StanczakDominik

Great, thanks for the quick responses. I'll do the integration via a wrapper. And I believe I've found a way to validate v.

May 25 '20 19:05 FilipBolt

I've decided to do this in two PRs. The first one should enforce units (just posted in https://github.com/PlasmaPy/PlasmaPy/pull/816), and the second one should do the refactoring.

Please take a look when you have the time.

May 26 '20 09:05 FilipBolt

Hey, is this issue still open? I'm interested if that's the case

Jan 25 '22 22:01 kitsiosvas

Yes, go right ahead! There is no active development on this at the moment. Please look through the discussions we had on this topic in PR #816 and let us know if you have any questions.

Jan 26 '22 20:01 rocco8773

Hi, seems like this is a very old Issues, what is the status, can I still look at this?

Sep 28 '24 03:09 harsha-mamidi