MulensModel
MulensModel copied to clipboard
copy of MulensData instance
I'm writing code for cleaning the data. It involves masking and un-masking points as bad. I've found it quite complicated - one cannot do
dataset.bad[i] = True
because it doesn't change dataset.good
. One has to do something like:
bad = dataset.bad.copy()
bad[i] = True
dataset.bad = bad
We can either change implementation of .bad
and .good
(I don't know how - @ketozhang do you know how?) or we can add a function that makes a copy of a MulensData
object. @jenniferyee What do you think?
I agree that this is a problem.
Currently "bad" and "good" are lists or np.array or something similar. There must be a way to make a class that behaves the way it needs to. For example, a "BadData" class (or maybe the class should be "DataQuality"?) whose properties "bad" and "good" are accessed by MulensData.bad and MulensData.good . Then it would have some kind of indexing method that inherits from list?
.bad
and .good
are two np.arrays of bool values and we want one to be logical not
of the other all the time. I'm guessing it would be possible to define a new class that inherits from np.array and whenever a value of one of these arrays is changed, then the other is recalculated. However, to work fully it will require defining all operators that make sense for np.array, which would be lots of work. Maybe we should add notes in docstrings (so that users don't do data.bad[i] = True
) and add the function that makes a copy of the MulensData class
I don't see a reason to keep a separate arrays for MulensData::_good
and MulensData::_bad
if they're opposite of each other. Can you do something like this?
class MulensData:
@property
def good(self):
return self._good
@property
def bad(self):
return ~self._good
@good.setter
def good(self, value):
self._good = value
@bad.setter
def bad(self, value):
self._good = not value
We had something like that before. But it turned out that np.logical_not()
is extremely slow (at least on some computers). It was dominating the run time in case of the fitting tests we performed. Maybe it would be best to return to the previous version and use self._good
as @ketozhang suggested. As far as I remember, we used self._bad
previously. I'm guessing self._bad
is very rarely used - maybe only in some plotting functions.
Will have to take a look how it was used: shape of array and how often MulensData::bad
was called.
Instead of @property
and building your own cache, you could use functools caching to handle this. See this SO question
I've checked that the only place where MulensData.bad
is used is plotting the dataset. There are a few calls in that function, but of course we could change that to a single call. Now I agree with Keto's comment 3 posts back.
The solution I propose won't solve the problem where we want to allow the users to mutate MulensModel::bad[i] = value
To me, I think it's trivial to get the inverse (i.e., ~MulensModel::good
) and to remove the property MulensModel::bad
altogether (or the reverse since I more often see bad flags stored than good ones) whenever the user needs it.
MulensData.copy() added in commit 2aff631. Issue not yet fully resolved.
Added a use case in dd54d4e2632e65bc7ab1c31bfe82be4a671b8137
I've checked that the solution from Keto's first reply in this thread is not solving calls like:
data.bad[12] = True
or
data.bad[np.where(data.mag < 13.)] = True
Though the same ones for data.good
do work. The above ones don't work, but there is no error and no warning!
That's right, the first solution I proposed will not let you mutate bad
.
I'm still inclined to suggest not providing your user good
. If they want only goods, many familiar with numpy know how to filter out things from an array:
good = data[~data.bad]
FYI, giving the user access to only the bad is the same behavior implemented in numpy's masked array. https://numpy.org/doc/stable/reference/maskedarray.generic.html#accessing-only-the-valid-entries