MulensModel copy of MulensData instance

I'm writing code for cleaning the data. It involves masking and un-masking points as bad. I've found it quite complicated - one cannot do

dataset.bad[i] = True

because it doesn't change dataset.good. One has to do something like:

bad = dataset.bad.copy()
bad[i] = True
dataset.bad = bad

We can either change implementation of .bad and .good (I don't know how - @ketozhang do you know how?) or we can add a function that makes a copy of a MulensData object. @jenniferyee What do you think?

Apr 19 '22 11:04 rpoleski

I agree that this is a problem.

Currently "bad" and "good" are lists or np.array or something similar. There must be a way to make a class that behaves the way it needs to. For example, a "BadData" class (or maybe the class should be "DataQuality"?) whose properties "bad" and "good" are accessed by MulensData.bad and MulensData.good . Then it would have some kind of indexing method that inherits from list?

Apr 19 '22 12:04 jenniferyee

.bad and .good are two np.arrays of bool values and we want one to be logical not of the other all the time. I'm guessing it would be possible to define a new class that inherits from np.array and whenever a value of one of these arrays is changed, then the other is recalculated. However, to work fully it will require defining all operators that make sense for np.array, which would be lots of work. Maybe we should add notes in docstrings (so that users don't do data.bad[i] = True) and add the function that makes a copy of the MulensData class

Apr 19 '22 20:04 rpoleski

I don't see a reason to keep a separate arrays for MulensData::_good and MulensData::_bad if they're opposite of each other. Can you do something like this?

class MulensData:

  @property
  def good(self):
    return self._good
  
  @property
  def bad(self):
    return ~self._good
  
  @good.setter
  def good(self, value):
    self._good = value
  
  @bad.setter
  def bad(self, value):
    self._good = not value

Apr 19 '22 20:04 ketozhang

We had something like that before. But it turned out that np.logical_not() is extremely slow (at least on some computers). It was dominating the run time in case of the fitting tests we performed. Maybe it would be best to return to the previous version and use self._good as @ketozhang suggested. As far as I remember, we used self._bad previously. I'm guessing self._bad is very rarely used - maybe only in some plotting functions.

Apr 20 '22 13:04 rpoleski

Will have to take a look how it was used: shape of array and how often MulensData::bad was called.

Instead of @property and building your own cache, you could use functools caching to handle this. See this SO question

Apr 20 '22 18:04 ketozhang

I've checked that the only place where MulensData.bad is used is plotting the dataset. There are a few calls in that function, but of course we could change that to a single call. Now I agree with Keto's comment 3 posts back.

Apr 22 '22 11:04 rpoleski

The solution I propose won't solve the problem where we want to allow the users to mutate MulensModel::bad[i] = value

To me, I think it's trivial to get the inverse (i.e., ~MulensModel::good) and to remove the property MulensModel::bad altogether (or the reverse since I more often see bad flags stored than good ones) whenever the user needs it.

Apr 25 '22 16:04 ketozhang

MulensData.copy() added in commit 2aff631. Issue not yet fully resolved.

Oct 09 '22 02:10 rpoleski

Added a use case in dd54d4e2632e65bc7ab1c31bfe82be4a671b8137

Oct 05 '23 20:10 jenniferyee

I've checked that the solution from Keto's first reply in this thread is not solving calls like:

data.bad[12] = True

or

data.bad[np.where(data.mag < 13.)] = True

Though the same ones for data.good do work. The above ones don't work, but there is no error and no warning!

Oct 06 '23 21:10 rpoleski

That's right, the first solution I proposed will not let you mutate bad.

I'm still inclined to suggest not providing your user good. If they want only goods, many familiar with numpy know how to filter out things from an array:

good = data[~data.bad]

Oct 06 '23 21:10 ketozhang

FYI, giving the user access to only the bad is the same behavior implemented in numpy's masked array. https://numpy.org/doc/stable/reference/maskedarray.generic.html#accessing-only-the-valid-entries

Oct 06 '23 23:10 ketozhang

MulensModel MulensModel copied to clipboard

copy of MulensData instance

MulensModel
MulensModel copied to clipboard