sbpy icon indicating copy to clipboard operation
sbpy copied to clipboard

Combining DataClass objects

Open mkelley opened this issue 4 years ago • 12 comments

This is a request for

  • [x] a new feature
  • [ ] an enhancement to existing sbpy functionality
  • [ ] somethings else: [explain here]

The requested changes will be implemented by

  • [ ] me
  • [x] the sbpy developers

High-level concept Provide a mechanism to combine DataClass objects that have different fields.

Explain the relevance to sbpy DataClass objects have flexibility beyond what QTable is designed for.

Proposal details Something like the following behavior would be useful:

from astropy.time import Time
from sbpy.data import Orbit

orbits = Orbit.from_horizons(
        '46P', 'designation', epochs=(obs_dates['spitzer'].jd.mean(), obs_dates['bass']),
        closest_apparition=True)
for planet in (399, 599):
    orbits.add_row(Orbit.from_horizons(planet, 'majorbody', epochs=obs_dates['bass'])[0])

However, DataClass.add_row does not exist. DataClass.table.add_row does exist, but:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-02e3f7890044> in <module>()
      3         closest_apparition=True)
      4 for planet in (399, 599):
----> 5     orbits.table.add_row(Orbit.from_horizons(planet, 'majorbody', epochs=obs_dates['bass'])[0])

/home/msk/local/lib/python3.6/site-packages/astropy-3.2.1-py3.6-linux-x86_64.egg/astropy/table/table.py in add_row(self, vals, mask)
   2399              3   6   9
   2400         """
-> 2401         self.insert_row(len(self), vals, mask)
   2402 
   2403     def insert_row(self, index, vals=None, mask=None):

/home/msk/local/lib/python3.6/site-packages/astropy-3.2.1-py3.6-linux-x86_64.egg/astropy/table/table.py in insert_row(self, index, vals, mask)
   2488 
   2489             if len(self.columns) != len(vals):
-> 2490                 raise ValueError('Mismatch between number of vals and columns')
   2491 
   2492             if mask is not None:

ValueError: Mismatch between number of vals and columns

mkelley avatar Jul 23 '19 15:07 mkelley

This is an interesting and non-trivial problem. Orbits for different objects use different parameters:

from sbpy.data import Orbit

orbits = Orbit.from_horizons(
    '46P', 'designation',
    closest_apparition=True)
print(orbits.field_names)

for planet in (399, 599):
    neworbs = Orbit.from_horizons(planet, 'majorbody')
    print(neworbs.field_names)

<TableColumns names=('targetname','M1','e','k1','q','incl','Omega','w','Tp_jd','n','M','nu','a','Q','P','epoch')> <TableColumns names=('targetname','e','q','incl','Omega','w','Tp_jd','n','M','nu','a','Q','P','epoch')> <TableColumns names=('targetname','e','q','incl','Omega','w','Tp_jd','n','M','nu','a','Q','P','epoch')>

I have to think about how to solve this. Initially, I was reluctant to provide DataClass.add_row as it basically provides the same functionality as Table.from_row, but this might be a good case supporting it.

mommermi avatar Jul 31 '19 16:07 mommermi

note to myself: I tried to get around this issue by joining the two tables, but the problem is that both Time and Quantity cannot be masked, which is necessary for performing an outer join:

from astropy.time import Time
from sbpy.data import Orbit
from astropy.table import join

orbits = Orbit.from_horizons(
    '46P', 'designation',
    closest_apparition=True)
orbits['epoch'] = orbits['epoch'].utc.jd

for planet in (399, 599):
    neworbs = Orbit.from_horizons(planet, 'majorbody')
    neworbs['epoch'] = neworbs['epoch'].utc.jd
    print(join(orbits.table, neworbs.table, join_type='outer'))

NotImplementedError: join requires masking column 'M1' but column type Quantity does not support masking

There is no quick fix that could be included in v0.2....

mommermi avatar Aug 01 '19 21:08 mommermi

That makes this related to Issue #89 .

mkelley avatar Aug 01 '19 21:08 mkelley

@mkelley I'm just randomly browsing through the old issues. Masking Quantity type in astropy tables doesn't seem to be a problem by now. The example code snippet that @mommermi showed above works without raising an error with astropy v5.0.4 which is what I'm using. Should we go ahead and implement this feature? I can take a crack at it.

jianyangli avatar Jul 16 '22 00:07 jianyangli

Yes!

mkelley avatar Jul 21 '22 16:07 mkelley

I think a .append method that takes either Row, Table, or DataClass as input would work for what we need. Using Michael's example above, something like this:

from sbpy.data import Orbit

orbits = Orbit.from_horizons('46P', 'designation', closest_apparition=True)

# add more objects
for planet in (399, 599):
    neworbs = Orbit.from_horizons(planet, 'majorbody')
    orbits.append(neworbs)

# add a row
others = Orbit.from_horizons(['67P', '1P'], 'designation', closest_apparition=True)
orbits.append(others.table)  # append a table
orbits.append(others.table[0])  # append a row

Does this look good to you? We can either make the append inplace or return an appended object. I don't see a strong reason for one over another. Which do you prefer?

jianyangli avatar Jul 24 '22 16:07 jianyangli

Would it be better to follow astropy Table as an example and have methods add_row(), add_column(), add_columns(), and join()? Or, at least, a .append() method could be a convenience function that calls add_row, etc., as appropriate. I think in-place is better.

mkelley avatar Jul 29 '22 12:07 mkelley

Well, adding those methods would make it conceptually clear. But in the core, both add_row() and join() will need to use Table.vstack(join_type='outer') to accommodate different sets of columns (I think this is the easiest way to implement). Also, we already have __setitem__ and apply() that can be used to add a column so we don't have to worry about that functionality. That's why I prefer just one append() that takes a flexible input to catch all cases in the DataClass level, corresponding to apply() that expands a table in the orthogonal direction, and leave the low-level functionality to DataClass.table.

jianyangli avatar Jul 29 '22 14:07 jianyangli

Also, I'd like to update the documentation to advocate for not to use the exposed table property to change the table, but rather use the mechanisms realized by DataClass itself, such as __setitem__, apply to change columns, and potentially append or whatever we decide to change rows and join tables. What do you think?

jianyangli avatar Aug 01 '22 01:08 jianyangli

Well, Table.add_row is different from a join, because it does not add new columns: https://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table.add_row

Regardless, I prefer not to develop a new API for common table operations, given that the astronomy community is already working with several options: astropy tables, FITS tables, numpy record arrays, pandas data frames. And of those options, I think we should be most aligned with astropy tables. In principle, the more we can reuse astropy table functionality, the less code we need to write and test ourselves. We just need to address our column aliasing, and, as you point out, leave the low-level functionality to DataClass.table. That said, I'm definitely happy to build on that API with a flexible append() function. I expect that the code you would have put into an if statement, will just branch to the separate methods rather than be listed in-line, right?

All this make we want to think about sub-classing Table as an alternative approach. Not until after my vacation though!

mkelley avatar Aug 02 '22 00:08 mkelley

I see, so you want to reuse the API of astropy tables, expanding their functionalities to meet the requirement for DataClass but not introducing new method names. OK that does make sense to me now. We can realize an add_row to add a row with new columns. astropy table uses vstack function to join two tables. I think you agreed that we can add a class method join to join two tables. In this case, I actually don't think we should provide append, because effectively it's going to mud the water.

Previously I was thinking of just an append method because of the way that I had in mind to realize add_row and join:

from astropy.table import Table, vstack

class DataClass():

    def add_row(self, rows, names=names):
        tbl = Table(rows=rows, names=names)
        self.table = vstack((self.table, tbl))

    def join(self, data):
        if isinstance(data, DataClass):
            data = data.table
        elif isinstance(data, Table):
            pass
        else:
            raise ValueError()
        self.table = vstack((self.table, data))

So you see, both methods share similar logic, taking the input and making them a table, and then using astropy.table.vstack to join. But you did convince me that the reason related to implementation is not the reason for adding new API.

Regarding your idea of subclassing Table, I'm not sure it's a good idea. DataClass adds some strong constraints to the data stored in the table that we need to maintain, such as the integrity against Conf, and the flexibility of using alternative names, adding more rows and columns with one-to-multiple correspondence. If subclassing Table, we need to reload pretty much all the API functions, even if sometimes just very simple call to the superseded method with some simple check. That's going to be a lot of work.

Anyway, just think along the line. If you agree with my psudo implementation of add_row and join, I can code them up. I don't see any urgency at all. Enjoy your vacation first.

jianyangli avatar Aug 02 '22 17:08 jianyangli

@mkelley I just did a PR that implements the functionality as we discussed. I tried to make use of the existing functionalities from astropy Table and DataClass as much as possible. Based on my test, it has all the mechanics that we may need. But I'm not confident that join, which uses astropy.table.vstack is the best approach. Another function we may use is astropy.table.join, but I'm not familiar with it and don't fully understand it with a quick read of the documentation. So feel free to suggest if you are familiar with that and feel that's what we should use. Thanks.

jianyangli avatar Sep 21 '22 03:09 jianyangli