pyvo icon indicating copy to clipboard operation
pyvo copied to clipboard

Feature MIVOT (Model Instance in VOTable)

Open somilia opened this issue 1 year ago • 36 comments

Processing VO Model Annotations

Introduction

Model Instances in VOTables (MIVOT) defines a syntax for mapping VOTable data to any model serialized in VO-DML (Virtual Observatory Data Modeling Language).

This annotation schema acts as a bridge between data and models. It associates both column/parameter metadata and VOTable data with data model elements (such as classes, attributes, types, etc.). It also complements VOTable data or metadata that may be missing from the table, e.g., coordinate system descriptions or curation tracing.

The data model elements are organized in an independent annotation block complying with the MIVOT XML schema, which is added as an extra resource above the TABLE element. The MIVOT syntax allows a data structure to be described as a hierarchy of classes. It can also represent relationships and compositions between them. Furthermore, it can construct data model objects by aggregating instances from different tables of the VOTable.

Usage

The API allows you to obtain different levels of model views on the last read data row. These levels are described below. The lowest levels are model agnostic.

They provide tools to browse model instances dynamically generated. The understanding of the model elements is the responsibility of the final user.

The highest level (4) is based on the MANGO draft model and especially to its. It has been designed to solve the EpochPropagation use case risen at 2023 South Spring Interop.

The end user API allows to obtain a model view on the last read data row, this usage corresponds to the level 3 and 4 described below.

activate_features('MIVOT')
data_path = os.path.dirname(os.path.realpath(__file__))
votable = os.path.join(data_path, "tests/data/simple-annotation-votable.xml")

m_viewer = ModelViewerLevel1(votable)
row_view = m_viewer.get_next_row_view()

print(row_view.longitude.value)

You can access each value of the object of the model. e.g.: >>> print(row_view.Coordinate_coosys.PhysicalCoordSys_frame.spaceRefFrame.value) >>> ICRS

The model view is a dynamically generated Python object whose field names are derived from the dmroles of the MIVOT elements. There is no checking against the model structure at this level.

In this pull request, JOIN and dynamic references are not implemented. We keep focused on simpler patterns: The MIVOT block maps one mapped data table with the coordinate systems in the GLOBALS

PyVO Implementation

The implementation relies on the Astropy's write and read annotation modules (PR#15390) available from astropy 6.0, which allows to get and set Mivot blocks from/into VOTables. We use this new Astropy feature, MIVOT, to retrieve the MIVOT block. This implementation is built in 3 levels, denoting the abstraction level in relation to the XML block.

Level 1: ModelViewerLevel1

Provide the MIVOT block as it is in the VOTable: No references are resolved. The Mivot block is provided as an xml tree.

Level 2: ModelViewerLevel2

Provide access to an xml tree whose structure matches the model view of the current row. The internal references have been resolved (by _get_model_view() function of the ModelViewerLevel1). The attribute values have been set with the actual data values. This XML element is intended to be used as a basis for building any objects. The level2 output can be browsed using XPATH queries allowing users to retrieve MIVOT elements by their @dmrole or @dmtype. At this level, the MIVOT block must still be handled as an xml element.

Level 3: ModelViewerLevel3

ModelViewerLevel3 generates, from the level 1 output, a nested dictionary representing the entire XML INSTANCE with its hierarchy.

Level 4: MivotClass

From this dictionary, we build a ~pyvo.mivot.viewer.mivot_class.MivotClass object, which is a dictionary containing only the essential information used to process data. MivotClass basically stores all XML objects in its attribute dictionary __dict__.

Level stacking

The more levels we have , the more elements are hidden:

  • Hide column parsing: column identifiers in row view are replaced by model object. (e.g. get_attribute_by_role(coords:LonLatPoint.longitude))
  • Hide MIVOT identifiers: MIVOT XML identifiers (INSTANCE, COLLECTION, ATTRIBUTE) are replaced by model elements. (e.g. EpochPosition.longitude.value)
  • Hide model elements: model elements are replaced by library instances. (e.g. EpochPosition.get_sky_coord())
Level Column parsing MIVOT Identifiers Models elements
Level 1 & 2 Hidden Showed Showed
Level 3 Hidden Hidden Showed
Level 4 Hidden Hidden Hidden

Mivot package content:

Utils sub-package

This package contains modules that make it easier to handle XML elements and dictionaries, as well as the logger setup.

Seeker sub-package

  • AnnotationSeeker provides a set of tools for extracting mapping sub-blocks to retrieve XML elements.
  • RessourceSeeker provides a set of getters for tables.
  • TableIterator is a simple wrapper that iterates over table rows.

Feature sub-package

This package contains features such as StaticReferenceResolver which is used to resolve references (replace REFERENCE elements with the referenced objects set with the roles of the REFERENCEs). Future features to be added to this sub-package include DynamicReference and Join.

Exception sub-package

This package contains exception classes related to MIVOT.

Viewer sub-package

This package contains all the levels of ModelViewer described above, as well as the MivotClass file.

Test strategy:

The test module contains one pytest file per Python module, and the datasets used for the tests are located in the 'test/data' directory. The tests check for values, intermediate objects (dictionaries), errors, and thrown exceptions.

EDIT: Few changes have been made lastly:

  • ModelViewerLayer have been renamed by ModelViewerLevel.
  • lxml dependency have been removed, and replaced by defusedxml if available or by native xml package.
  • The get_next_row_view() function now allows to pass to the next row, returning the MivotClass with the new data row.
  • MivotClass has been added to the ModelViewerLevel1, in order to keep the same instance with only changing values that have a reference.
  • EpochPropagation have been improved, it can give an ~astropy.coordinates.sky_coordinate.SkyCoord containing only the data available in the row.

somilia avatar Nov 03 '23 15:11 somilia

Thank you for the review of this big code chunk.

  • This message answers your global comment.
  • The others will be applied to our code with the next push.

We're well aware that we're arriving with a code package that far exceeds the usual PR size. This is why, and after discussion with @tomdonaldson, we are starting with a draft PR in order to run a smooth integration within PyVO. Most of our code is used behind the scene. The user shouldn't care about it. The question is to know where to locate and how to organize all of that logic. This has been written with my own coding style (one module per class or so , modules grouped in folders by feature). This is may be not appropriate for PyVO, thus we would enjoy any advice adapting or improving our architecture.

lmichel avatar Nov 09 '23 16:11 lmichel

Most of our code is used behind the scene. The user shouldn't care about it.

Then add a _ in front of the method name to indicate they are private. It will open up the possibility of easier refactor in the future without going though a deprecation cycle.

bsipocz avatar Nov 09 '23 20:11 bsipocz

Codecov Report

Attention: Patch coverage is 86.91489% with 123 lines in your changes are missing coverage. Please review.

Project coverage is 81.24%. Comparing base (ded4360) to head (befc7bc). Report is 14 commits behind head on main.

:exclamation: Current head befc7bc differs from pull request most recent head 57a3885. Consider uploading reports for the commit 57a3885 to get more accurate results

Files Patch % Lines
pyvo/mivot/utils/xml_utils.py 57.14% 27 Missing :warning:
pyvo/mivot/viewer/mivot_viewer.py 89.12% 26 Missing :warning:
pyvo/mivot/seekers/annotation_seeker.py 94.96% 8 Missing :warning:
pyvo/mivot/viewer/mivot_instance.py 89.61% 8 Missing :warning:
pyvo/mivot/seekers/resource_seeker.py 86.79% 7 Missing :warning:
pyvo/mivot/utils/json_encoder.py 36.36% 7 Missing :warning:
pyvo/mivot/utils/dict_utils.py 73.91% 6 Missing :warning:
pyvo/mivot/utils/logger_setup.py 81.81% 6 Missing :warning:
pyvo/mivot/utils/vocabulary.py 90.76% 6 Missing :warning:
pyvo/mivot/viewer/xml_viewer.py 89.28% 6 Missing :warning:
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #497      +/-   ##
==========================================
+ Coverage   80.38%   81.24%   +0.86%     
==========================================
  Files          52       69      +17     
  Lines        6189     7129     +940     
==========================================
+ Hits         4975     5792     +817     
- Misses       1214     1337     +123     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 15 '23 10:11 codecov[bot]

After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :

  • Better connection of the model viewers with the DALResults object.
  • few more unit tests

In addition a documentation page must be written with code examples. This will be pushed on the source branch with the next commit.

lmichel avatar Jan 31 '24 14:01 lmichel

After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :

* Better connection of the model viewers with the DALResults object.

* few more unit tests

In addition a documentation page must be written with code examples. This will be pushed on the source branch with the next commit.

Done (fork CI passed)

lmichel avatar Feb 05 '24 11:02 lmichel

@astropy/coordinators - please do add the contributors from this PR to the org, so CI will run here. And it would be nice to have a better, more automated way to do this.

bsipocz avatar Feb 06 '24 19:02 bsipocz

Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?

Also, such requests can also be made via https://github.com/astropy/astropy-project/blob/main/.github/ISSUE_TEMPLATE/github-admin.yaml in the future. Thanks!

pllim avatar Feb 06 '24 19:02 pllim

p.s. We tried to automate but it's broken. Maybe @mwcraig would have time to fix it at some point.

  • https://github.com/astropy/astropy-tools/issues/178

pllim avatar Feb 06 '24 19:02 pllim

Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?

Aren't they in the commits list? (and sorry, I don't know where/how to add them to the org myself)

dhomeier avatar Feb 06 '24 19:02 dhomeier

There are 103 commits. I cannot quickly figure out without combing through them.

pllim avatar Feb 06 '24 19:02 pllim

I think @somilia is the only one that needs to be added -- it is the fact that the person who opened the PR isn't part of the org that causes the issue IIRC.

mwcraig avatar Feb 06 '24 19:02 mwcraig

Looks like @lmichel is the only other committer...

mwcraig avatar Feb 06 '24 19:02 mwcraig

I only saw 2 (@somilia and @lmichel), but if we need co-authors, yes, parsing the messages is out of bounds.

dhomeier avatar Feb 06 '24 19:02 dhomeier

Thanks! I added both to astropy-contributors.

pllim avatar Feb 06 '24 19:02 pllim

Hello,

Thank you for adding us. I still have a "3 workflows awaiting approval " on #497

Regards LM

Le 06/02/2024 à 20:15, Brigitta Sipőcz a écrit :

@astropy/coordinators https://github.com/orgs/astropy/teams/coordinators - please do add the contributors from this PR to the org, so CI will run here. And it would be nice to have a better, more automated way to do this.

— Reply to this email directly, view it on GitHub https://github.com/astropy/pyvo/pull/497#issuecomment-1930595725, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXOP6HSAF2A4PYSLT3VPHDYSJ6NXAVCNFSM6AAAAAA64SVL2KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZQGU4TKNZSGU. You are receiving this because you commented.Message ID: @.***>

-- English version: https: //www.deepl.com/translator

jesuischarlie/Tunis/Paris/Bruxelles/Berlin

Laurent Michel SSC XMM-Newton Tél : +33 (0)3 68 85 24 37 Fax : +33 (0)3 )3 68 85 24 32 Université de Strasbourg http://www.unistra.fr Observatoire Astronomique 11 Rue de l'Université F - 67200 Strasbourg

lmichel avatar Feb 07 '24 10:02 lmichel

Ah, those approvals. That is controlled by GitHub, not the org. As a rule to deter spammers, GitHub says workflows do not automatically run unless you have a merged commit in the repo already.

https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

By default, all first-time contributors require approval to run workflows.

pllim avatar Feb 07 '24 14:02 pllim

To make things easier for everyone (e.g., not having to wait around for such approvals in this PR), you can run the test suite locally instead if you are unsure. And hopefully when this gets merged, your next PR will no longer require such approvals. Good luck!

p.s. +6,214 −4

pllim avatar Feb 07 '24 14:02 pllim

By default, all first-time contributors require approval to run workflows.

For which the workaround suggested by GH was to add the contributor to the org. Maybe they changed policies since that discussion a while back.

ps: yes, the size is one of the reasons why it takes this long to get the reviews/etc going.

bsipocz avatar Feb 08 '24 18:02 bsipocz

Le 8 févr. 2024 à 19:05, Brigitta Sipőcz @.***> a écrit :

By default, all first-time contributors require approval to run workflows.

For which the workaround suggested by GH was to add the contributor to the org. Maybe they changed policies since that discussion a while back.Anyway, no commits anymore until the review. ps: yes, the size is one of the reasons why it takes this long to get the reviews/etc goingI’m aware about the size but if I split the PR, I’ll do a second one just after: no gain thus.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

lmichel avatar Feb 08 '24 18:02 lmichel

Well, I did add lmichel to astropy-contributors but didn't seem to help. I dunno if the addition has to be done before this PR was opened or the workaround assumption was wrong. 🤷

I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)

pllim avatar Feb 08 '24 18:02 pllim

I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)

That also worked before. But as Matt said above, it might need to be Somia to do these as they opened the PR (though I would be surprised if one of us maintainers to push to it at that commit wouldn't trigger the CI).

Anyway, at this point I would say wait for the review, and then we try to wrap it up as smoothly as possible. No point in trying to split it up into multiple PRs at this point, etc. And a rebase and some targeted squashing will clean up the history significantly, too, but I would wait with that after the content review, too.

bsipocz avatar Feb 08 '24 19:02 bsipocz

FWIW I also added somilia to the same astropy-contributors team at the same time as lmichel . Anyways, sorry I couldn't help. If you need to rerun CI , feel free to ping me. I am usually responsive when I am working (USA New York time).

pllim avatar Feb 08 '24 19:02 pllim

They both contributed in the big feature PR to core in https://github.com/astropy/astropy/pull/15390, so maybe some policy to automatically add people to the org who contribute significantly would have prevented this issue.

(note: I do not advocate for inviting all who adds a single character doc fix to the org, but here we are talking about a significant enhancement (to core astropy) that has a follow-up (this PR) that runs into technical gates as people are automatically put into newcomer bins while there are not at all newcomers to the org).

bsipocz avatar Feb 08 '24 20:02 bsipocz

automatically add people to the org who contribute significantly

I agree. The automation is broken. See https://github.com/astropy/astropy-tools/issues/178 😢

pllim avatar Feb 08 '24 20:02 pllim

The things are a bit more complicated since the @somilia contract ends up next week. I don't think she will continue to contribute to the project anymore. Even If I make a fake little PR to get the grant for running the CI, I couldn't do it for #497 which has been open by @somilia isn't it?

lmichel avatar Feb 08 '24 21:02 lmichel

Thank for this library - I made some tests for VizieR and I have some suggestions to improve the API usability.

  • hide the level model . I suggest to have only one level in the API with a name like MivotViewer for instance (see example below)
  • extend ModelViewerLevel1 to accept VOTableFile and Resource in the constructor
  • documentation could help ! I could make a vizier example based on a mango position propagation

Example: pseudo-code example (easy to propose, more difficult to implement :) )

from astropy.io.votable import parse
vot = parse("t.vot")
resource = vot.resources[0]
mivot_viewer = MivotViewer(resource)

# print the mivot instance
print(f"Mib=vot instance {mivot_viewer.mivot_class.dm_type}")

for rec in resource.tables[0]:
    mivot_viewer.update(rec)
    # print ra,dec using the mivot object
    print(f"{mivot_viewer.mivot_class.longitude.value},{mivot_viewer.mivot_class.longitude.value})

gilleslandais avatar Feb 20 '24 17:02 gilleslandais

The @gilleslandais post suggests a leak of a more user friendly API. I've been looking at the impact on the current code to implement it. It is actually limited (just rework the mivot_class module) with the condition of keeping the table row iterator inside the MivotViewer to ensure the last row read is consistent with the MIVOT annotations. I suggest implementing this improvement in this PR that way.

vot = parse("t.vot")
# Astropy automatically detects where the annotation are
mivot_viewer = MivotViewer(vot)
while mivot_viewer.get_next_row():
    mivot_object = mivot_viewer.instance
    print(f"{mivot_object.dmtype}")
    >>> mango.EpochPropagation
    print(f"{mivot_object.latitude.value} {mivot_object.longitude.value}")
    >>> 12.45 56.87
    print(f"{mivot_object.get_table_row()}"=
    >>> name      ra     dec 
    >>> --------------------------
    >>> mySource 12.45 56.87

If a user want to get a MivotView on a table row he/she has read somewhere else, he/she can use the update mechanism (which already exists):

table_row = some_user_processing_returning_a_table_row(...)
mivot_viewer.update_instance(mivot_object)

With the risk of using a table row inconsistent with the model mapping .

lmichel avatar Feb 21 '24 12:02 lmichel

I pushed a major change implementing the @gilleslandais proposal.

  • Some classes have been removed.
  • Both comments and documentation have been updated.
  • Public methods have been renamed to make the API easier to understand.
  • Mentions of Levels1/2/3 have been removed.
  • The SkyCoordautomatic instantiation has been discarded until the current Mivot API is adopted.

The whole logic is now controlled by the MivotViewer class:

  • It uses the seekers to retrieve what it needs
  • It used the XMLViewer to dynamically generate MivotInstance(se below).
scs_srv = SCSService("https://cdsarc.cds.unistra.fr/beta/viz-bin/mivotconesearch/I/239/hip_main")
m_viewer = MivotViewer(
    scs_ srv.search(
        pos=SkyCoord(ra=52.26708 * u.degree, dec=59.94027 * u.degree, frame='icrs'),
            radius=0.05
        )
    )
    mivot_instance = m_viewer.instance
    print(mivot_instance.dmtype)
    print(mivot_instance.Coordinate_coordSys.spaceRefFrame.value)

    while m_viewer.next():
        print(f"position: {mivot_instance.latitude.value} {mivot_instance.longitude.value}")

The model mapping can also be applied to numpy rows:

    votable = parse(path_to_votable)
    table = votable.resources[0].tables[0]
    mivot_viewer = MivotViewer(votable, resource_number=0)
    mivot_object = mivot_viewer.instance
    # and feed it with the table row
    for rec in table.array:
        mivot_object.update(rec)
        # show that the model retrieve the correct data values
        assert rec["RAICRS"] == mivot_object.longitude.value
        assert rec["DEICRS"] == mivot_object.latitude.value

lmichel avatar Mar 02 '24 14:03 lmichel

Hmm weird. CI didn't even try to run. And there is a conflict.

pllim avatar Mar 04 '24 19:03 pllim

The CI started well on the source branch v-liter-pron https://github.com/lmichel/pyvo/. There is a conflict with on docs/index.rst. That might be the cause of the non starting CI.

lmichel avatar Mar 04 '24 20:03 lmichel