pyvo
pyvo copied to clipboard
Feature MIVOT (Model Instance in VOTable)
Processing VO Model Annotations
Introduction
Model Instances in VOTables (MIVOT) defines a syntax for mapping VOTable data to any model serialized in VO-DML (Virtual Observatory Data Modeling Language).
This annotation schema acts as a bridge between data and models. It associates both column/parameter metadata and VOTable data with data model elements (such as classes, attributes, types, etc.). It also complements VOTable data or metadata that may be missing from the table, e.g., coordinate system descriptions or curation tracing.
The data model elements are organized in an independent annotation block complying with the MIVOT XML schema, which is added as an extra resource above the TABLE element. The MIVOT syntax allows a data structure to be described as a hierarchy of classes. It can also represent relationships and compositions between them. Furthermore, it can construct data model objects by aggregating instances from different tables of the VOTable.
Usage
The API allows you to obtain different levels of model views on the last read data row. These levels are described below. The lowest levels are model agnostic.
They provide tools to browse model instances dynamically generated. The understanding of the model elements is the responsibility of the final user.
The highest level (4) is based on the MANGO draft model and especially to its. It has been designed to solve the EpochPropagation use case risen at 2023 South Spring Interop.
The end user API allows to obtain a model view on the last read data row, this usage corresponds to the level 3 and 4 described below.
activate_features('MIVOT')
data_path = os.path.dirname(os.path.realpath(__file__))
votable = os.path.join(data_path, "tests/data/simple-annotation-votable.xml")
m_viewer = ModelViewerLevel1(votable)
row_view = m_viewer.get_next_row_view()
print(row_view.longitude.value)
You can access each value of the object of the model. e.g.:
>>> print(row_view.Coordinate_coosys.PhysicalCoordSys_frame.spaceRefFrame.value)
>>> ICRS
The model view is a dynamically generated Python object whose field names are derived from the dmroles
of the MIVOT elements. There is no checking against the model structure at this level.
In this pull request, JOIN and dynamic references are not implemented. We keep focused on simpler patterns: The MIVOT block maps one mapped data table with the coordinate systems in the GLOBALS
PyVO Implementation
The implementation relies on the Astropy's write and read annotation modules (PR#15390) available from astropy 6.0, which allows to get and set Mivot blocks from/into VOTables. We use this new Astropy feature, MIVOT, to retrieve the MIVOT block. This implementation is built in 3 levels, denoting the abstraction level in relation to the XML block.
Level 1: ModelViewerLevel1
Provide the MIVOT block as it is in the VOTable: No references are resolved. The Mivot block is provided as an xml tree.
Level 2: ModelViewerLevel2
Provide access to an xml tree whose structure matches the model view of the current row. The internal references have been resolved (by _get_model_view()
function of the ModelViewerLevel1). The attribute values have been set with the actual data values. This XML element is intended to be used as a basis for building any objects. The level2 output can be browsed using XPATH queries allowing users to retrieve MIVOT elements by their @dmrole
or @dmtype
. At this level, the MIVOT block must still be handled as an xml element.
Level 3: ModelViewerLevel3
ModelViewerLevel3 generates, from the level 1 output, a nested dictionary representing the entire XML INSTANCE with its hierarchy.
Level 4: MivotClass
From this dictionary, we build a ~pyvo.mivot.viewer.mivot_class.MivotClass
object, which is a dictionary containing only the essential information used to process data.
MivotClass
basically stores all XML objects in its attribute dictionary __dict__
.
Level stacking
The more levels we have , the more elements are hidden:
- Hide column parsing: column identifiers in row view are replaced by model object.
(e.g.
get_attribute_by_role(coords:LonLatPoint.longitude)
) - Hide MIVOT identifiers: MIVOT XML identifiers (INSTANCE, COLLECTION, ATTRIBUTE) are replaced by model elements.
(e.g.
EpochPosition.longitude.value
) - Hide model elements: model elements are replaced by library instances.
(e.g.
EpochPosition.get_sky_coord()
)
Level | Column parsing | MIVOT Identifiers | Models elements |
---|---|---|---|
Level 1 & 2 | Hidden | Showed | Showed |
Level 3 | Hidden | Hidden | Showed |
Level 4 | Hidden | Hidden | Hidden |
Mivot package content:
Utils sub-package
This package contains modules that make it easier to handle XML elements and dictionaries, as well as the logger setup.
Seeker sub-package
-
AnnotationSeeker
provides a set of tools for extracting mapping sub-blocks to retrieve XML elements. -
RessourceSeeker
provides a set of getters for tables. -
TableIterator
is a simple wrapper that iterates over table rows.
Feature sub-package
This package contains features such as StaticReferenceResolver
which is used to resolve references (replace REFERENCE elements with the referenced objects set with the roles of the REFERENCEs). Future features to be added to this sub-package include DynamicReference
and Join
.
Exception sub-package
This package contains exception classes related to MIVOT.
Viewer sub-package
This package contains all the levels of ModelViewer described above, as well as the MivotClass
file.
Test strategy:
The test module contains one pytest file per Python module, and the datasets used for the tests are located in the 'test/data' directory. The tests check for values, intermediate objects (dictionaries), errors, and thrown exceptions.
EDIT: Few changes have been made lastly:
- ModelViewerLayer have been renamed by ModelViewerLevel.
- lxml dependency have been removed, and replaced by defusedxml if available or by native xml package.
- The
get_next_row_view()
function now allows to pass to the next row, returning the MivotClass with the new data row. - MivotClass has been added to the ModelViewerLevel1, in order to keep the same instance with only changing values that have a reference.
- EpochPropagation have been improved, it can give an ~
astropy.coordinates.sky_coordinate.SkyCoord
containing only the data available in the row.
Thank you for the review of this big code chunk.
- This message answers your global comment.
- The others will be applied to our code with the next push.
We're well aware that we're arriving with a code package that far exceeds the usual PR size. This is why, and after discussion with @tomdonaldson, we are starting with a draft PR in order to run a smooth integration within PyVO. Most of our code is used behind the scene. The user shouldn't care about it. The question is to know where to locate and how to organize all of that logic. This has been written with my own coding style (one module per class or so , modules grouped in folders by feature). This is may be not appropriate for PyVO, thus we would enjoy any advice adapting or improving our architecture.
Most of our code is used behind the scene. The user shouldn't care about it.
Then add a _
in front of the method name to indicate they are private. It will open up the possibility of easier refactor in the future without going though a deprecation cycle.
Codecov Report
Attention: Patch coverage is 86.91489%
with 123 lines
in your changes are missing coverage. Please review.
Project coverage is 81.24%. Comparing base (
ded4360
) to head (befc7bc
). Report is 14 commits behind head on main.
:exclamation: Current head befc7bc differs from pull request most recent head 57a3885. Consider uploading reports for the commit 57a3885 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #497 +/- ##
==========================================
+ Coverage 80.38% 81.24% +0.86%
==========================================
Files 52 69 +17
Lines 6189 7129 +940
==========================================
+ Hits 4975 5792 +817
- Misses 1214 1337 +123
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :
- Better connection of the model viewers with the DALResults object.
- few more unit tests
In addition a documentation page must be written with code examples. This will be pushed on the source branch with the next commit.
After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :
* Better connection of the model viewers with the DALResults object. * few more unit tests
In addition a documentation page must be written with code examples. This will be pushed on the source branch with the next commit.
Done (fork CI passed)
@astropy/coordinators - please do add the contributors from this PR to the org, so CI will run here. And it would be nice to have a better, more automated way to do this.
Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?
Also, such requests can also be made via https://github.com/astropy/astropy-project/blob/main/.github/ISSUE_TEMPLATE/github-admin.yaml in the future. Thanks!
p.s. We tried to automate but it's broken. Maybe @mwcraig would have time to fix it at some point.
- https://github.com/astropy/astropy-tools/issues/178
Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?
Aren't they in the commits list? (and sorry, I don't know where/how to add them to the org myself)
There are 103 commits. I cannot quickly figure out without combing through them.
I think @somilia is the only one that needs to be added -- it is the fact that the person who opened the PR isn't part of the org that causes the issue IIRC.
Looks like @lmichel is the only other committer...
I only saw 2 (@somilia and @lmichel), but if we need co-authors, yes, parsing the messages is out of bounds.
Thanks! I added both to astropy-contributors
.
Hello,
Thank you for adding us. I still have a "3 workflows awaiting approval " on #497
Regards LM
Le 06/02/2024 à 20:15, Brigitta Sipőcz a écrit :
@astropy/coordinators https://github.com/orgs/astropy/teams/coordinators - please do add the contributors from this PR to the org, so CI will run here. And it would be nice to have a better, more automated way to do this.
— Reply to this email directly, view it on GitHub https://github.com/astropy/pyvo/pull/497#issuecomment-1930595725, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXOP6HSAF2A4PYSLT3VPHDYSJ6NXAVCNFSM6AAAAAA64SVL2KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZQGU4TKNZSGU. You are receiving this because you commented.Message ID: @.***>
-- English version: https: //www.deepl.com/translator
jesuischarlie/Tunis/Paris/Bruxelles/Berlin
Laurent Michel SSC XMM-Newton Tél : +33 (0)3 68 85 24 37 Fax : +33 (0)3 )3 68 85 24 32 Université de Strasbourg http://www.unistra.fr Observatoire Astronomique 11 Rue de l'Université F - 67200 Strasbourg
Ah, those approvals. That is controlled by GitHub, not the org. As a rule to deter spammers, GitHub says workflows do not automatically run unless you have a merged commit in the repo already.
https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
By default, all first-time contributors require approval to run workflows.
To make things easier for everyone (e.g., not having to wait around for such approvals in this PR), you can run the test suite locally instead if you are unsure. And hopefully when this gets merged, your next PR will no longer require such approvals. Good luck!
p.s. +6,214 −4
❗
By default, all first-time contributors require approval to run workflows.
For which the workaround suggested by GH was to add the contributor to the org. Maybe they changed policies since that discussion a while back.
ps: yes, the size is one of the reasons why it takes this long to get the reviews/etc going.
Le 8 févr. 2024 à 19:05, Brigitta Sipőcz @.***> a écrit :
By default, all first-time contributors require approval to run workflows.
For which the workaround suggested by GH was to add the contributor to the org. Maybe they changed policies since that discussion a while back.Anyway, no commits anymore until the review. ps: yes, the size is one of the reasons why it takes this long to get the reviews/etc goingI’m aware about the size but if I split the PR, I’ll do a second one just after: no gain thus.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>
Well, I did add lmichel to astropy-contributors but didn't seem to help. I dunno if the addition has to be done before this PR was opened or the workaround assumption was wrong. 🤷
I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)
I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)
That also worked before. But as Matt said above, it might need to be Somia to do these as they opened the PR (though I would be surprised if one of us maintainers to push to it at that commit wouldn't trigger the CI).
Anyway, at this point I would say wait for the review, and then we try to wrap it up as smoothly as possible. No point in trying to split it up into multiple PRs at this point, etc. And a rebase and some targeted squashing will clean up the history significantly, too, but I would wait with that after the content review, too.
FWIW I also added somilia to the same astropy-contributors team at the same time as lmichel . Anyways, sorry I couldn't help. If you need to rerun CI , feel free to ping me. I am usually responsive when I am working (USA New York time).
They both contributed in the big feature PR to core in https://github.com/astropy/astropy/pull/15390, so maybe some policy to automatically add people to the org who contribute significantly would have prevented this issue.
(note: I do not advocate for inviting all who adds a single character doc fix to the org, but here we are talking about a significant enhancement (to core astropy) that has a follow-up (this PR) that runs into technical gates as people are automatically put into newcomer bins while there are not at all newcomers to the org).
automatically add people to the org who contribute significantly
I agree. The automation is broken. See https://github.com/astropy/astropy-tools/issues/178 😢
The things are a bit more complicated since the @somilia contract ends up next week. I don't think she will continue to contribute to the project anymore. Even If I make a fake little PR to get the grant for running the CI, I couldn't do it for #497 which has been open by @somilia isn't it?
Thank for this library - I made some tests for VizieR and I have some suggestions to improve the API usability.
- hide the level model . I suggest to have only one level in the API with a name like MivotViewer for instance (see example below)
- extend ModelViewerLevel1 to accept VOTableFile and Resource in the constructor
- documentation could help ! I could make a vizier example based on a mango position propagation
Example: pseudo-code example (easy to propose, more difficult to implement :) )
from astropy.io.votable import parse
vot = parse("t.vot")
resource = vot.resources[0]
mivot_viewer = MivotViewer(resource)
# print the mivot instance
print(f"Mib=vot instance {mivot_viewer.mivot_class.dm_type}")
for rec in resource.tables[0]:
mivot_viewer.update(rec)
# print ra,dec using the mivot object
print(f"{mivot_viewer.mivot_class.longitude.value},{mivot_viewer.mivot_class.longitude.value})
The @gilleslandais post suggests a leak of a more user friendly API.
I've been looking at the impact on the current code to implement it.
It is actually limited (just rework the mivot_class
module) with the condition of keeping the table row iterator inside the MivotViewer
to ensure the last row read is consistent with the MIVOT annotations.
I suggest implementing this improvement in this PR that way.
vot = parse("t.vot")
# Astropy automatically detects where the annotation are
mivot_viewer = MivotViewer(vot)
while mivot_viewer.get_next_row():
mivot_object = mivot_viewer.instance
print(f"{mivot_object.dmtype}")
>>> mango.EpochPropagation
print(f"{mivot_object.latitude.value} {mivot_object.longitude.value}")
>>> 12.45 56.87
print(f"{mivot_object.get_table_row()}"=
>>> name ra dec
>>> --------------------------
>>> mySource 12.45 56.87
If a user want to get a MivotView
on a table row he/she has read somewhere else, he/she can use the update mechanism (which already exists):
table_row = some_user_processing_returning_a_table_row(...)
mivot_viewer.update_instance(mivot_object)
With the risk of using a table row inconsistent with the model mapping .
I pushed a major change implementing the @gilleslandais proposal.
- Some classes have been removed.
- Both comments and documentation have been updated.
- Public methods have been renamed to make the API easier to understand.
- Mentions of Levels1/2/3 have been removed.
- The
SkyCoord
automatic instantiation has been discarded until the current Mivot API is adopted.
The whole logic is now controlled by the MivotViewer
class:
- It uses the
seekers
to retrieve what it needs - It used the
XMLViewer
to dynamically generateMivotInstance
(se below).
scs_srv = SCSService("https://cdsarc.cds.unistra.fr/beta/viz-bin/mivotconesearch/I/239/hip_main")
m_viewer = MivotViewer(
scs_ srv.search(
pos=SkyCoord(ra=52.26708 * u.degree, dec=59.94027 * u.degree, frame='icrs'),
radius=0.05
)
)
mivot_instance = m_viewer.instance
print(mivot_instance.dmtype)
print(mivot_instance.Coordinate_coordSys.spaceRefFrame.value)
while m_viewer.next():
print(f"position: {mivot_instance.latitude.value} {mivot_instance.longitude.value}")
The model mapping can also be applied to numpy rows:
votable = parse(path_to_votable)
table = votable.resources[0].tables[0]
mivot_viewer = MivotViewer(votable, resource_number=0)
mivot_object = mivot_viewer.instance
# and feed it with the table row
for rec in table.array:
mivot_object.update(rec)
# show that the model retrieve the correct data values
assert rec["RAICRS"] == mivot_object.longitude.value
assert rec["DEICRS"] == mivot_object.latitude.value
Hmm weird. CI didn't even try to run. And there is a conflict.
The CI started well on the source branch v-liter-pr
on https://github.com/lmichel/pyvo/.
There is a conflict with on docs/index.rst. That might be the cause of the non starting CI.