astro-sdk
[DISCUSS] ProcessedTable interface
Please describe the feature you'd like to see
I would like to get table stats or other metadata for the table that an operator processed, via its input/output table.
Describe the solution you'd like
Every operation (or operator) would return a processed table object with metadata attached, if available. For example, the class could simply look like this:
from typing import Any


class ProcessedTable(Table):
    """A processed table containing metadata of the table which was processed by an operator."""
    metadata: Any
Note that this processed table can also be used as an input table, which makes this feature a non-breaking change if table instances are swapped out for processed table instances.
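To illustrate the non-breaking claim, here is a minimal sketch. The `Table` class below is a simplified stand-in and not the SDK's real class, and `describe` is a hypothetical consumer; the only point is that code written against `Table` keeps working when handed a `ProcessedTable`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Table:
    """Simplified stand-in for the SDK's Table class (assumption: a name is enough here)."""

    name: str


@dataclass
class ProcessedTable(Table):
    """A Table that additionally carries metadata produced by an operator."""

    metadata: Optional[Any] = None


def describe(table: Table) -> str:
    """Hypothetical downstream consumer that only knows about Table."""
    return f"table:{table.name}"


# A ProcessedTable can be passed wherever a Table is expected.
processed = ProcessedTable(name="orders", metadata={"total_rows": 120})
summary = describe(processed)
```

Because `ProcessedTable` subclasses `Table`, existing `isinstance` checks and type hints continue to hold; the metadata is simply extra information that older code ignores.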
Acceptance Criteria
- [ ] All checks and tests in the CI should pass
- [ ] Unit tests (90% code coverage or more, once available)
- [ ] Integration tests (if the feature relates to a new database or external service)
- [ ] Example DAG
- [ ] Docstrings in reStructuredText for each method, class, function and module-level attribute (including an Example DAG on how it should be used)
- [ ] Exception handling in case of errors
- [ ] Logging (are we exposing useful information to the user? e.g. source and destination)
- [ ] Improve the documentation (README, Sphinx, and any other relevant)
- [ ] How-to guide for the feature (example)
I am still brainstorming about this feature. Maybe we attach only operator-specific information, only table information, or both; I don't know yet.
Please let me know what you think.
I like the idea, although I am not yet clear on how a user would know what data is available to them. That is, what should I do to access the data available to me? If it is a dict, how does a user know the key names?
i.e. do we have a set of fields that will be part of this class/instance?
> I like the idea, although I am not yet clear on how a user would know what data is available to them. That is, what should I do to access the data available to me? If it is a dict, how does a user know the key names?

Yes, we have to find out 😅 and then document it. It could differ depending on the SQL query and the database client being used.
> i.e. do we have a set of fields that will be part of this class/instance?

We could have some basic ones, e.g.:
- rows affected (and percentage)
- total rows
For the basic ones, which are the same for every database, we can create a class.
Another question is whether we want to make additional queries or only store what we get from the initial query.
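A rough sketch of what such a basic class could look like, using the standard-library `sqlite3` module only as an example database. `BasicStats` and its field names are hypothetical. The sketch also shows the trade-off raised above: rows affected comes for free from the initial query's cursor, while total rows needs an additional `COUNT(*)` query.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicStats:
    """Hypothetical container for stats that every database can provide."""

    rows_affected: int
    total_rows: Optional[int] = None

    @property
    def percentage_affected(self) -> Optional[float]:
        # Unknown or zero total: no meaningful percentage.
        if not self.total_rows:
            return None
        return 100.0 * self.rows_affected / self.total_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "new"), (2, "new"), (3, "done"), (4, "done")],
)

# Available from the initial query: the cursor reports the rows it modified.
cursor = conn.execute("UPDATE orders SET status = 'done' WHERE status = 'new'")
stats = BasicStats(rows_affected=cursor.rowcount)

# Requires an additional query: the total row count of the table.
stats.total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

Making `total_rows` optional lets an operator skip the extra query when it is too expensive, at the cost of not being able to report a percentage.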
@feluelle @kaxil - Just adding my two cents.
- Rows affected: this info relates more to the operation than to the table. An operator that did something to a table can return rows affected / percentage, but that should not be part of the table object.
- Total rows: we can add it as part of the table object, but keeping it up to date across multiple operations can be an issue.
But returning the same object that the user passed to the operator does seem weird.
Yes, we should differentiate between table and operator information.
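One possible way to keep the two kinds of information apart, sketched under assumptions: `Table` is again a simplified stand-in, and `OperationInfo`, `OperationResult` and `run_cleanup` are hypothetical names. The operator returns a result that wraps the untouched table together with operation-specific details, instead of attaching those details to the table itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Simplified stand-in for the SDK's Table class."""

    name: str


@dataclass(frozen=True)
class OperationInfo:
    """Information about what an operator did, not about the table itself."""

    operator: str
    rows_affected: int


@dataclass(frozen=True)
class OperationResult:
    """Pairs the table an operator worked on with the operation's own info."""

    table: Table
    info: OperationInfo


def run_cleanup(table: Table) -> OperationResult:
    """Hypothetical operator; the row count is hard-coded for illustration."""
    return OperationResult(
        table=table,
        info=OperationInfo(operator="cleanup", rows_affected=3),
    )


orders = Table(name="orders")
result = run_cleanup(orders)
```

This differs from the subclass approach proposed at the top of the thread: the table object stays free of per-operation data, but callers have to unwrap `result.table` before passing it on.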