[DISCUSS] ProcessedTable interface

Open feluelle opened this issue 3 years ago • 2 comments

Please describe the feature you'd like to see

I would like to access table statistics or other metadata for a table that an operator processed as its input or output table.

Describe the solution you'd like

Every operation (or operator) would return a processed table object with metadata attached, if any is available. For example, the class could simply look like this:

from typing import Any


class ProcessedTable(Table):  # Table: the SDK's existing table class
    """A processed table carrying metadata about the table an operator processed."""

    metadata: Any

Note that a processed table can also be used as an input table. This makes the feature a non-breaking change, because table instances can be swapped out for processed table instances.
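To illustrate why the swap would be non-breaking, here is a minimal sketch. The `Table` class below is a hypothetical stand-in with only a `name` field, not the SDK's real class, and `describe` is an invented consumer. Both exist only to show that code written against `Table` keeps working when handed a `ProcessedTable`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Table:
    """Hypothetical stand-in for the SDK's Table; only a name is modeled."""

    name: str


@dataclass
class ProcessedTable(Table):
    """A table plus whatever metadata the processing operator collected."""

    # Optional, because metadata may not be available for every operation.
    metadata: Optional[Any] = None


def describe(table: Table) -> str:
    """Hypothetical downstream consumer that only knows about Table."""
    return f"table:{table.name}"


processed = ProcessedTable(name="orders", metadata={"total_rows": 120})

# A ProcessedTable is accepted anywhere a Table is expected.
print(describe(processed))
```

Because `ProcessedTable` subclasses `Table`, existing type checks and `isinstance` tests continue to pass; only code that wants the metadata needs to know about the new class.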

Acceptance Criteria

  • [ ] All checks and tests in the CI should pass
  • [ ] Unit tests (90% code coverage or more, once available)
  • [ ] Integration tests (if the feature relates to a new database or external service)
  • [ ] Example DAG
  • [ ] Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • [ ] Exception handling in case of errors
  • [ ] Logging (are we exposing useful information to the user? e.g. source and destination)
  • [ ] Improve the documentation (README, Sphinx, and any other relevant)
  • [ ] How-to guide for the feature (with an example)

feluelle avatar Aug 02 '22 13:08 feluelle

I am still brainstorming about this feature. Maybe we attach only operator-specific information, only table information, or both. I don't know yet.

Please let me know what you think.

feluelle avatar Aug 02 '22 13:08 feluelle

I like the idea, although I am not yet clear on how a user would know what data is available to them. That is, how do I access the data, and if it is a dict, how does a user know the key names?

In other words, do we have a set of fields that will be part of this class/instance?

kaxil avatar Aug 15 '22 11:08 kaxil

> I like the idea, although I am not yet clear on how a user would know what data is available to them. That is, how do I access the data, and if it is a dict, how does a user know the key names?

Yes, we have to find out 😅 and then document it. It could differ depending on the SQL query and the database client being used.

> In other words, do we have a set of fields that will be part of this class/instance?

We could have some basic ones, e.g.:

  • rows affected (and percentage)
  • total rows

For the basic ones, which are the same for every database, we can create a class.
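A rough sketch of what such a class might look like, assuming only the two fields listed above. The name `BasicStats` and the `percentage_affected` property are hypothetical, chosen for illustration. Typed attributes would also help with the discoverability concern, since users could rely on named fields rather than guessing dict keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasicStats:
    """Hypothetical database-agnostic statistics for one operation."""

    rows_affected: int
    total_rows: int

    @property
    def percentage_affected(self) -> Optional[float]:
        """Share of rows affected, or None when the table is empty."""
        if self.total_rows == 0:
            return None
        return 100.0 * self.rows_affected / self.total_rows


stats = BasicStats(rows_affected=25, total_rows=200)
print(stats.percentage_affected)  # 12.5
```

Deriving the percentage from the two counts avoids storing a third value that could drift out of sync with them.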

Another question is whether we want to run additional queries or only store what we get from the initial query.

feluelle avatar Aug 16 '22 07:08 feluelle

@feluelle @kaxil - Just adding my two cents.

  1. Rows affected: this information relates to the operation rather than to the table. An operator that modifies a table can return the rows affected (and the percentage), but that should not be part of the table object.
  2. Total rows: we can add this to the table object, but keeping it up to date across multiple operations could be a problem.

But returning the same object that the user passed to the operator does seem weird.

utkarsharma2 avatar Aug 16 '22 09:08 utkarsharma2

Yes, we should differentiate between table information and operator information.
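One way to picture that split, purely as a sketch: keep table-level facts and operation-level facts in separate objects and return them side by side. Every name here (`TableInfo`, `OperationInfo`, `OperationResult`) is hypothetical and not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TableInfo:
    """Facts about the table itself (hypothetical)."""

    name: str
    total_rows: Optional[int] = None


@dataclass
class OperationInfo:
    """Facts about what one operator did (hypothetical)."""

    operator: str
    rows_affected: Optional[int] = None
    # Free-form slot for database- or client-specific details.
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class OperationResult:
    """Pairs a table with the details of the operation that touched it."""

    table: TableInfo
    operation: OperationInfo


result = OperationResult(
    table=TableInfo(name="orders", total_rows=200),
    operation=OperationInfo(operator="append", rows_affected=25),
)
print(result.operation.rows_affected, "of", result.table.total_rows)
```

With this shape, rows affected never lives on the table object, and a stale `total_rows` can be refreshed or left as `None` without touching the operation record.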

feluelle avatar Aug 17 '22 09:08 feluelle