Michael Collado
A strategy as described in https://wiki.postgresql.org/wiki/Month_based_partitioning would be useful. Assuming daily partitions (rather than monthly), we could write a function to query runs only within a specific date range with...
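To make the daily-partition idea concrete, here is a minimal sketch of how a date range could be mapped to per-day tables. The `runs_YYYYMMDD` naming scheme and both helper functions are assumptions for illustration only. They are not taken from the wiki page or from Marquez, and the real function would most likely live in the database rather than in client code.

```python
from datetime import date, timedelta


def daily_partitions(start: date, end: date, prefix: str = "runs_") -> list[str]:
    """Return hypothetical per-day partition names covering [start, end] inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    # One table per calendar day, e.g. runs_20210130 (assumed naming scheme).
    return [f"{prefix}{(start + timedelta(days=i)):%Y%m%d}" for i in range(days)]


def build_range_query(start: date, end: date, prefix: str = "runs_") -> str:
    """Build a UNION ALL query that touches only the partitions in the range."""
    tables = daily_partitions(start, end, prefix)
    return "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)


if __name__ == "__main__":
    print(build_range_query(date(2021, 1, 30), date(2021, 2, 1)))
```

The point of the sketch is only that a query over a date range touches the partitions for those days and nothing else; with constraint exclusion or declarative partitioning, Postgres can do this pruning itself.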
Of note - it is expected that _both_ the `version` column and the `uuid` column are unique in the `dataset_versions` table (see https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/db/DatasetVersionDao.java#L156-L158 and https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/db/DatasetVersionDao.java#L198-L200 - both queries are expected...
> The `uuid` column in `dataset_versions` is the user facing versioning, while the `version` column is internal

If that was the original intention, it is definitely no longer the case....
Definitely not just a naming issue - look at the code I linked to. The `RunDao` returns the `version` column from the database. I also confirmed the `dataset/versions` API looks...
My concern with using the `version` column as our internal id is that it prohibits us from supporting versions from dataset sources, like Iceberg or Delta. I think it would...
TBH, I don't see a lot of value coming from an internal `version` column for datasets that is distinct from its UUID. The job version generation makes sense, as a...
Hi, the `SchemaDatasetFacet` is the one that contains the fields for the dataset schema. How did you report the event to Marquez?
Hey, sorry about the delay - can you give me a repro case? i.e., the curl (or whatever) command you used to insert the dataset and read it from the...
I think the core issue is here: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/LineageService.java#L40-L43 . The `LineageService` only queries jobs to determine lineage; if the node in the query param is a dataset, it'll find the...
In the current model, that will break lineage. My question to you is, what data is being cropped/cleaned/analyzed in those two middle jobs? Can we model those jobs as reading...