whylogs
Making DatasetProfileViews is ambiguous/confusing
Description
I'm creating empty DatasetProfileViews right now as part of our Ray examples. I'm using the empty view as the seed in a reduce while merging many views together. It's unclear what the "right" way of making a view is supposed to be. There are two options that I know of.
# ex1
view = DatasetProfile().view()
# ex2
view = DatasetProfileView(columns={}, dataset_timestamp=0, creation_timestamp=0)
This raises a lot of questions from the user's point of view.
- The constructor args of DatasetProfileView are apparently optional because I don't supply them in ex1, so why are they required if you try to make a view directly?
- What function do dataset_timestamp and creation_timestamp serve, and how were they implicitly set in ex1?
- What happens if I set the timestamps to 0 and just use the view I make in ex2 as a reduce seed with other profiles that are actually tracking data?
- Am I even supposed to be making views myself?
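For context, here is a rough sketch of the reduce pattern in question, written so that an empty view is only needed when there is nothing to merge at all. It seeds the reduce with the first real view instead of a hand-built one, which sidesteps the timestamp question in the common case. `merge_all` and `CountView` are hypothetical names made up for illustration; `CountView` is a stand-in so the snippet runs without whylogs, and the sketch assumes only that a view exposes a `merge(other)` method returning the merged view.

```python
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def merge_all(views: Iterable[T], make_empty: Callable[[], T]) -> T:
    """Merge views pairwise, using the first view as the reduce seed.

    make_empty() is only called when `views` is empty, so a hand-built
    empty view never gets merged with views that track real data.
    """
    it = iter(views)
    try:
        first = next(it)
    except StopIteration:
        return make_empty()
    # Assumes each view has merge(other) that returns the merged view.
    return reduce(lambda acc, v: acc.merge(v), it, first)


class CountView:
    """Hypothetical stand-in for a view; NOT part of whylogs."""

    def __init__(self, count: int = 0) -> None:
        self.count = count

    def merge(self, other: "CountView") -> "CountView":
        return CountView(self.count + other.count)


merged = merge_all([CountView(2), CountView(3)], CountView)
empty = merge_all([], CountView)
```

With real views, the call would presumably look like `merge_all(views, lambda: DatasetProfile().view())`, using ex1 as the fallback. That still leaves open which of ex1 or ex2 is the intended way to build the fallback.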
Suggestions
At the very least, DatasetProfileView() should be updated to make columns, dataset_timestamp, and creation_timestamp optional. That would at least hide some details and avoid prompting further questions on the topic. Some docs on the DatasetProfileView would go a long way as well though.
Related
Relates to organization/repo#number
- [x] I have reviewed the Guidelines for Contributing and the Code of Conduct.