Making DatasetProfileViews is ambiguous/confusing

Open naddeoa opened this issue 1 year ago • 0 comments

Description

I'm creating empty DatasetProfileViews right now as part of our Ray examples. I'm using the empty view as the seed in a reduce while merging many views together. It's unclear what the "right" way of making a view is supposed to be. There are two options that I know of.

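from whylogs.core import DatasetProfile, DatasetProfileView

# ex1: build an empty profile and take its view
view = DatasetProfile().view()

# ex2: construct the view directly, supplying every argument
view = DatasetProfileView(columns={}, dataset_timestamp=0, creation_timestamp=0)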

This raises a lot of questions from the user's point of view.

  • The constructor args of DatasetProfileView are apparently optional because I don't supply them in ex1, so why are they required if you try to make a view directly?
  • What function do dataset_timestamp and creation_timestamp serve, and how were they implicitly set in ex1?
  • What happens if I set the timestamps to 0 and just use the view I make in ex2 as a reduce seed with other profiles that are actually tracking data?
  • Am I even supposed to be making views myself?
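
To make the reduce use case concrete, here is a minimal sketch of the pattern. ToyView and its merge method are hypothetical stand-ins written for this illustration, not the whylogs classes, and the sketch deliberately carries no timestamps, because how the real merge treats them is exactly what is unclear.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict


@dataclass
class ToyView:
    """Hypothetical stand-in for a profile view; NOT the whylogs class."""

    counts: Dict[str, int] = field(default_factory=dict)

    def merge(self, other: "ToyView") -> "ToyView":
        # Sum per-column counts; a column missing on one side counts as 0.
        merged = dict(self.counts)
        for column, n in other.counts.items():
            merged[column] = merged.get(column, 0) + n
        return ToyView(merged)


views = [ToyView({"a": 2}), ToyView({"a": 3, "b": 1}), ToyView({"b": 4})]

# The empty view acts as the identity element for merge, so it is a safe
# seed even when `views` is an empty list.
merged = reduce(lambda acc, v: acc.merge(v), views, ToyView())
print(merged.counts)
```

In this toy version the empty seed cannot affect the result. Whether the same holds for a real view created with timestamps of 0 is the open question above.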

Suggestions

At the very least, DatasetProfileView() should be updated to make columns, dataset_timestamp, and creation_timestamp optional. That would hide some details and avoid prompting further questions on the topic. Some docs on DatasetProfileView would go a long way as well.

Related

Relates to organization/repo#number

naddeoa, Sep 22 '22 20:09