[DOCS] quickstart-revision
This revision involves:
- minor restructuring for better flow
- more intro text to provide context and highlight Daft features
- expanded and more fun/relatable code example that includes ML classification with UDFs
Still to do:
- [ ] upload Parquet file to public S3 bucket
- [x] store images on stable URL / check licensing
- [x] update all doc links
- [x] perfection revision of owner/dog name combos :D
Thanks for your thoughts @jaychia! I agree with most of your points. I've added commits that:
(1) restructure to bring Expressions higher up,
(2) trim some sections that feel superfluous to me,
(3) fix the nits**
LMK what you think :)
** I'm not sure about nit (5). Filtering right now flows nicely into Query Planning, which was an intentional bridge. We could also trim the Query Planning section and instead point to this docs page or a new, separate page on Query Planning in Daft (which would be great to have)?
df = df.with_column("has_dog", df["has_dog"].apply(lambda x: True, return_dtype=DataType.bool()))
This is probably not what we want? I think we'd want like a fillnull or something similar.
Right, I was looking for something like that but couldn't find it in the docs. Maybe something like: if expr.is_null, then set a value?
@jaychia - this should be good to go now!
Looks mostly good to go. A few more things we should address before merge:
Closing ":
We've received some comments around not knowing how to use .select() for running expressions. Perhaps this section can be expanded a little to show that you can use expressions in a .select():
For this example, we shouldn't just be using .apply to set everything to True. Instead we can show an if_else:
I think this should work:
df["has_dog"].is_null().if_else(True, df["has_dog"])
Thanks for the sharp eye @jaychia, fixes made!