Add a Woodwork DataFrame tips and tricks guide
New users often raise the same Woodwork questions, usually about the most efficient way to access typing information. Two things are somewhat opaque: how typing info is stored (Table Schema vs Column Schema), and how and when Woodwork initialization happens when getting subsets of data. We should provide a guide that helps users understand the best ways to get information from their DataFrames.
Ex 1: `df.ww.columns[col_name].origin` is more efficient than `df.ww[col_name].ww.origin`, because it avoids reinitializing Woodwork on a new Series. This isn't clear unless you know the specifics of how Woodwork initialization works, so it would be helpful to tell users somewhere in our documentation that they should do this (even if we don't want to explain why).
Ex 2: `df.ww.logical_types` iterates through all of the columns in your DataFrame, which can be problematic in very wide DataFrames if called excessively. Take care to pull that call out of for loops to avoid unnecessary computation.
Part of this issue is determining if we should instead do an FAQ. We are iceboxing for now.