vaex
[FEATURE-REQUEST] Like Pandas cut
Description
I want to group a numeric column by value intervals, similar to what the pandas cut function does. In pandas I can use cut to create an interval-label column and then group by the new column:
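import pandas as pd
bins = [-150, -110, -100, -90, -80, -70, -30]
# label: list of 6 names, one per interval between the 7 edges (not shown in this post)
data["rsrp_range"] = pd.cut(data["OptimalAvgRSRP"], bins=bins, labels=label, right=True)
pdf = data.groupby(data["rsrp_range"]).agg({"rsrp_range": "count"})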
Does Vaex have a similar function?
Hi,
good question.
We don't have cut implemented, but we do wrap numpy.searchsorted: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
import numpy as np
import vaex
df = vaex.from_arrays(x=vaex.vrange(0, 10))
bins = np.array([0, 3, 10])  # make sure to create a numpy array from this
df['x_bin'] = df.func.searchsorted(bins, df.x)
df['x_name'] = df.x_bin.map({0: 'small', 1: 'medium', 2: 'large'})
df
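For readers unfamiliar with searchsorted: with the default side='left' it returns, for each value, the index of the first bin edge that is greater than or equal to that value. The NumPy documentation describes this as the same rule as Python's bisect.bisect_left. The pure-Python sketch below mimics what the snippet above computes for x = 0..9. The names bin_index and names are illustrative only and are not part of vaex.

```python
from bisect import bisect_left

bins = [0, 3, 10]
names = {0: "small", 1: "medium", 2: "large"}

def bin_index(value, edges):
    # Same rule as numpy.searchsorted(edges, value, side="left"):
    # index of the first edge that is >= value.
    return bisect_left(edges, value)

# One label per value of x = 0..9
labels = [names[bin_index(x, bins)] for x in range(10)]
print(labels)
```

Note that 0 lands in a bin of its own because it equals the first edge, and 3 is grouped with 1 and 2 because an index is an insertion position, not an interval number.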
@JovanVeljanoski should we implement cut using this? Or should we have this in the docs somewhere? @heyuqi1970 or would you like to give this a try?
Regards,
Maarten
Thanks for your reply, I will try this.
This is what I wrote and used in my project:
import numpy as np

def custom_cut(dfv, col, bins, labels=None, right=True):
    # Sort the unique bin edges
    sorted_bins = np.sort(np.unique(bins))
    # Use searchsorted to find the bin index for each element of the column
    bin_indices = dfv.func.searchsorted(sorted_bins, dfv[col], side='right' if right else 'left')
    # Clip the bin indices to handle out-of-bounds cases
    bin_indices = bin_indices.clip(0, len(sorted_bins) - 1)
    # Apply the labels if provided
    if labels is not None:
        result = bin_indices.map(dict(zip(range(len(labels)), labels)))
    else:
        result = bin_indices
    return result
and
custom_cut(dfv, 'x', bins, labels=labels, right=False)
gives:
0 small
1 medium
2 medium
3 medium
4 large
5 large
6 large
7 large
8 large
9 large
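This output is not what pd.cut would give for the same edges, because pd.cut labels the intervals between edges while searchsorted returns insertion positions. As a rough sketch of the correspondence, in pure Python and with a hypothetical helper cut_index that is neither a vaex nor a pandas function: for right-closed intervals (a, b] the interval index is bisect_left - 1, and for left-closed intervals [a, b) it is bisect_right - 1. Values outside the edges map to None here, where pandas would produce NaN.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def cut_index(value, edges, right=True):
    """Return the interval index of value, or None if it lies outside edges."""
    if right:
        idx = bisect_left(edges, value) - 1   # intervals (a, b]
    else:
        idx = bisect_right(edges, value) - 1  # intervals [a, b)
    return idx if 0 <= idx < len(edges) - 1 else None

edges = [0, 3, 10]
labels = ["small", "large"]  # one label per interval, so len(edges) - 1

left_closed = [cut_index(x, edges, right=False) for x in range(10)]
right_closed = [cut_index(x, edges, right=True) for x in range(10)]

# Count per label, as in the groupby/count step of the original question.
counts = Counter(labels[i] for i in left_closed if i is not None)
print(counts)
```

With three edges there are only two intervals, so this variant takes two labels rather than three. The same index arithmetic could presumably be applied to the result of df.func.searchsorted, but that is untested here.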