Q: Faster way to Filter DataView

Open torronen opened this issue 3 years ago • 5 comments

I filter data from a dataview to get all items within a specific time period. It seems slow compared to filtering with LINQ from objects in memory. Is there a faster way to do it?

var boolFilter = df["timestamp"].ElementwiseGreaterThanOrEqual(unixStartTime);
var hourlydata = df.Filter(boolFilter);
var boolFilter2 = hourlydata["timestamp"].ElementwiseLessThan(unixEndTime);
hourlydata = hourlydata.Filter(boolFilter2);

In this example, I am creating predictions for one time period at a time. In another example, I may need to filter by exact match. Normally, I might create a dictionary to help, but is there a way to support some type of "indices" for DataViews?

torronen avatar Apr 20 '22 11:04 torronen

Hi @torronen

Is this for the DataView or DataFrame? Looks like DataFrame, but just wanted to confirm before tagging it.

luisquintanilla avatar Apr 25 '22 13:04 luisquintanilla

@luisquintanilla yes, you are correct, it is DataFrame.

torronen avatar Apr 25 '22 14:04 torronen

Thanks for that clarification.

luisquintanilla avatar Apr 25 '22 18:04 luisquintanilla

What do you mean by "support some type indices"? Also, do you have any numbers for speed between this and LINQ? It would be good to see how far behind we really are.

michaelgsharp avatar Apr 27 '22 17:04 michaelgsharp

@michaelgsharp I am thinking about something like a dictionary or hashset to select items quickly. For example, I might want to get metrics for observations from each city separately: one test set for Helsinki, a second for Seattle, etc.
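
To illustrate the dictionary idea, here is a minimal sketch in plain C# that makes one pass over a key column and records which row positions hold each key. `RowIndex` and `Build` are hypothetical names invented for this example; they are not part of Microsoft.Data.Analysis, and the sketch assumes the key values (for example, city names) can be enumerated in row order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a Microsoft.Data.Analysis API.
public static class RowIndex
{
    // Maps each distinct key to the list of row positions where it occurs.
    // Positions are 0-based and stored in ascending order.
    public static Dictionary<TKey, List<long>> Build<TKey>(IEnumerable<TKey> keys)
        where TKey : notnull
    {
        var index = new Dictionary<TKey, List<long>>();
        long row = 0;
        foreach (var key in keys)
        {
            if (!index.TryGetValue(key, out var rows))
            {
                rows = new List<long>();
                index[key] = rows;
            }
            rows.Add(row);
            row++;
        }
        return index;
    }
}
```

Building the index costs one scan of the column; after that, looking up the rows for "Helsinki" or "Seattle" is a dictionary lookup rather than a new elementwise comparison over every row. Whether the resulting row positions can be passed back to the DataFrame to select rows depends on the library version in use, so treat that step as an assumption to check against the DataFrame API.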

Getting the numbers is a good point, and it would validate this. Actually, this issue is mostly about my perception of slowness, and I do not yet have an exact comparison. I will run some benchmarks, but I might not be able to get them very quickly.

torronen avatar Apr 27 '22 19:04 torronen

Some increase in performance of Filtering should be achieved with #6869.

asmirnov82 avatar Oct 19 '23 10:10 asmirnov82