machinelearning
Q: Faster way to Filter DataView
I filter data from a dataview to get all items within a specific time period. It seems slow compared to filtering with LINQ from objects in memory. Is there a faster way to do it?
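// Build a boolean mask for rows with timestamp >= unixStartTime and apply it.
var boolFilter = df["timestamp"].ElementwiseGreaterThanOrEqual(unixStartTime);
var hourlydata = df.Filter(boolFilter);
// Build a second mask on the reduced frame for timestamp < unixEndTime and apply it.
var boolFilter2 = hourlydata["timestamp"].ElementwiseLessThan(unixEndTime);
hourlydata = hourlydata.Filter(boolFilter2);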
In this example, I am creating predictions for one time period at a time. In another example, I may need to filter by exact match. Normally, I might create a dictionary to help, but is there a way to support some type of "indices" for DataViews?
Hi @torronen
Is this for the DataView or DataFrame? Looks like DataFrame, but just wanted to confirm before tagging it.
@luisquintanilla yes, you are correct, it is DataFrame.
Thanks for that clarification.
What do you mean by "support some type of indices"? Also, do you have any numbers comparing the speed of this and LINQ? It would be good to see how far behind we really are.
@michaelgsharp I am thinking about something like a dictionary or hash set to select items quickly. For example, I might want to get metrics for observations from each city separately: one test set for Helsinki, a second for Seattle, etc.
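To make the dictionary idea concrete, here is a minimal sketch of such an index, built once from a key column and then reused for every lookup. It is plain C# and does not use any DataFrame API; the names `RowIndex` and `Build` are hypothetical, and the sketch assumes the key column contains no null values.

```csharp
using System;
using System.Collections.Generic;

public static class RowIndex
{
    // Maps each distinct key to the row positions where it occurs.
    // Assumption: keys contains no nulls (Dictionary rejects null keys).
    public static Dictionary<string, List<long>> Build(IReadOnlyList<string> keys)
    {
        var index = new Dictionary<string, List<long>>();
        for (int row = 0; row < keys.Count; row++)
        {
            if (!index.TryGetValue(keys[row], out var rows))
            {
                rows = new List<long>();
                index[keys[row]] = rows;
            }
            rows.Add(row);
        }
        return index;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical city column, one entry per row.
        var cities = new[] { "Helsinki", "Seattle", "Helsinki", "Seattle", "Helsinki" };
        var index = RowIndex.Build(cities);

        // One pass to build, then each lookup is a single hash probe.
        Console.WriteLine(string.Join(",", index["Helsinki"])); // prints 0,2,4
    }
}
```

The resulting row positions could then be used to select rows from the DataFrame, for example through a row-index indexer if the installed version of Microsoft.Data.Analysis exposes one. The index only pays off when the same column is queried many times, and it has to be rebuilt whenever the rows change.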
Getting the numbers is a good way to validate it. Actually, this issue is mostly about my perception of slowness, and I do not yet have an exact comparison. I will run some measurements, but I might not be able to share them very quickly.
Some increase in performance of Filtering should be achieved with #6869.
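For the time-window case from the original question, one possible alternative to scanning every row is a binary search over the timestamps. The sketch below is plain C#, not a DataFrame feature, and it assumes the timestamps have been copied into an array that is already sorted in ascending order. `TimeRange`, `LowerBound`, and `Slice` are hypothetical names.

```csharp
using System;

public static class TimeRange
{
    // Returns the first position i where sorted[i] >= value,
    // or sorted.Length if every element is smaller.
    public static int LowerBound(long[] sorted, long value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Returns the start position and length of the run of rows
    // with start <= timestamp < end (half-open, as in the question).
    public static (int First, int Count) Slice(long[] sorted, long start, long end)
    {
        int first = LowerBound(sorted, start);
        int last = LowerBound(sorted, end);
        return (first, Math.Max(0, last - first));
    }
}

public static class TimeRangeDemo
{
    public static void Main()
    {
        // Hypothetical Unix timestamps, sorted ascending.
        long[] timestamps = { 100, 200, 300, 400, 500 };
        var (first, count) = TimeRange.Slice(timestamps, 200, 400);
        Console.WriteLine($"{first},{count}"); // prints 1,2
    }
}
```

Each query costs two binary searches instead of two full passes over the column. That only helps if the data is sorted once and queried for many windows, so it is worth measuring against the plain Filter calls before adopting it.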