danfojs icon indicating copy to clipboard operation
danfojs copied to clipboard

How could I filter a df using a date column?

Open thais-molica opened this issue 2 years ago • 2 comments

Hi, everyone. I'm new to the data manipulation and I'm learning with danfojs. I have a dataframe, like this:

const data = [{id: 000, timestamp: "2015-01-01T02:00:00.000Z"}]
const df = new DataFrame(data)

I would like to filter this dataframe using this column timestamp. I tried something like:

df.query([df['timestamp'] > "2015-01-01T02:00:00.000Z" && df['timestamp'] < "2015-06-01T02:00:00.000Z" ])

But it only returned the first row when it should've returned more data. I haven't found nothing like this in the documentarion. Could anybody help me, please? Thanks

thais-molica avatar Feb 09 '23 20:02 thais-molica

Here's an example (not working) I'm trying to follow this pandas tutorial

thais-molica avatar Feb 09 '23 22:02 thais-molica

I don't know what is going on with toDateTime and getting the date column into the Series or DataFrame (I can't make that work, I think it only supports string | number | boolean anyway), but you don't have to do that. Just note that DataFrame.query does not take a predicate or anything like that, it takes a Series (or array) that is as long as the Column or Series you are querying, with boolean values as a mask for whether that element is to be returned or filtered out. So you just need to make that mask, there may be other ways, but this is one way:

const mask = df.column('timestamp').apply((timestamp) => {
	// mask using string
	return timestamp >= '2015-02-01T00:00:00.000Z' && timestamp < '2015-04-01T00:00:00.000Z'
})
df.query(mask).print()

ISO format dates are in time order when they are in alphabetical order, so long as they are all in Zulu time, or the same timezone offset. If you are dealing with different tz offsets, or just want to do the date comparison for some other reason, you can do that like this:

const mask = df.column('timestamp').apply((timestamp) => {
	// or mask using date (consider timezones offset)
	const date = new Date(timestamp)
	return date >= new Date('2015-02-01T00:00:00.000Z') && date < new Date('2015-04-01T00:00:00.000Z')
})

i.e. if one of your rows had timestamp: '2015-04-01T01:00:00.000+10:00' that would become 2015-03-31T15:00:00.000Z when made into a Date, and it would be selected by the query

kitfit-dave avatar Apr 14 '23 12:04 kitfit-dave