pyes

Converting string to date is broken

Open Fiedzia opened this issue 12 years ago • 6 comments

When pyes serializes a date, it uses datetime.isoformat(). According to the Python documentation, isoformat():

Return a string representing the date and time in ISO 8601 format, YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS

If utcoffset() does not return None, a 6-character string is appended, giving the UTC offset in (signed) hours and minutes: YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0 YYYY-MM-DDTHH:MM:SS+HH:MM

So the exact format may vary, depending on exact date and timezone.
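To make the variation concrete, here is a small sketch in modern Python 3 (the thread itself predates it). The sample date and the +02:00 offset are arbitrary choices for illustration, not values taken from pyes.

```python
from datetime import datetime, timedelta, timezone

# Four datetimes that differ only in microseconds and timezone info.
naive = datetime(2013, 7, 4, 18, 7, 0)
with_micro = naive.replace(microsecond=123456)
aware = naive.replace(tzinfo=timezone(timedelta(hours=2)))  # arbitrary offset
aware_micro = aware.replace(microsecond=123456)

samples = [d.isoformat() for d in (naive, with_micro, aware, aware_micro)]
lengths = [len(s) for s in samples]

for s in samples:
    print(len(s), s)
```

Only the first of the four strings is 19 characters long; the others carry a fractional part, an offset, or both.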

However, when converting an ES response back from string to datetime, the following logic is used (pyes.es.py):

if isinstance(v, basestring) and len(v) == 19:
    time.strptime(obj, "%Y-%m-%dT%H:%M:%S")

Note that this entirely ignores microseconds and timezones. If either is present, len(v) differs from 19, so pyes skips the conversion and returns a string. As a result, pyes is unable to convert back into a datetime object a string that it generated itself.
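The round-trip failure can be reproduced with a stand-in for the check quoted above. The function name length_check_decode is hypothetical, and this is a Python 3 port (str instead of basestring, datetime.strptime instead of time.strptime), not the actual pyes code.

```python
from datetime import datetime


def length_check_decode(v):
    """Hypothetical stand-in for the quoted check: only 19-char strings are parsed."""
    if isinstance(v, str) and len(v) == 19:
        return datetime.strptime(v, "%Y-%m-%dT%H:%M:%S")
    return v  # anything else is handed back unchanged


plain = datetime(2013, 7, 4, 18, 7, 0)
with_micro = plain.replace(microsecond=123456)

# Without microseconds the round trip works.
round_trip_ok = length_check_decode(plain.isoformat())

# With microseconds the string is 26 chars long, so it comes back as a string.
round_trip_broken = length_check_decode(with_micro.isoformat())

print(type(round_trip_ok).__name__, type(round_trip_broken).__name__)
```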

Fiedzia avatar Jul 04 '13 18:07 Fiedzia

Yeah, I noticed that too. In my opinion this shouldn't be converted at all; after all, you should know what comes back from ES. The solution we used was to initialize pyes like this:

pyes.ES(hosts, encoder=simplejson.JSONEncoder,
                   decoder=simplejson.JSONDecoder, **kwargs)

Using the native simplejson encoder and decoder you get maximum speed: this is about 25% faster than the original pyes solution, according to my private benchmarks.

CGenie avatar Jul 04 '13 18:07 CGenie

The problem comes from JSON not providing a date/datetime type. For now my solution is to be coherent: you put datetime values into ElasticSearch and get them back as Python datetime objects. The idea is that code should not break depending on whether an object is local or comes from ElasticSearch, which would happen if the software expects a datetime but a string is returned. I thought about keeping the mappings in memory so that only date fields are converted from strings back to datetime objects, but that is a fairly complex JSON decoder customization and requires some time to be done. The performance problem comes from helpers of this kind consuming CPU cycles, but I designed pyes so that they can be disabled if required (as CGenie has shown).

aparo avatar Jul 05 '13 08:07 aparo

Mapping from the ES index would be a very nice thing (especially if it were possible to add custom parsers for specific fields), but indeed quite complex. Also, one has to take into account that this need not always be isoformat. For example, we keep dates like this in ES: '2013-07-26 00:00:00', which has 19 chars but a different format from the one pyes expects. This resulted in the worst-case scenario: the datetime object failed to be made, and the original string value was returned.
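A quick Python 3 illustration of that case: the string passes a 19-character check but does not match the ISO pattern with the "T" separator. The helper parse_with_formats and its list of formats are hypothetical, just one way a caller could try several known formats in turn.

```python
from datetime import datetime

value = "2013-07-26 00:00:00"
length = len(value)  # same length as an ISO string without microseconds

try:
    datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    iso_parse_succeeded = True
except ValueError:
    # The space separator does not match the literal "T" in the pattern.
    iso_parse_succeeded = False


def parse_with_formats(text, formats):
    """Hypothetical helper: return the first successful parse, else the text itself."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return text


parsed = parse_with_formats(value, ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S"])
print(length, iso_parse_succeeded, parsed)
```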

CGenie avatar Jul 05 '13 08:07 CGenie

I've already made some initial progress on mapping from ES in my local repo. I'll hurry up to finish the work. The supported date/datetime formats can also be taken from the mapping.
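As a rough sketch of the mapping-driven idea (not the code from the local repo mentioned above, which is not shown in this thread): only fields declared as dates are converted, and everything else is left alone. The mapping below is a simplified, hypothetical "properties" block, and field_formats is an assumed per-field table of strptime patterns; translating real ES date formats into strptime patterns is left out.

```python
from datetime import datetime

# Simplified, hypothetical mapping shaped like an ES "properties" block.
mapping = {
    "properties": {
        "created": {"type": "date"},
        "title": {"type": "string"},
    }
}

# Assumed per-field strptime patterns (not derived from ES format strings).
field_formats = {"created": "%Y-%m-%d %H:%M:%S"}


def convert_hit(source, mapping, field_formats, default_format="%Y-%m-%dT%H:%M:%S"):
    """Convert only the fields the mapping declares as dates."""
    converted = dict(source)
    for name, spec in mapping["properties"].items():
        raw = converted.get(name)
        if spec.get("type") == "date" and isinstance(raw, str):
            fmt = field_formats.get(name, default_format)
            converted[name] = datetime.strptime(raw, fmt)
    return converted


hit = {"created": "2013-07-26 00:00:00", "title": "2013-07-26T00:00:00"}
result = convert_hit(hit, mapping, field_formats)
print(result)
```

Here "title" stays a string even though it looks like a date, because the mapping says it is not one.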

aparo avatar Jul 05 '13 08:07 aparo

Great news!

CGenie avatar Jul 05 '13 20:07 CGenie

I agree that it's better not to try at all when dates cannot be reliably determined, and getting the type from the mapping would be the perfect choice. (As a quick fix, however, I had to write a JSON decoder that reads isoformat properly.) If you are already working on using the mapping, why not push it into another branch? I could take a look and perhaps help with it.
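The quick-fix decoder itself is not posted in this thread. A minimal sketch of what such a decoder could look like in Python 3.7+ follows, built on the standard json module and datetime.fromisoformat. The names IsoDateDecoder, revive and ISO_RE are hypothetical, and the regex is an assumption that accepts only the shapes isoformat() is described as producing above.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Matches only the isoformat() shapes: optional .mmmmmm, optional +HH:MM / -HH:MM.
ISO_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{6})?([+-]\d{2}:\d{2})?")


def revive(value):
    """Recursively turn isoformat-looking strings into datetime objects."""
    if isinstance(value, str) and ISO_RE.fullmatch(value):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return value  # looked like a date but was not a valid one
    if isinstance(value, list):
        return [revive(v) for v in value]
    if isinstance(value, dict):
        return {k: revive(v) for k, v in value.items()}
    return value


class IsoDateDecoder(json.JSONDecoder):
    """Hypothetical decoder: decode normally, then revive date strings."""

    def decode(self, s, *args, **kwargs):
        return revive(super().decode(s, *args, **kwargs))


payload = (
    '{"created": "2013-07-04T18:07:00.123456+02:00",'
    ' "stored": "2013-07-26 00:00:00",'
    ' "seen": ["2013-07-04T18:07:00"]}'
)
doc = json.loads(payload, cls=IsoDateDecoder)
print(doc)
```

This still guesses from content, so any ordinary string that happens to look like an ISO datetime gets converted too; that is the limitation a mapping-based approach would remove.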

Fiedzia avatar Jul 05 '13 22:07 Fiedzia