Converting string to date is broken
When pyes serializes a date, it uses datetime.isoformat(). According to the Python documentation, isoformat():
Return a string representing the date and time in ISO 8601 format, YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS
If utcoffset() does not return None, a 6-character string is appended, giving the UTC offset in (signed) hours and minutes: YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS+HH:MM
So the exact format may vary, depending on exact date and timezone.
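To make the variation concrete, here is a small sketch (not taken from pyes) that builds the four cases the documentation describes and records the length of each isoformat() string. The helper name isoformat_lengths and the sample date are only illustrative.

```python
from datetime import datetime, timedelta, timezone


def isoformat_lengths():
    """Return {label: (isoformat string, length)} for the four isoformat() shapes."""
    base = datetime(2013, 7, 26, 12, 30, 45)
    plus_two = timezone(timedelta(hours=2))  # arbitrary fixed offset for the demo
    variants = {
        "naive": base,
        "naive+micro": base.replace(microsecond=123456),
        "aware": base.replace(tzinfo=plus_two),
        "aware+micro": base.replace(microsecond=123456, tzinfo=plus_two),
    }
    return {
        label: (dt.isoformat(), len(dt.isoformat()))
        for label, dt in variants.items()
    }


if __name__ == "__main__":
    for label, (text, length) in isoformat_lengths().items():
        print(label, text, length)
```

Only the first variant has 19 characters, so a check on len(v) == 19 can match only that one.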
However, when converting an ES response back from string to datetime, the following logic is used (pyes.es.py):
if isinstance(v, basestring) and len(v) == 19:
    time.strptime(obj, "%Y-%m-%dT%H:%M:%S")
Note that this entirely ignores microseconds and timezone. If either of those is present, pyes will not parse the datetime (len(v) would be different than 19) and will return the string unchanged. As a result, pyes is unable to convert back into a datetime object a string that it generated itself.
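For what it's worth, a more tolerant parser could look roughly like the sketch below. This is not the pyes implementation: parse_iso_datetime is a hypothetical helper, and it assumes Python 3.7+ for datetime.fromisoformat(). The "T" check at position 10 is a deliberate restriction so that only strings shaped like isoformat() output are touched.

```python
from datetime import datetime


def parse_iso_datetime(value):
    """Parse strings produced by datetime.isoformat(); return anything else unchanged.

    Hypothetical helper, not part of pyes. Requires Python 3.7+.
    """
    if not isinstance(value, str):
        return value
    # isoformat() output is at least 19 characters with "T" between date and time.
    if len(value) < 19 or value[10] != "T":
        return value
    try:
        # Handles optional microseconds and an optional +HH:MM / -HH:MM offset.
        return datetime.fromisoformat(value)
    except ValueError:
        # Not a valid ISO datetime after all: leave the original string alone.
        return value
```

With this approach any datetime that isoformat() wrote should round-trip, while other strings come back as they were.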
Yeah, I noticed that too. In fact, in my opinion this shouldn't be converted at all. After all you should know what comes back from ES. In fact, a solution we used was to initialize pyes like this:
pyes.ES(hosts, encoder=simplejson.JSONEncoder,
decoder=simplejson.JSONDecoder, **kwargs)
Using native simplejson encoders/decoders you get maximum speed: this is about 25% faster than original pyes' solution, according to my private benchmarks.
The problem is that JSON doesn't provide a date/datetime type. For now my solution is to be coherent: put datetime values into ElasticSearch and get them back as Python datetime objects. The idea is not to break the code depending on whether an object is local or coming from ElasticSearch, because the software expects a datetime but a string is returned. I thought about keeping the mappings in memory so that only datetime-string fields are converted back to datetime objects, but that is quite a complex JSON decoder customization and requires some time to do. The performance problem is related to this kind of helper, which consumes CPU cycles, but I designed pyes so that they can be disabled if required (as CGenie has shown).
Mapping from the ES index would be a very nice thing (especially if it were possible to add custom parsers for specific fields), but indeed quite complex. Also, one has to take into account that the format need not always be isoformat: for example, we keep dates like this in ES: '2013-07-26 00:00:00', which has 19 chars but a different format than the one specified in pyes. This resulted in the worst-case scenario: the datetime object failed to be made, and the original string value was returned.
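A rough sketch of what per-field parsing could look like, assuming the formats are known up front. Everything here is hypothetical: FIELD_FORMATS, its field names, and convert_date_fields are made up for illustration, and in a real setup the formats might be derived from the index mapping instead of being hard-coded.

```python
from datetime import datetime

# Hypothetical field -> strptime format table (illustrative names only).
FIELD_FORMATS = {
    "created": "%Y-%m-%d %H:%M:%S",
    "updated": "%Y-%m-%dT%H:%M:%S",
}


def convert_date_fields(doc, field_formats=FIELD_FORMATS):
    """Return a copy of doc with the listed fields parsed into datetime objects.

    Fields not listed are never touched, even if they look like dates.
    Values that do not match their declared format are left as strings.
    """
    converted = dict(doc)
    for field, fmt in field_formats.items():
        value = converted.get(field)
        if isinstance(value, str):
            try:
                converted[field] = datetime.strptime(value, fmt)
            except ValueError:
                pass  # keep the original string rather than guessing
    return converted
```

The point of the design is that conversion is driven by the field name rather than by the length or shape of the string, so a date-looking value in a plain text field stays a string.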
I've made some initial progress on mapping from ES in my local repo. I'll hurry up and finish the work. The supported date/datetime formats can also be taken from the mapping.
Great news!
I agree that it's better not to try at all when dates cannot be reliably determined, and getting the type from the mapping would be the perfect choice. (As a quick fix, however, I had to write a JSON decoder that reads isoformat properly.) If you are already working on using the mapping, why not push it to another branch? I could take a look and perhaps help with it.