Date Ranges

Open natesire opened this issue 11 years ago • 16 comments

I am writing my own solution to calculate date ranges (e.g. May 22 2014 to June 3 2015) based on chronic. I would gladly contribute this solution if needed.

natesire avatar May 01 '14 19:05 natesire

This already exists https://github.com/tmlee/time_difference

StephenOTT avatar May 01 '14 23:05 StephenOTT

Thanks. I emailed the author of time_difference. I am actually looking for something that takes a natural-language approach using machine learning, because I need to parse human-written date ranges. I might fork your chronic and post the beginnings of it. I am still deciding which language to implement the machine learning in. Python has a great NLP toolkit (NLTK), and C++ for Ruby extensions might take a while. But I even like Scala. Any ideas?

natesire avatar May 01 '14 23:05 natesire

From a Ruby perspective, do you have an aversion to wrapping chronic with time_difference?

Something like this:

require 'chronic'
require 'time_difference'

humanStatement1 = "this tuesday 1pm"
humanStatement2 = "this tuesday 3pm"

humanStatement1Parsed = Chronic.parse(humanStatement1)
humanStatement2Parsed = Chronic.parse(humanStatement2)

# very human readable version
puts TimeDifference.between(humanStatement1Parsed, humanStatement2Parsed).in_hours  #=> 2.0

# Version without the intermediate parsed variables
puts TimeDifference.between(Chronic.parse(humanStatement1), Chronic.parse(humanStatement2)).in_hours  #=> 2.0

# Single Line version
puts TimeDifference.between(Chronic.parse("this tuesday 1pm"), Chronic.parse("this tuesday 3pm")).in_hours  #=> 2.0

StephenOTT avatar May 01 '14 23:05 StephenOTT

Use your NLP to tokenize the statements into the start date token and the end date token (humanStatement1 and humanStatement2)

StephenOTT avatar May 01 '14 23:05 StephenOTT

For NLP have you looked at OpenNLP? http://opennlp.apache.org

and then for the ruby bindings, use: https://github.com/louismullie/open-nlp

StephenOTT avatar May 01 '14 23:05 StephenOTT

I am testing time_difference. I didn't even know about OpenNLP. Awesome. I am checking all of this out.

natesire avatar May 02 '14 00:05 natesire

I have to handle all kinds of odd characters, such as -, /, --, and &, that can appear inside and outside parts of the dates. I am going to write the more advanced parsing in Scala.

natesire avatar May 02 '14 16:05 natesire

This is what NLP tokenization is for: it lets you remove useless characters, or replace unneeded characters or words, before parsing.
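
As a rough sketch of that clean-up step, the separators mentioned above (-, --, &) can be treated as the boundary between the two date phrases before each half is handed to a parser such as Chronic. The `split_range` helper and its separator list are assumptions made for illustration, not part of chronic or time_difference. A slash is deliberately left out of the list because it also appears inside dates like 5/22/2014.

```ruby
# Hypothetical helper: split a human-written range into [start, end] phrases.
# Separators must be surrounded by whitespace, so "5/22/2014" and
# "2014-05-22" are not broken apart.
RANGE_SEPARATOR = /\s+(?:--?|&|to|until|through)\s+/i

def split_range(text)
  parts = text.strip.split(RANGE_SEPARATOR, 2)
  # Return nil when no separator was found, so callers can fall back
  # to treating the text as a single date.
  parts.length == 2 ? parts : nil
end

p split_range("May 22 2014 -- June 3 2015")  # => ["May 22 2014", "June 3 2015"]
p split_range("June 9 - August 1")           # => ["June 9", "August 1"]
p split_range("2014-05-22")                  # => nil
```

Each half could then be passed to Chronic.parse separately; whether Chronic accepts a given half still has to be checked, since it returns nil for phrases it cannot parse.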

StephenOTT avatar May 02 '14 16:05 StephenOTT

I see. Tokenization should work. Currently, my algorithm reads the sentence from 0 till chronic returns nil. Then it reads the sentence backwards until the previous nil point. I'll check and see how well tokenization can just provide me two dates.

natesire avatar May 02 '14 18:05 natesire

Here's an example I am running into with chronic.
'Jan first week' is nil
'Jan first' is valid in chronic
'Jan' isn't valid, chronic returns 2015-01-16 12:00:00 -0500

So your idea is to erase 'week' and leave 'first', using tokenization?

natesire avatar May 02 '14 19:05 natesire

I wrote a test in Python.

Here is the output:

[('Available', 'JJ'), ('June', 'NNP'), ('9', 'CD'), ('--', ':'), ('August', 'NNP'), ('first', 'JJ'), ('week', 'NN')]
['June', '9', 'August']
['June', '9', 'August']

import nltk
import MySQLdb
import time
import string
import re

# tokenize
sentence = 'Available June 9 -- August first week'
tokens = nltk.word_tokenize(sentence)

# tag each token with its part of speech
parts_of_speech = nltk.pos_tag(tokens)
print(parts_of_speech)

# allow only these part-of-speech tags:
# NNP (proper noun) and CD (cardinal number)
approved_prepositions = ['NNP', 'CD']
filtered = []
for word in parts_of_speech:
    # word is a (token, tag) pair
    if any(x in word[1] for x in approved_prepositions):
        filtered.append(word[0])

print(filtered)

# normalize to alphanumeric only
normalized = re.sub(r'\s\W+', ' ', ' '.join(filtered))
# note: this prints the filtered list again, not the normalized string
print(filtered)

natesire avatar May 02 '14 20:05 natesire

I can write a white-list function for words like 'first'. I am really liking this solution. Great idea to tokenize. Now I need a different excuse to write something in Scala. hahahahaha
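
A white-list like the one described could look roughly like this. The sketch is in Ruby and takes the (word, tag) pairs from the NLTK output above as plain arrays. `ALLOWED_TAGS`, `WHITELISTED_WORDS` and `filter_tokens` are hypothetical names, and the word list is only a guess at what might be worth keeping. Unlike the Python test, it matches tags exactly rather than by substring.

```ruby
require 'set'

# Part-of-speech tags that are always kept (proper nouns and numbers).
ALLOWED_TAGS = Set['NNP', 'CD'].freeze
# Assumed white-list of ordinal words that carry date meaning.
WHITELISTED_WORDS = Set['first', 'second', 'third', 'last'].freeze

# tagged is an array of [word, tag] pairs; returns the words to keep.
def filter_tokens(tagged)
  kept = tagged.select do |word, tag|
    ALLOWED_TAGS.include?(tag) || WHITELISTED_WORDS.include?(word.downcase)
  end
  kept.map(&:first)
end

tagged = [['Available', 'JJ'], ['June', 'NNP'], ['9', 'CD'], ['--', ':'],
          ['August', 'NNP'], ['first', 'JJ'], ['week', 'NN']]
p filter_tokens(tagged)  # => ["June", "9", "August", "first"]
```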

natesire avatar May 02 '14 20:05 natesire

> Here's an example I am running into with chronic.
> 'Jan first week' is nil
> 'Jan first' is valid in chronic
> 'Jan' isn't valid, chronic returns 2015-01-16 12:00:00 -0500
>
> So your idea is to erase 'week' and leave 'first', using tokenization?

For examples like this, I would make assumptions about the date formats. For example, if someone writes "Jan First Week", you use NLP to grab the month and the fact that they want week 1. Then use the Ruby Date library to get day 1 and day 7 of week 1.

StephenOTT avatar May 02 '14 20:05 StephenOTT

Take a look at this for an example of grabbing the date of a day number in a week number: http://www.ruby-doc.org/stdlib-2.1.1/libdoc/date/rdoc/Date.html#method-c-commercial

Then use the time_difference library to get the duration.
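
A minimal sketch of that idea, using only Ruby's standard `date` library. It assumes "first week" means the ISO (commercial) week that contains the 1st of the month, which is only one possible reading. `first_week_range` is a hypothetical helper, and the duration is computed with plain Date subtraction rather than time_difference to keep the example self-contained.

```ruby
require 'date'

# Hypothetical helper: Monday and Sunday of the ISO week containing
# the 1st of the given month.
def first_week_range(year, month)
  first = Date.new(year, month, 1)
  # cwyear/cweek give the ISO year and week number of that date.
  week_start = Date.commercial(first.cwyear, first.cweek, 1)  # day 1 = Monday
  week_end   = Date.commercial(first.cwyear, first.cweek, 7)  # day 7 = Sunday
  [week_start, week_end]
end

start_date, end_date = first_week_range(2014, 1)
puts start_date                      # 2013-12-30
puts end_date                        # 2014-01-05
puts((end_date - start_date).to_i)   # 6
```

Note that under this assumption the week can begin in the previous month, as it does for January 2014. If "first week" should instead mean days 1 to 7 of the month, the range would be built directly from Date.new(year, month, 1).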

StephenOTT avatar May 02 '14 20:05 StephenOTT

I wrote a white-list function. Python is handling things beautifully. I can feed the output into chronic, and I can call the Python script from Ruby. Let me know if chronic needs contributions.

natesire avatar May 02 '14 21:05 natesire

great

StephenOTT avatar May 02 '14 21:05 StephenOTT