DataFrame
DataFrame copied to clipboard
Support of temporal column types
Hi Alexander,
I'm looking at "DataFrame" to use it in of my pet projects
- How to support temporal column types, e.g. LocalDate or LocalDateTime? Want I want to do is to hide a parsed CSV or Excel sheet behind "DataFrame" but Excel has temporal column types
- Did you have a look at Commons CSV? Since you strive for minimal dependencies it might be not an option but I wanted to ask :-)
Siegfried
Hi Siegfried,
Thanks for your suggestions. I plan to rework column types to allow easy definition of custom value types. This would allow the definition of DateColumns. Further, it would require the option to add hints for parsing (e.g. date format within a column).
A bridge for Commons CSV is a great idea to improve file format support.
Hi Alexander,
drop me a note when there is something to integrate - seems that I some unexpected free time in summer due to COVID-19 :-)
Siegfried
Hi Siegfried,
Help is always greatly appreciated :)
I hope that I can work on custom value type support this weekend. Further, I will try to create issues for other features and enhancements to better track progress.
I'm ready to kick into action - https://issues.apache.org/jira/browse/FREEMARKER-144
So I started working on custom value type support in this branch: value-type-abstraction and create an issue #22 There are still some TODOs to support custom value types, described in #22
This approach would allow the creation of a Date ValueType that can support different formats for parsing and writing
I got a proof of concept running integrating "nRo/Dataframe" into Apache FreeMarker CLI
Thats great!
Regarding temporal column types. How would you expect date formats are handled per default? Some kind of auto detection, or a default format that can be changed by giving hints when the data is read.
You make the decision - both approaches sound good to me :-)
What I want to do:
- Transform Apache Commons CSV into a DataFrame and here I only handle Strings (partly implemented)
- Transform a Map into a DataFrame (partly implemented)
- Transform an Excel sheet (POI & getDateCellValue) to a DataFrame
So I guess I come from side of defining the DataFrame and then populating it
Okay, thanks for the feedbacks. Unfortunately, I don't have much time at the moment. But I will keep working on that over the next days.
Mostly finished but not really happy with the code :-)
Is the feature branch stable enough to do some preliminary integration?
The feature branch is pretty stable. It only needs support for autodetection of custom types and some tests.
sorry for the delay, the custom value feature branch is now merged into master #22 .
I will now start looking into temporal column types. I suppose that three different types would be required:
- Date
- DateTime
- Timespan
Each should be able to handle different formats. Possible operations for these tables would be:
- convert them from one type into another
- operations for different time units (add minutes, hours, days,...)
- operations between columns (calculate difference,...)
Any other suggestions?
Hi, little bit confused by "Timespan" - is this a time-only value, e.g. "13:42:23" or do you mean "3:24h"?
Regarding operations - little to no suggestion from my side - I would mostly filter / sort / query on temporal columns coming from CSV and Excel.
Hi, with "Timespan" I mean a type like Java 8 Duration. Subtraction of two Dates would result in a Timespan for example
Ack - assuming that I understood the things correctly
- A column to be read would consist of either Date (e.g. 14.10.2019) or DateTime in some format (e.g. 2019-10-14T12:00:00)
- Substractions of dates would result in a timespan
A few questions along the line
- Is it possible to read timespans from CSV? Usually they don't have a qualifier such as seconds or day as cell values
- Would timestamps be supported, e .g. "12:01:31"?
sorry for the delay again
- A column to be read would consist of either Date (e.g. 14.10.2019) or DateTime in some format (e.g. 2019-10-14T12:00:00)
exactly. some points I am still not sure about:
- how to pass format information to the CSV parser (should be rather simple)
- implement autodetection for temporal columns
- How to handle Timezones
- Substractions of dates would result in a timespan
yes. Thats how I would do it. So that if two dates are extracted you get information about how many hours, minutes, seconds,... passed between those dates.
- Is it possible to read timespans from CSV? Usually they don't have a qualifier such as seconds or day as cell values
- Would timestamps be supported, e .g. "12:01:31"?
Timestamps could be supported by adding parsing hints to the CSV parser. They could either just be epoch miliseconds (Long) or a format like you described.
Autodetection for timestamps is more difficult. A timestamp could also be a Long column.
Hi Alexander,
Don't worry about the delays - it is an open-source project after all :-)
Thanks in advance,
Siegfried Goeschl
On 22.08.2020, at 12:57, Alexander Grün [email protected] wrote:
sorry for the delay again
A column to be read would consist of either Date (e.g. 14.10.2019) or DateTime in some format (e.g. 2019-10-14T12:00:00) exactly. some points I am still not sure about:
how to pass format information to the CSV parser (should be rather simple) implement autodetection for temporal columns How to handle Timezones Substractions of dates would result in a timespan yes. Thats how I would do it. So that if two dates are extracted you get information about how many hours, minutes, seconds,... passed between those dates.
Is it possible to read timespans from CSV? Usually they don't have a qualifier such as seconds or day as cell values Would timestamps be supported, e .g. "12:01:31"? Timestamps could be supported by adding parsing hints to the CSV parser. They could either just be epoch miliseconds (Long) or a format like you described.
Autodetection for timestamps is more difficult. A timestamp could also be a Long column.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nRo/DataFrame/issues/21#issuecomment-678626254, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABR6IDWFI4OUYM5E3IRVRDSB6QBHANCNFSM4M66LYMQ.