qframe
Joining dataframes
Hi, thank you for writing this library. Are there any plans to add joins? If I were to add them, at least for myself, where/how would be the smartest way to do it? I am not that experienced a Go developer and I doubt it would be on par with your standards.
Btw. Not totally related, but I am making an interpreter in Go and I will probably use your qframe for its dataframe implementation. It could be a nice solution for interactive data exploration/cleanup. I will show you once the language is more developed.
Thanks for writing! I don't have a specific use case for joins myself at the moment but I would very much like it to be added still since it is part of a broader "dataframe offering".
You should not worry about giving it a try if you're interested in contributing. We'll sort out how to do it as we go along.
Some ideas/thoughts:
- It would probably be best/most natural to add a new top level function `Join` on the qframe which takes another QFrame and a variadic number of functional options.
- Ultimately I think it would make sense to support the combinations of inner, outer and full outer joins that are available (left/right being determined by which frame the `Join` function is called on).
- I think it would make sense to go for a hash join algorithm. Some of the building blocks required for this are already present in the code used for `GroupBy` and `Distinct`. There is a hash table here https://github.com/tobgu/qframe/blob/master/internal/grouper/grouper.go that perhaps can be re-used as is.
- Some data copying will likely be required to merge the two dataframes together. This is probably OK performance wise but some care should be taken to reduce it.
- NULL values should perhaps be configurable (in the case of outer joins) since not all column types have a zero/NULL representation.
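To make the hash join and functional options ideas above a bit more concrete, here is a minimal standalone sketch. It does not use qframe's types or its internal grouper hash table; a plain Go map stands in for the hash table, and `Row`, `Joined`, `HashJoin`, `JoinOption` and `LeftOuter` are hypothetical names made up for illustration. The fill value for unmatched rows is configurable, in the spirit of the NULL handling point.

```go
package main

import "fmt"

// Row is a hypothetical, simplified stand-in for a dataframe row:
// one join key and one value column.
type Row struct {
	Key string
	Val int
}

// Joined is one output row of the join.
type Joined struct {
	Key   string
	Left  int
	Right int
}

// joinConfig holds the settings that functional options can change.
type joinConfig struct {
	leftOuter bool // keep left rows without a match
	missing   int  // fill value for the right side of unmatched rows
}

// JoinOption is a functional option for HashJoin (hypothetical API).
type JoinOption func(*joinConfig)

// LeftOuter turns the join into a left outer join and sets the value
// used in place of NULL for unmatched rows.
func LeftOuter(missing int) JoinOption {
	return func(c *joinConfig) {
		c.leftOuter = true
		c.missing = missing
	}
}

// HashJoin joins left and right on Key. The default is an inner join.
func HashJoin(left, right []Row, opts ...JoinOption) []Joined {
	cfg := joinConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}

	// Build phase: index the right side by key.
	index := make(map[string][]int, len(right))
	for i, r := range right {
		index[r.Key] = append(index[r.Key], i)
	}

	// Probe phase: look up every left row in the index.
	var out []Joined
	for _, l := range left {
		matches, ok := index[l.Key]
		if !ok {
			if cfg.leftOuter {
				out = append(out, Joined{Key: l.Key, Left: l.Val, Right: cfg.missing})
			}
			continue
		}
		for _, i := range matches {
			out = append(out, Joined{Key: l.Key, Left: l.Val, Right: right[i].Val})
		}
	}
	return out
}

func main() {
	left := []Row{{"a", 1}, {"b", 2}, {"c", 3}}
	right := []Row{{"a", 10}, {"a", 11}, {"c", 30}}

	fmt.Println(HashJoin(left, right))                // inner join
	fmt.Println(HashJoin(left, right, LeftOuter(-1))) // left outer join
}
```

A real implementation would work column-wise on the frames rather than on row structs, and would need to support multiple key columns and column types, so treat this only as an outline of the build and probe phases.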
I'd be happy to hear your thoughts on this!
Thank you for the very well thought out response. I will begin looking at the code as per your instructions and let you know when I have something working. I have a busy week ahead, so it may not be immediately. Thank you!
Cool! Take your time and let me know if you have ideas or questions that you would like to discuss.