Joining dataframes

Open jankom opened this issue 5 years ago • 4 comments

Hi, thank you for writing this library. Are there any plans to add joins? If I were to add them, at least for myself (I am not a very experienced Go developer, and I doubt my code would be on par with your standards), where and how would be the smartest way to add them?

Btw, not totally related, but I am making an interpreter in Go and I will probably use your qframe for its dataframe implementation. It could be a nice solution for interactive data exploration/cleanup. I will show you once the language is more developed.

jankom avatar Jan 29 '20 12:01 jankom

Thanks for writing! I don't have a specific use case for joins myself at the moment but I would very much like it to be added still since it is part of a broader "dataframe offering".

You should not worry about giving it a try if you're interested in contributing. We'll sort out how to do it as we go along.

Some ideas/thoughts:

  • It would probably be best/most natural to add a new top level function Join on the qframe which takes another QFrame and a variadic number of functional options.
  • Ultimately I think it would make sense to support the combinations of inner, outer and full outer joins that are available (left/right being determined by which frame the Join function is called on).
  • I think it would make sense to go for a hash join algorithm. Some of the building blocks required for this are already present in the code used for GroupBy and Distinct. There is a hash table here https://github.com/tobgu/qframe/blob/master/internal/grouper/grouper.go that can perhaps be re-used as is.
  • Some data copying will likely be required to merge the two dataframes together. This is probably OK performance-wise, but some care should be taken to reduce it.
  • NULL values should perhaps be configurable (in the case of outer joins) since not all column types have a zero/NULL representation.
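
To make the ideas above a bit more concrete, here is a minimal, standalone sketch of a hash join driven by functional options. It deliberately does not use any qframe types: `Row`, `JoinOption`, `LeftOuter` and `HashJoin` are all hypothetical names made up for illustration, plain slices stand in for columns, and a Go map stands in for the grouper hash table. It only covers inner and left outer joins, with the NULL replacement passed in by the caller.

```go
package main

import "fmt"

// Row is a hypothetical joined output row; it is not a qframe type.
type Row struct {
	Key   string
	Left  int
	Right int
}

// joinConfig holds the settings controlled through functional options.
type joinConfig struct {
	leftOuter bool
	nullValue int
}

// JoinOption is a hypothetical functional option for HashJoin.
type JoinOption func(*joinConfig)

// LeftOuter keeps left rows that have no match and fills the right
// value with the caller-supplied null replacement.
func LeftOuter(null int) JoinOption {
	return func(c *joinConfig) {
		c.leftOuter = true
		c.nullValue = null
	}
}

// HashJoin joins two keyed int columns. The default is an inner join.
func HashJoin(leftKeys []string, leftVals []int,
	rightKeys []string, rightVals []int, opts ...JoinOption) []Row {
	cfg := joinConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}

	// Build phase: index the right-hand rows by key.
	index := make(map[string][]int, len(rightKeys))
	for i, k := range rightKeys {
		index[k] = append(index[k], i)
	}

	// Probe phase: look up every left-hand key in the index.
	var out []Row
	for i, k := range leftKeys {
		matches := index[k]
		if len(matches) == 0 {
			if cfg.leftOuter {
				out = append(out, Row{Key: k, Left: leftVals[i], Right: cfg.nullValue})
			}
			continue
		}
		for _, j := range matches {
			out = append(out, Row{Key: k, Left: leftVals[i], Right: rightVals[j]})
		}
	}
	return out
}

func main() {
	lk, lv := []string{"a", "b", "c"}, []int{1, 2, 3}
	rk, rv := []string{"a", "a", "c"}, []int{10, 11, 30}

	fmt.Println(HashJoin(lk, lv, rk, rv))                // inner join
	fmt.Println(HashJoin(lk, lv, rk, rv, LeftOuter(-1))) // left outer join, -1 as null
}
```

In this sketch every output row is copied into a new slice, which is the kind of data copying mentioned above. A real implementation would more likely collect row indices first and materialize the columns once at the end.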

I'd be happy to hear your thoughts on this!

tobgu avatar Feb 02 '20 21:02 tobgu

Thank you for the very well thought-out response. I will begin looking at the code as per your instructions and let you know when I have something working. I have a busy week ahead, so it may not be immediately. Thank you!

jankom avatar Feb 03 '20 07:02 jankom

Cool! Take your time and let me know if you have ideas or questions that you would like to discuss.

tobgu avatar Feb 03 '20 18:02 tobgu