Support embedded arrays for hasAndBelongsToMany relationships
Today I was watching this talk by Jorge Silva on data modeling in RethinkDB, and he goes through a number of different approaches for storing related data. The first technique he demonstrated was using a third table for joining, which is exactly how thinky sets up n-to-n with hasAndBelongsToMany. He then went on to talk about another technique: embedding an array of ids into the document. Here's the video (skipped to the relevant part) and also an example of the structure in his demo:

Jorge brings up a few positive points about embedded arrays:
- it may be more efficient
- avoids an intersecting table
- queries are simpler
- this is the approach that he normally recommends
Since it is a fairly common (and highly recommended) pattern, shouldn't thinky support setting up relationships in this manner? It may be true that n-to-n relationships created this way would be unidirectional, but that constraint is perfectly reasonable (and probably desired) when using this pattern.
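To make the two layouts under discussion concrete, here is a minimal in-memory sketch. It is not the structure from the talk's demo, and it does not use thinky or ReQL: the `PostDoc`/`TagDoc` names, the sample data, and the helper functions are all hypothetical, and plain arrays stand in for tables.

```typescript
// Hypothetical document shapes; plain arrays stand in for tables.
interface TagDoc { id: string; name: string; }
interface PostDoc { id: string; title: string; }
interface PostTagLink { postId: string; tagId: string; }        // one row of the link table
interface PostWithTagIds extends PostDoc { tagIds: string[]; }  // embedded-array variant

const tags: TagDoc[] = [
  { id: "t1", name: "databases" },
  { id: "t2", name: "modeling" },
];

// Layout A: a third (link) table, the shape hasAndBelongsToMany is described as using.
const posts: PostDoc[] = [{ id: "p1", title: "Joins in practice" }];
const postTagLinks: PostTagLink[] = [
  { postId: "p1", tagId: "t1" },
  { postId: "p1", tagId: "t2" },
];

function tagsViaLinkTable(postId: string): TagDoc[] {
  const tagIds = postTagLinks
    .filter((link) => link.postId === postId)
    .map((link) => link.tagId);
  return tags.filter((tag) => tagIds.indexOf(tag.id) !== -1);
}

function postsViaLinkTable(tagId: string): PostDoc[] {
  const postIds = postTagLinks
    .filter((link) => link.tagId === tagId)
    .map((link) => link.postId);
  return posts.filter((post) => postIds.indexOf(post.id) !== -1);
}

// Layout B: the post embeds an array of tag ids; there is no link table.
const postsEmbedded: PostWithTagIds[] = [
  { id: "p1", title: "Joins in practice", tagIds: ["t1", "t2"] },
];

function tagsViaEmbeddedIds(postId: string): TagDoc[] {
  for (const post of postsEmbedded) {
    if (post.id === postId) {
      return tags.filter((tag) => post.tagIds.indexOf(tag.id) !== -1);
    }
  }
  return []; // unknown post id
}

// The reverse direction has nothing on the tag side to start from,
// so this sketch inspects every post's embedded array.
function postsViaEmbeddedIds(tagId: string): PostWithTagIds[] {
  return postsEmbedded.filter((post) => post.tagIds.indexOf(tagId) !== -1);
}
```

In this sketch the link-table layout handles both directions with the same code shape, while the embedded layout is only direct from the side that owns the array, which is the unidirectional constraint mentioned above.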
> it may be more efficient

Note that it's a "may":
- It's not more efficient for a write workload.
- It's also less efficient if you just need a few fields from your documents.
- It's also not obvious that it's even faster when fetching a city with its state.
> avoids an intersecting table

Fair enough, though that doesn't mean it's always faster.
> queries are simpler

Not true. Try to write the query to fetch a city with its state. Then try to write a query that updates the city's state. Thinky used this pattern in its 0.x versions, and it was a pain to maintain relations.
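A rough in-memory sketch of why those two operations get awkward with embedded arrays. It assumes, purely for illustration, that each state document holds a `cityIds` array. The document shapes, sample data, and the `stateOfCity`/`moveCity` helpers are hypothetical and are not thinky or ReQL code.

```typescript
// Hypothetical shapes: the state owns the relation through an embedded id array.
interface CityDoc { id: string; name: string; }
interface StateDoc { id: string; name: string; cityIds: string[]; }

const cities: CityDoc[] = [{ id: "c1", name: "Springfield" }];
const states: StateDoc[] = [
  { id: "s1", name: "Illinois", cityIds: ["c1"] },
  { id: "s2", name: "Oregon", cityIds: [] },
];

// Fetching a city's state: the city document knows nothing about its state,
// so every state's array has to be inspected.
function stateOfCity(cityId: string): StateDoc | undefined {
  for (const state of states) {
    if (state.cityIds.indexOf(cityId) !== -1) {
      return state;
    }
  }
  return undefined;
}

// Changing a city's state touches two documents: the id is removed from the
// old state's array and appended to the new one.
function moveCity(cityId: string, newStateId: string): void {
  const targetExists = states.some((state) => state.id === newStateId);
  if (!targetExists) {
    throw new Error("unknown state: " + newStateId);
  }
  for (const state of states) {
    state.cityIds = state.cityIds.filter((id) => id !== cityId);
    if (state.id === newStateId) {
      state.cityIds.push(cityId);
    }
  }
}
```

With a foreign key on the city or a row in a link table, the same change would be a single write. In a real database the two array updates above would also need to be kept consistent with each other, which this sketch does not attempt.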
> this is the approach that he normally recommends

Fair enough, but I disagree with him. It's a poor approach in my opinion. I also think it's not relevant for thinky because:
- thinky does the joins under the hood. Whether it uses a third table or an embedded array of values is not visible to the developer.
- It's a MongoDB-ish approach, and since RethinkDB has server-side joins, there's little reason to use this pattern.
TL;DR: Thinky used to implement joins that way (hasMany and hasAndBelongsToMany) but reverted to a SQL-ish way of doing it in its 1.x version (I think).
You bring up valid counter-arguments. However, I do think it's worthwhile to arrive at more unified guidance for data modeling, because this video is less than a month old and the message was fairly prescriptive. Thinky adds huge value to the RethinkDB ecosystem, but the product is still young, and newcomers will have more confidence when best practices are very clear and well understood.
+@thejsj and @coffeemug to chime in with more clarity.
+@dalanmiller as well.
As @neumino said in so many words, it largely depends on your data and your most common use case. @thejsj just presented several possibilities, but there is definitely no one-size-fits-all solution.
In your case, since you mentioned on Gitter that you do a lot of frequent writes to your attachments data model, I think it'd be a safe bet to keep it separate (and thus small) and join it via ReQL, when necessary, with the message/email table you mentioned.
@dalanmiller I agree that in the email example, a third join table is the best solution. However, my point is that Thinky currently offers no choice between the two models. Maybe this turns out to be ok, but it certainly doesn't align with the no-one-size-fits-all idea or the sentiment from @thejsj's talk.
@sjmueller -- In most cases, using a third table is better. The case where you actually get better performance from embedded arrays is really narrow as far as I know. Having a simpler syntax is not relevant since thinky is doing the join under the hood.
If you think you need joins to be done via embedded arrays, you are welcome to send a pull request, but this is tricky and a lot of work.
Just wanted to add my own opinion here.
First, I'm glad someone actually saw my talk! Thanks @sjmueller. I spent a lot of time going over different approaches and talking to all the engineers at Rethink in order to arrive at what I said. Ultimately, I want to stress again that it depends!
While I do think it might be nice if something like Thinky (and this is coming from someone who's never used it!) had something like what @sjmueller suggests, it seems that it would be very hard to implement. It makes all the sense in the world to me that @neumino implemented that part of Thinky with intersection tables, because that is probably the most flexible way to do this.
My talk was based around people modeling their own data and not really for people using ORMs. Implementing an ORM is not something I really considered and it would make sense to me to use an intersection table for that.
@sjmueller If you're having performance problems, I'd personally love to see what's going on and if there's a way it could be improved.
@thejsj I'm not at the point where I'm profiling performance just yet, although I'll be sure to give an update when I do.
This issue was more to open a conceptual discussion about data modeling, specifically about the recommended approach in your video that is currently not available when using thinky. I'm not an expert with RethinkDB yet, so I wanted to know whether or not it mattered that embedded id arrays are not available for relationships. I get the sense that it does not matter too much, although I'll only know definitively as I gain more experience with thinky and RethinkDB.
I also want to take a moment to say thanks to @neumino for your work on thinky. So far it's been an immense productivity boost! I'm even surprised at how well it handled a few edge cases that I originally thought would need some hacks to get working.