Support for existing RethinkDB tables

Open marshall007 opened this issue 8 years ago • 8 comments

Is the plan to eventually support accessing existing RethinkDB tables (and their indexes) through Fusion, or will Fusion always require managing its own "special" tables?

The existing table structure makes it difficult to use Fusion in conjunction with other (existing) applications that manage the same tables. As I understand it, the reason for this is that Fusion needs to store metadata about the secondary indexes it creates. I think there are a couple of potential ways around this:

  1. Use convention-based naming of indexes. For example, an index on [ "a", "b" ] might be named hz_a_b so it can be referenced without a lookup table.
  2. Add support for managing db/table level metadata in RethinkDB (something like rethinkdb/rethinkdb#4439).

I think (1) is the more straightforward solution for allowing interop. On the flip side, we would need to consider how Fusion interacts (if at all) with indexes it didn't create. For example, if my application creates an index named a, should Fusion assume that index is OK to use, or create its own named hz_a?

marshall007 avatar Feb 17 '16 22:02 marshall007

I've been meaning to revisit the existing metadata management, because it's rather cumbersome, and we should be using changefeeds to have multiple fusion servers coexist on the same database peacefully (especially in dev mode).

(1) is only complicated by the need to escape field names properly so that, for example, hz_a_b is not ambiguous between [ 'a', 'b' ] and [ 'a_b' ]. In any case, I don't think it would be safe to use externally-created indexes in fusion. Even if we could verify that the index function was what we want, the index doesn't "belong" to the fusion layer, and a user might unwittingly break things by changing the index in some way.
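To make the ambiguity concrete, here is a minimal sketch of one possible encoding that keeps [ 'a', 'b' ] and [ 'a_b' ] distinct. The helper names (indexNameFor, fieldsFromIndexName) and the choice of JSON as the escaping scheme are assumptions for illustration, not something Fusion is known to do, and the sketch also assumes RethinkDB accepts these characters in index names.

```javascript
// Hypothetical naming scheme: prefix + JSON-encoded field list.
// JSON quoting keeps field boundaries explicit, so underscores inside
// a field name cannot be confused with a separator.
const PREFIX = "hz_";

function indexNameFor(fields) {
  return PREFIX + JSON.stringify(fields);
}

// Recover the field list from a managed index name.
// Returns null for names that do not follow the convention.
function fieldsFromIndexName(name) {
  if (!name.startsWith(PREFIX)) return null;
  try {
    const fields = JSON.parse(name.slice(PREFIX.length));
    const ok = Array.isArray(fields) && fields.every((f) => typeof f === "string");
    return ok ? fields : null;
  } catch (err) {
    return null; // not valid JSON, so not one of ours
  }
}

const compound = indexNameFor(["a", "b"]);
const single = indexNameFor(["a_b"]);
console.log(compound !== single);
```

Because the name can be decoded back into its field list, no lookup table is needed to find out what a managed index covers.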

The best we can do is watch rethinkdb.table_config for changes in the set of indexes that are available, but that will not notify us if the index function itself changes. (I believe that with a system table like this, we can miss changes if the data appears the same between polling windows). Short of making index functions available and comparable in ReQL, we would need to periodically inspect each index we use to make sure they are still functioning properly.

In general, I think the best solution at the moment is to stop using fusion_internal.collections to track indexes (and instead use the index names), and do not interfere with user-created indexes at all. This puts a slight responsibility on users to not create or modify indexes beginning with hz_, but I think that is acceptable.
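As a rough sketch of what tracking indexes by name could look like, the function below takes one change object of the shape a changefeed emits (old_val / new_val) for a rethinkdb.table_config row and reports which hz_-prefixed indexes appeared or disappeared, ignoring user-created ones. The function name managedIndexChanges is hypothetical. As noted above, this only sees changes to the set of index names, not changes to an index function.

```javascript
// Hypothetical helper: diff the "indexes" list of a table_config row
// between old_val and new_val, looking only at managed (hz_) indexes.
const MANAGED_PREFIX = "hz_";

function managedIndexChanges(change) {
  // old_val is null when a table is created, new_val is null when it is dropped.
  const before = new Set((change.old_val && change.old_val.indexes) || []);
  const after = new Set((change.new_val && change.new_val.indexes) || []);
  const isManaged = (name) => name.startsWith(MANAGED_PREFIX);
  return {
    added: [...after].filter((n) => isManaged(n) && !before.has(n)),
    removed: [...before].filter((n) => isManaged(n) && !after.has(n)),
  };
}

const example = managedIndexChanges({
  old_val: { indexes: ["a", "hz_a"] },
  new_val: { indexes: ["a", "hz_b", "mine"] },
});
console.log(example);
```

User indexes such as a and mine never show up in the result, which matches the idea of not interfering with them at all.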

Tryneus avatar Feb 17 '16 23:02 Tryneus

In any case, I don't think it would be safe to use externally-created indexes in fusion. Even if we could verify that the index function was what we want, the index doesn't "belong" to the fusion layer...

What about when you actually do want Fusion to use indexes you've defined? A good example is the workaround for not allowing null in secondary indexes. People often use r.row('a').default(false) in order to query documents that don't have field a.

I feel like the intuitive behavior of .findAll({ <field>: <value> }) would be to use the index I've defined as <field> first before trying hz_<field>. If we don't do something like this then you have the opposite problem whereby people will need to modify/create hz_ prefixed indexes that should be managed by Fusion.
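A small sketch of the lookup order described above, purely to pin down the proposal: try the user-defined index named after the field first, then fall back to the managed hz_<field> index. The resolveIndex function and its return shape are hypothetical and do not reflect how Fusion currently resolves indexes.

```javascript
// Hypothetical resolver for a single-field lookup.
// availableIndexes is the list of index names that exist on the table.
function resolveIndex(field, availableIndexes) {
  const available = new Set(availableIndexes);
  // 1. Prefer an index the user defined under the field's own name.
  if (available.has(field)) {
    return { name: field, owner: "user", needsCreate: false };
  }
  // 2. Otherwise use the managed index, creating it if it does not exist yet.
  const managedName = `hz_${field}`;
  return {
    name: managedName,
    owner: "fusion",
    needsCreate: !available.has(managedName),
  };
}

console.log(resolveIndex("a", ["a", "hz_a"]).name);
console.log(resolveIndex("b", ["a"]).needsCreate);
```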

marshall007 avatar Feb 18 '16 02:02 marshall007

Using a user-defined index like that breaks the assumption that chaining operations on two different fields will have the same behavior as an operation on either field. A query such as .findAll({ a: <value> }).above({ b: <value> }) would need to use a compound index of a and b to be performant. If a user index a is defined, we can't reuse it. Technically, we could make the hz_a_b index use the same index function as the index a for the first part (with some added support in ReQL), but that gets complicated when a is added/removed/redefined.

I think this is more of a configuration issue. Users should be able to control the behavior of .findAll({ a: <value> }) by providing fusion with a custom index function. This function would need to be managed by fusion to ensure no corner cases exist (and in the same interests, we shouldn't use user indexes directly). I think to make a feature like this complete, we would need functions as a ReQL pseudotype so we could compare and splice them more reliably.

As an aside, Fusion already eliminates sparse indexes by using r.row('a').default(r.minval) - although that will be disallowed by RethinkDB 2.3, if I remember correctly.

Tryneus avatar Feb 19 '16 02:02 Tryneus

#63 seems related

Edit: fixed issue number

deontologician avatar May 18 '16 23:05 deontologician

+1 I think the feature under discussion here would be a really powerful addition to Horizon. I have a use case where I want to use Horizon along with my existing stack (a bunch of Java-based web apps). This means that while my RethinkDB tables are updated from elsewhere, I want to keep my web app clients in sync. Without this feature in the Horizon server, I am unable to leverage all the features of the Horizon client's API.

Please let me know if I am missing something trivial here, or if there's another way to solve this.

anubhavsagar avatar Jul 24 '16 16:07 anubhavsagar

This seems like a rather big show-stopper for anyone looking to migrate existing RethinkDB-based apps onto Horizon, given how small the underlying issue is (automatic index management). I'm surprised this issue isn't more popular...

I'm not familiar with Horizon's architecture yet, since I was just starting to look into migrating to it. Is it feasible to manage Horizon's automatic tables partly by hand in order to enable support for existing apps?

coffenbacher avatar Sep 22 '16 01:09 coffenbacher

So right now the main thing Horizon has been focusing on is the new-app experience. Eventually we want to support existing apps, but there is a large amount of work to get there. It's easier if we can assume things are set up a certain way, so that's why it's like that at the moment.

deontologician avatar Sep 22 '16 01:09 deontologician

For me, migrating the data is out of the question. There's just too much at risk. Is there any chance of this feature being completed in the near future?

mfferreira avatar Oct 28 '16 03:10 mfferreira