Support for multiple Firestore clients/databases

Open lukwam opened this issue 7 months ago • 6 comments

I wanted to start a discussion around the idea of adding support for multiple Firestore clients in firedantic. I have spent some time thinking about ways that this might work, but I haven't settled on the right pattern yet. I am interested in thoughts and feedback from the community.

Background

Currently, firedantic is configured by creating a single Firestore Client object and passing it to the class along with an optional prefix. That Firestore Client (and prefix) is then shared by all the firedantic Models in the entire application.

The Firestore Client object defines the project, credentials, and database for the connection.

Firedantic offers support for both sync and async operations, accepting either a Client or an AsyncClient object as the db.

Observations

With just a single Firestore Client that is shared by all models, it is not possible to:

  • Use a mix of Client and AsyncClient configuration objects in the same application
  • Access data from Firestore databases in different projects and/or with different credentials
  • Use multiple Firestore databases (introduced last year)

Mixing sync and async models in the same app would make it easier to migrate an application from Client to AsyncClient one model at a time, instead of having to do it all at once.

Some applications are complex and access data in multiple projects, or with different credentials. For example, an application might have read/write access to data in its own project using its primary service account and then it may also have credentials stored in secret manager that allow it to get read access to a database in a different project. Accessing data in two different projects or with two different credentials is not possible with firedantic currently.

Google introduced support for multiple databases in Firestore in February of 2024. Instead of just using the (default) database, you can have additional named databases in the same project. As an example, you could have a website database that includes all the Firestore collections related to your website and another one called billing that has all the collections related to your billing system. Accessing data in two different databases is not currently possible with firedantic.

Possible Solutions

I have explored a few different ways of trying to extend firedantic to provide support for multiple clients, but I haven't settled on anything yet. I'd love other suggestions people have that might improve on these options.

Option 1: Multiple Clients

With this option, we would extend firedantic's configurations module to allow you to run configure() multiple times.

For example, you could configure a default database using the existing pattern, and then add additional named databases as well:

client = Client()
billing_client = Client(database="billing")

configure(client)
configure(billing_client, name="billing")

And then when you are defining a model, you could add something there that would tell it to use a different database using the name you passed to configure():

class MyModel(Model):
    __collection__ = "my_collection"
    __database__ = "billing"

Option 2: No Clients

With this option, you would not define the Firestore Client at the top-level of your application and then pass it into the firedantic class. Instead, you would pass the default arguments that are needed to establish the client, and then you could override those arguments in individual models.

For example, when you configure firedantic at the top level of your application, you would pass it the default settings for creating a client connection:

configure(
    mode="async",
    project="my-default-project",
    database="(default)",
    credentials=credentials,
    prefix="test-",
)

Then, when you are defining a model, you could add something there that could override any of those defaults:

class MyModel(Model):
    __collection__ = "my_collection"
    __database__ = "billing"
    __project__ = "billing-project"
    __credentials__ = billing_credentials
    __client_mode__ = "sync"
    __prefix__ = ""

The Model class could be extended so that it finds or creates an appropriate client based on the model's settings. If a matching client is already defined in CONFIGURATIONS, it uses that; otherwise it creates one and adds it to the dict.

Considerations for Transaction

Any solution that supports multiple Firestore clients is going to impact the support for transactions. The second option above is particularly problematic for transactions because each model could potentially have its own client. All the operations in a transaction need to share the same client, so having one client for each model breaks that completely.

My goal is to find a way to do this that wouldn't completely break transactions support. I think it is okay if you can't do transactions that involve models that use multiple clients, as long as you can still do transactions when they all have the same client connection.
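
One way to express that constraint is a guard that runs before a transaction starts. This is only a sketch under assumptions: the `__database__` attribute follows Option 1 above, and `shared_client_name()` is a hypothetical helper, not something firedantic provides.

```python
from typing import Iterable, Type


class Model:
    __database__ = "(default)"


class Invoice(Model):
    __database__ = "billing"


class Payment(Model):
    __database__ = "billing"


class Page(Model):
    pass  # uses the "(default)" client


def shared_client_name(models: Iterable[Type[Model]]) -> str:
    """Return the one client name all models use, or raise if they differ."""
    names = {model.__database__ for model in models}
    if not names:
        raise ValueError("No models given")
    if len(names) > 1:
        raise ValueError(f"Transaction spans multiple clients: {sorted(names)}")
    return names.pop()
```

Transactions over `Invoice` and `Payment` would be allowed because both resolve to the same client, while mixing in `Page` would be rejected up front.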

lukwam avatar Jun 05 '25 16:06 lukwam

Sorry again for taking some time to reply. I should also mention that I will be away for the whole of July and back at the beginning of August, though there are of course still a couple of days left of this month.

I think I overall prefer the ideas from Option 1. I see some problems with Option 2. As you mentioned, you'd need to share the same client for the models if you want to use them in the same transaction, and I think this is the main problem. It will also be pretty impractical to duplicate a lot of things in each model residing in the same secondary database. I guess you could to some extent mitigate those issues by making a shared base class for those overrides, but you'd still have issues with getting the right client for the transactions.

I thought a bit more about this and I'm going to take your Option 1 as a starting point and based on that propose a new Option 3.

Option 3

In Option 1 we had a __database__ = "billing". I'm going to suggest a similar approach, but use it for named DB configs/connections instead. In practice you would likely name your configs based on the databases. So you would have something like this in the model:

class MyModel(Model):
    __collection__ = "my_collection"
    __db_config__ = "billing"

That __db_config__ would be used to look up a config based on the name. Ideally, the config would be made in such a way that for each name you could define both a sync and an async client, and the lookup would depend on whether you're using the AsyncModel or Model (or, well, the BareModel or AsyncBareModel).

This would also allow you to make a parent class, in which you define the __collection__, __db_config__ and the fields. You could then make an async and a sync subclass of that without having to redefine the fields. So if you want to use a sync and async version of the model you could do something like this:

from firedantic import AsyncModel, Model
from pydantic import BaseModel

class MyParent(BaseModel):
    __collection__ = "my_collection"
    __db_config__ = "billing"
   
    my_string: str
    my_int: int


class MyAsync(MyParent, AsyncModel):
    # No need to define anything here, it all comes from parent classes.
    pass


class MySync(MyParent, Model):
    # No need to define anything here, it all comes from parent classes.
    pass

So you'd define the fields once and be able to easily get a sync and async version of it in case you need to work with the same data both asynchronously and synchronously. And of course you could name them better or alias them when importing them to avoid the Sync/Async part of the name etc.

And here's a pretty quick draft of how I imagine the configuration could be stored and used (for illustrative purposes only; it should raise proper errors if configs are not found, etc.).

Something similar to this could go into the configurations.py:

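from typing import Dict, Optional

from google.cloud.firestore_v1 import AsyncClient, AsyncTransaction, Client, Transaction
from pydantic import BaseModel


class ConfigItem(BaseModel):
    # Note: pydantic only accepts Client/AsyncClient as field types when
    # arbitrary types are allowed in the model config.
    prefix: str
    client: Optional[Client] = None
    async_client: Optional[AsyncClient] = None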


class Configuration:
    def __init__(self):
        self.configurations: Dict[str, ConfigItem] = {}

    def add(
        self,
        name: str = "(default)",
        prefix: str = "",
        client: Optional[Client] = None,
        async_client: Optional[AsyncClient] = None,
    ) -> None:
        self.configurations[name] = ConfigItem(
            prefix=prefix,
            client=client,
            async_client=async_client,
        )

    def get_client(self, name: str = "(default)") -> Client:
        return self.configurations[name].client

    def get_async_client(self, name: str = "(default)") -> AsyncClient:
        return self.configurations[name].async_client

    def get_transaction(self, name: str = "(default)") -> Transaction:
        return self.get_client(name=name).transaction()

    def get_async_transaction(self, name: str = "(default)") -> AsyncTransaction:
        return self.get_async_client(name=name).transaction()


configuration = Configuration()

And the way you would configure it would be something like this:

from firedantic import configuration

configuration.add(
    client=Client(),
    async_client=AsyncClient(),
)
configuration.add(
    "billing",
    prefix="",
    client=Client(database="billing"),
    async_client=AsyncClient(database="billing"),
)

And I think you can see from the example how you can get transactions based on the configuration name. We could also consider adding a get_transaction() method to the BareModel and AsyncBareModel for even more convenience, so you don't need to remember which connection the model is using if you only intend to use a transaction that involves one model.

If we want to make the upgrade path easy, we could likely change the current configure() to check whether it was given an AsyncClient or a Client and call add() accordingly. The get_transaction() and get_async_transaction() could get an optional parameter for the name and default to the (default) connection.
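
A rough sketch of that backwards-compatible configure(), assuming a simplified registry. The `Client` and `AsyncClient` classes here are empty stand-ins for the Firestore ones, and, unlike the draft above, this `add()` merges into an existing entry so that two legacy calls do not overwrite each other; that merging behaviour is an assumption, not something decided in this thread.

```python
from typing import Dict, Optional, Union


class Client:
    """Stand-in for google.cloud.firestore_v1.Client."""


class AsyncClient:
    """Stand-in for google.cloud.firestore_v1.AsyncClient."""


class Configuration:
    """Simplified registry: one sync and one async client slot per name."""

    def __init__(self) -> None:
        self.clients: Dict[str, Client] = {}
        self.async_clients: Dict[str, AsyncClient] = {}
        self.prefixes: Dict[str, str] = {}

    def add(
        self,
        name: str = "(default)",
        prefix: str = "",
        client: Optional[Client] = None,
        async_client: Optional[AsyncClient] = None,
    ) -> None:
        self.prefixes[name] = prefix
        # Only overwrite a slot when a client is actually given, so two
        # legacy configure() calls (sync, then async) do not erase each other.
        if client is not None:
            self.clients[name] = client
        if async_client is not None:
            self.async_clients[name] = async_client


configuration = Configuration()


def configure(db: Union[Client, AsyncClient], prefix: str = "") -> None:
    """Legacy entry point: route the client to the matching slot by type."""
    if isinstance(db, AsyncClient):
        configuration.add(prefix=prefix, async_client=db)
    elif isinstance(db, Client):
        configuration.add(prefix=prefix, client=db)
    else:
        raise TypeError(f"Unsupported client type: {type(db).__name__}")


sync_db, async_db = Client(), AsyncClient()
configure(sync_db, prefix="test-")
configure(async_db, prefix="test-")
```

Existing code that calls configure() once with a single client would keep working and simply populate the (default) entry.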

I'm not sure I like all the names as quickly drafted above, but I think they relay the intentions pretty well.

I've tried to think of the pros and cons of this approach:

  • Allows you to easily create Async and Sync versions of the same model.
  • Somewhat simple, but flexible configuration.
  • Working with transactions should be pretty simple still (even easier if we add the .get_transaction() to the models).
  • This should solve some of the issues with type checking we had due to the fact that CONFIGURATIONS["db"] could contain either a Client or AsyncClient.
  • Creating two clients for each database (that you want to use with both sync+async) is a bit tedious. If this becomes too tedious, you can always make your own helper function that takes all the arguments, makes the clients and adds them to the configuration. Another option is to change the function you use to add a configuration to take in the same parameters as the Client and AsyncClient (i.e. project, credentials, database, client_info and client_options) and have it create both clients when you add the configuration. I'm not sure it's really worth the effort: if the client gets any new parameters we'd have to update this function, and it would also make it harder to keep configure() backwards compatible. A third option is to support both the parameters and ready-made clients, either in the same .add() (or whatever we call it) or in a secondary .add_by_params() that takes the arguments (obviously needs a better name). But as said, I think for a typical case the initial suggestion should be enough and can be extended either by the user or in this project later.
  • If we want to add a way to generate the name of the collection automatically based on the class name, we could add a parameter for that also in the ConfigItem and pass it in from the configuration.add(). (If you're interested in more detail, you can check the collection_generator parameter in the Arangodantic project's configuration and the get_collection_name() in models.py). In that case, I think it's pretty nice you can make multiple named configurations that use the same clients.
  • The impact of this on how you set up the composite indexes and TTL policies has not been evaluated yet. This message is becoming too long already, so I'm going to post this for initial feedback and evaluate that a bit more in a follow-up reply.

What do you think of the ideas presented in this Option 3, @lukwam? I'd also like to hear some feedback from @fbjorn, if you have time to look into this.

joakimnordling avatar Jun 26 '25 07:06 joakimnordling

Thoughts on impact on setting up composite indexes and TTL policies with Option 3

If we follow the suggestions in Option 3 and don't make any changes to the logic for setting up the TTL policies and composite indexes, it means you can no longer use this kind of approach, as suggested in the example in the README:

    await async_set_up_composite_indexes_and_ttl_policies(
        gcloud_project="my-project",
        models=get_all_subclasses(AsyncModel),
        client=FirestoreAdminAsyncClient(),
    )

The reason for that is that the different Models might be using different gcloud_projects and different databases (the async_set_up_composite_indexes_and_ttl_policies() takes in a database parameter, which defaults to (default)).

Of course if you just use the default database, all is fine. But if not, then you will have to manually figure out which models are in which database and in which project and then call this multiple times with the right models, database and project, which is pretty tedious to set up, something we'd want the library to make easier for sure.

Let's try to see what we can do about this all. If we start by tackling the question about the database parameter, I think we could use it to filter the models to the ones from the input that have the corresponding __db_config__. However, I don't think we can assume that the name we use for the configuration must match the name of the database. And it would also cause issues if you had databases with the same name in multiple gcloud projects. So this means the hypothetical configuration.add() would need to know the database as well as the Google Cloud project. If it does that, then we can filter the models down to only those that are in the desired project and database.
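
The filtering could look roughly like this. The `CONFIGS` dict, the simplified `ConfigItem` and the `models_in()` helper are hypothetical and exist only for illustration; note that two of the configurations deliberately share the database name (default) while living in different projects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Type


@dataclass(frozen=True)
class ConfigItem:
    project: str
    database: str


# Hypothetical named configurations; the names need not match the databases.
CONFIGS: Dict[str, ConfigItem] = {
    "(default)": ConfigItem(project="my-project", database="(default)"),
    "billing": ConfigItem(project="my-project", database="billing"),
    "partner": ConfigItem(project="other-project", database="(default)"),
}


class Model:
    __db_config__ = "(default)"


class Page(Model):
    pass


class Invoice(Model):
    __db_config__ = "billing"


class PartnerOrder(Model):
    __db_config__ = "partner"


def models_in(
    models: Iterable[Type[Model]], project: str, database: str
) -> List[Type[Model]]:
    """Keep only the models whose config points at this project and database."""
    target = ConfigItem(project=project, database=database)
    return [m for m in models if CONFIGS[m.__db_config__] == target]
```

Because the comparison uses both project and database, `Page` and `PartnerOrder` are kept apart even though both use a database called (default).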

So we could then change the configuration setup to look like this:

configuration.add(
    name="(default)",
    project="my-project",
    database="(default)",
    client=Client(database="(default)", project="my-project"),
    async_client=AsyncClient(database="(default)", project="my-project"),
)

It starts to get a bit repetitive that you have to define the project and database 3 times. Not ideal. On the other hand, you are likely going to set this up once and I'd be surprised if someone would be using more than a handful of databases, which means this would be semi-tolerable. If someone wants to use it more dynamically, on like 100s of projects and databases, I think firedantic has been the wrong choice already in the past due to the way the configuration has worked. And someone could also make their own wrapper around this to configure it. Despite this, I'm feeling a bit tempted to instead pass in the parameters once and make the Client and AsyncClient inside the function instead.

Creating the clients in the function will however make it impossible to reuse the same clients for two differently named configs (if you for example would only want to use a different generator function for the collection names or such if we implement that). Another option is to allow both approaches; i.e. you can pass in the necessary credentials or the clients, but the project and database would be required also (but could have fallbacks).

If we'd do all the things mentioned above (i.e. add the database and project to the configuration and make the set_up_composite_indexes and set_up_ttl_policies filter the models to only those using the project and database they're being called with), we could ensure only the right indexes and TTL policies are created. However, it would still require calling the setup functions once for each project and database, which is not really great. And for each call you'd need to provide a FirestoreAdminAsyncClient (or the non-async ditto).

Since I'm already feeling a bit tempted to pass in the necessary details to create the Client and AsyncClient, why not go all in and create them, and also create the FirestoreAdminAsyncClient and FirestoreAdminClient as well (or, if you need to give them different settings, pass them in to override the creation of a default one)?

If we'd do that, then we could change the signature of the set_up_ttl_policies, set_up_composite_indexes and set_up_composite_indexes_and_ttl_policies to no longer take in the gcloud_project, database or a client, just take in the models. We could then based on the __db_config__ of each model look up the project, database and admin_client.

So setting up the indexes and TTL policies would get simplified to just:

    await async_set_up_composite_indexes_and_ttl_policies(
        models=get_all_subclasses(AsyncModel),
    )
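
Internally, a models-only signature implies grouping the models by their __db_config__ and resolving project, database and admin client once per group. A sketch of that grouping step, assuming a hypothetical `CONFIGS` registry and a `plan_setup()` helper, with a plain string standing in for the admin client instance:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Type


@dataclass(frozen=True)
class ConfigItem:
    project: str
    database: str
    admin_client: str  # stand-in for a FirestoreAdminClient instance


# Hypothetical named configurations.
CONFIGS: Dict[str, ConfigItem] = {
    "(default)": ConfigItem("my-project", "(default)", "admin-default"),
    "billing": ConfigItem("my-project", "billing", "admin-billing"),
}


class Model:
    __db_config__ = "(default)"


class Page(Model):
    pass


class Invoice(Model):
    __db_config__ = "billing"


class Receipt(Model):
    __db_config__ = "billing"


def plan_setup(
    models: Iterable[Type[Model]],
) -> List[Tuple[ConfigItem, List[Type[Model]]]]:
    """Group models by config name and pair each group with its config."""
    grouped: Dict[str, List[Type[Model]]] = defaultdict(list)
    for model in models:
        grouped[model.__db_config__].append(model)
    return [(CONFIGS[name], group) for name, group in grouped.items()]
```

Each entry of the returned plan carries everything one index or TTL setup call needs, so the caller no longer has to sort models by project and database by hand.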

I drafted a bit what the configuration could look like:

import os
from typing import Callable, Dict, Optional, Union

import google.api_core.client_options
import google.api_core.gapic_v1.client_info
import google.auth.credentials
from google.cloud.firestore_admin_v1 import FirestoreAdminClient
from google.cloud.firestore_admin_v1.services.firestore_admin import FirestoreAdminAsyncClient
from google.cloud.firestore_admin_v1.services.firestore_admin.transports.base import (
    DEFAULT_CLIENT_INFO,
    FirestoreAdminTransport,
)
from google.cloud.firestore_v1 import AsyncClient, AsyncTransaction, Client, Transaction
from pydantic import BaseModel


class ConfigItem(BaseModel):
    project: str
    database: str
    prefix: str
    client: Optional[Client] = None
    async_client: Optional[AsyncClient] = None
    admin_client: Optional[FirestoreAdminClient] = None
    async_admin_client: Optional[FirestoreAdminAsyncClient] = None


class Configuration:
    def __init__(self):
        self.configurations: Dict[str, ConfigItem] = {}

    def add(
        self,
        name: str = "(default)",
        *,
        project: Optional[str] = None,
        database: str = "(default)",
        prefix: str = "",
        client: Optional[Client] = None,
        async_client: Optional[AsyncClient] = None,
        admin_client: Optional[FirestoreAdminClient] = None,
        async_admin_client: Optional[FirestoreAdminAsyncClient] = None,
        credentials: Optional[google.auth.credentials.Credentials] = None,
        client_info: Optional[google.api_core.gapic_v1.client_info.ClientInfo] = None,
        client_options: Optional[Union[dict, google.api_core.client_options.ClientOptions]] = None,
        transport: Optional[Union[str, FirestoreAdminTransport, Callable[..., FirestoreAdminTransport]]] = None
    ) -> None:
        # Logic for determining default project originates from firestore BaseClient
        if project is None:
            project = (
                os.getenv("GOOGLE_CLOUD_PROJECT")
                or os.getenv("GCLOUD_PROJECT")
                or "google-cloud-firestore-emulator"
            )

        if not client:
            client = Client(
                project=project,
                credentials=credentials,
                database=database,
                client_info=client_info,
                client_options=client_options,
            )

        if not async_client:
            async_client = AsyncClient(
                project=project,
                credentials=credentials,
                database=database,
                client_info=client_info,
                client_options=client_options,
            )

        if not admin_client:
            admin_client = FirestoreAdminClient(
                credentials=credentials,
                transport=transport,
                client_options=client_options,
                client_info=client_info or DEFAULT_CLIENT_INFO,
            )

        if not async_admin_client:
            async_admin_client = FirestoreAdminAsyncClient(
                credentials=credentials,
                transport=transport,
                client_options=client_options,
                client_info=client_info or DEFAULT_CLIENT_INFO,
            )

        self.configurations[name] = ConfigItem(
            project=project,
            database=database,
            prefix=prefix,
            client=client,
            async_client=async_client,
            admin_client=admin_client,
            async_admin_client=async_admin_client,
        )

    def get_client(self, name: str = "(default)") -> Client:
        return self.configurations[name].client

    def get_async_client(self, name: str = "(default)") -> AsyncClient:
        return self.configurations[name].async_client

    def get_admin_client(self, name: str = "(default)") -> FirestoreAdminClient:
        return self.configurations[name].admin_client

    def get_async_admin_client(self, name: str = "(default)") -> FirestoreAdminAsyncClient:
        return self.configurations[name].async_admin_client

    def get_transaction(self, name: str = "(default)") -> Transaction:
        return self.get_client(name=name).transaction()

    def get_async_transaction(self, name: str = "(default)") -> AsyncTransaction:
        return self.get_async_client(name=name).transaction()

I should add that I'm not sure I'd want to create all the clients and admin clients on setup. I might actually rather store all the necessary credentials and then have 4 dictionaries (for the clients, async_clients, admin_clients and async_admin_clients) that serve as storage, generating each client from the stored details if it does not yet exist. If you pass in any of them when you call add(), they'd be saved into the dictionaries directly. This way we'd not create any admin clients at all unless you use the TTL policies or indexes, nor any AsyncClient if you only use sync models, and the other way around.
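
The lazy variant could be sketched like this for a single kind of client; the same pattern would repeat for the other three dictionaries. `ClientParams`, `LazyConfiguration`, the injected factory and `FakeClient` are hypothetical names chosen so the example runs without Firestore.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ClientParams:
    """Details stored at add() time and used later to build a client."""

    project: str
    database: str = "(default)"


class FakeClient:
    """Stand-in for a Firestore client."""

    def __init__(self, params: ClientParams) -> None:
        self.params = params


class LazyConfiguration:
    def __init__(self, factory: Callable[[ClientParams], FakeClient]) -> None:
        self._factory = factory
        self._params: Dict[str, ClientParams] = {}
        self._clients: Dict[str, FakeClient] = {}

    def add(
        self,
        name: str,
        params: ClientParams,
        client: Optional[FakeClient] = None,
    ) -> None:
        """Store the details; a ready-made client is saved directly."""
        self._params[name] = params
        if client is not None:
            self._clients[name] = client

    def get_client(self, name: str = "(default)") -> FakeClient:
        """Build the client on first access, then reuse the stored one."""
        if name not in self._clients:
            self._clients[name] = self._factory(self._params[name])
        return self._clients[name]


created = []  # records every client the factory actually builds


def counting_factory(params: ClientParams) -> FakeClient:
    client = FakeClient(params)
    created.append(client)
    return client


config = LazyConfiguration(counting_factory)
config.add("(default)", ClientParams(project="my-project"))
config.add("billing", ClientParams(project="my-project", database="billing"))
```

With this shape, registering a configuration costs nothing until a model actually asks for its client.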

And setting up the configuration could work like this:

configuration = Configuration()

# Old-fashioned way with defaults all the way
configuration.add(
    client=Client(),
    async_client=AsyncClient(),
    admin_client=FirestoreAdminClient(),
    async_admin_client=FirestoreAdminAsyncClient(),
)

configuration.add(
    "website-default",
    prefix="",
    project="my-website-project-123",
    database="(default)",
)
configuration.add(
    "website-billing",
    prefix="",
    project="my-website-project-123",
    database="billing",
    credentials=...,
)

So let's consider this, with the logic of creating (or optionally passing in) the clients, as a slightly refined iteration of Option 3.

joakimnordling avatar Jun 26 '25 12:06 joakimnordling

Sorry this got a bit long @lukwam and @fbjorn, but I think I'm at least myself very happy with the general idea outlined here. And they say well planned is half done. But as said, I'd be happy to hear your thoughts on this as well, in case you spot anything that feels inconvenient or that you think could be improved.

joakimnordling avatar Jun 26 '25 12:06 joakimnordling

Thanks for all the great feedback and ideas! I do like this Option 3 that is being discussed.

I prefer passing the necessary details (database name, project id, etc.) as arguments to the configuration rather than pre-building the clients and passing those pre-built clients into the configuration.

And I think that if you are using the (default) database and the project from the APPLICATION_DEFAULT_CREDENTIALS and no prefix, it should remain simple to configure the client. Ideally it should also be backward compatible for existing code, so there isn't a difficult upgrade path for people to move to this new version with the updated configuration.

I think allowing the developer to name the connections however they want, and to assign models to those clients by name is a nice pattern. You could imagine wanting to access the (default) database in the current project as well as the (default) database in project2, so you don't want duplicate database names to cause an issue.

I'm excited that there seems to be some sort of consensus here for this proposed Option 3.

I see you mentioned you'll be out for July, so I won't expect a speedy response. I may or may not have time to try and write some code for this while you are gone. But when you're back, let's chat and see what is the best way to get some code that we can start playing with using this new approach to configurations.

Have a nice vacation!

lukwam avatar Jul 04 '25 02:07 lukwam

Hi @joakimnordling and @fbjorn, I've been chatting with @lukwam about this issue and took a stab at it. I've opened a PR to address what was being discussed above - eager to hear your feedback.

https://github.com/ioxiocom/firedantic/pull/89

Thanks! Marissa

mfisher29 avatar Aug 27 '25 03:08 mfisher29

A big thanks for the PR!

I'm really sorry to say we're pretty busy for the next month, so frankly we're most likely not able to have a deeper look into the PR within the next month.

joakimnordling avatar Sep 03 '25 11:09 joakimnordling