[FR] Ability to get all descendants
We would love to have the ability to query all descendants of a document (or collection) even when the intermediate documents do not exist.
For example, let's say /my-collection/my-doc/my-sub-collection/my-sub-doc (and many siblings) exists in Firestore, but /my-collection/my-doc does not exist. It's then not possible to query those documents from the sub-collection. Most of the pieces seem to exist in QueryOptions.forKindlessAllDescendants that is used from RecursiveDelete.getAllDescendants but those are all private.
We currently achieve this by getting the FirestoreClient from @google-cloud/firestore/types/v1 by poking in some internal variables of the public Firestore client and then invoking the runQuery method on that client:
client.runQuery({
    parent: `projects/${projectId}/databases/${databaseId}/documents/${rootDoc.path}`,
    structuredQuery: {
        from: [{ allDescendants: true }],
        orderBy: [{ field: { fieldPath: '__name__' } }],
    },
});
But this is a hassle: we get raw protobuf responses that we need to deserialize ourselves, and we miss out on all the other good stuff that the public Firestore client normally does.
Another alternative is to manipulate allDescendants directly in the QueryOptions (which also works):
const collection = firestore.collection('my-collection');
assert('_queryOptions' in collection);
const qo = collection._queryOptions;
assert(typeof qo === 'object');
assert(!!qo);
assert('allDescendants' in qo);
qo.allDescendants = true;
const { docs } = await collection.get();
expect(docs.map(d => d.ref.path)).toEqual([nestedDoc.path]);
It would be wonderful if a .allDescendants() method could be added to CollectionReference that sets allDescendants in QueryOptions to true. The rest of the handling seems to be there.
Would you be willing to take a Pull Request for this? If so, I could try to create one.
Hi @wvanderdeijl
Thanks for poking around!! This is great to see.
The reason we do not have this as an official API is that it would not work with onSnapshot, because our backend does not support it. We are, however, looking into ways to bring this feature into the public API, possibly in a different form.
In the meantime, please continue to use your workaround. I will post something here when we do have it as a public API.
Ah, I see. That makes the public API design a bit more tricky. Exposing this as something similar to AggregateQuery would remove onSnapshot as it should. But it also removes the possibility to build on the query with .limit, .startAfter etcetera.
Exposing it as a full-blown Query, on the other hand, would give all these nice query-building methods, but would also expose onSnapshot, which it should not. So you would need something like a DescendantQuery that extends or wraps Query but does not have onSnapshot, and whose query-building methods return another DescendantQuery instead of a Query.
For our own workaround we'll just live with a Query object that throws an error when using onSnapshot. But I understand that is not acceptable for the official public API.
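To make the wrapper idea a bit more concrete, here is a minimal sketch of what such a DescendantQuery could look like. Everything in it is hypothetical: `QueryLike` is a small stand-in interface invented for this example (not the real Firestore `Query` type), and `DescendantQuery` is only the name floated above, not an existing class.

```typescript
// Hypothetical, minimal stand-in for the parts of a Query that this sketch needs.
// This is NOT the real Firestore Query interface.
interface QueryLike<T> {
    where(field: string, op: string, value: unknown): QueryLike<T>;
    limit(n: number): QueryLike<T>;
    get(): Promise<T[]>;
    onSnapshot(callback: (docs: T[]) => void): () => void;
}

// Hypothetical wrapper: forwards the query-building methods and `get`, but
// deliberately does not expose `onSnapshot`.
class DescendantQuery<T> {
    private readonly inner: QueryLike<T>;

    constructor(inner: QueryLike<T>) {
        this.inner = inner;
    }

    // Every builder returns another DescendantQuery, never the wrapped query.
    where(field: string, op: string, value: unknown): DescendantQuery<T> {
        return new DescendantQuery(this.inner.where(field, op, value));
    }

    limit(n: number): DescendantQuery<T> {
        return new DescendantQuery(this.inner.limit(n));
    }

    get(): Promise<T[]> {
        return this.inner.get();
    }
}
```

Wrapping instead of extending means a caller cannot reach onSnapshot through the type at all, at the cost of having to forward every builder method by hand.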
For others interested in a similar workaround in their own codebase: be aware of how allDescendants:true interacts with kindless:true in the QueryOptions.
You typically start a Query by taking a CollectionReference and treating it as a query. Under the hood a query uses a "parent" and a "collectionName". The "parent" in this scenario is the database root or a DocumentReference.
When querying with allDescendants:true but with kindless left at its default of false, you recursively query all child documents from that root, with the restriction that the direct parent collection of each found document has the same name as the collection from your query. So starting from a collection at /collection/doc/subCollection and then performing an allDescendants query with kindless=false searches for /collection/doc/**/subCollection/anyDoc.
When setting both allDescendants and kindless to true, the name of the collection from the query is completely ignored. So starting this query from the same collection /collection/doc/subCollection would search for /collection/doc/**/anyDoc.
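As a way to reason about these two modes, here is a small in-memory model of which document paths such a query would match. It is only a sketch of the behaviour described above, not Firestore code: `matchesDescendantQuery` and its parameters are names made up for this example, and paths are written without a leading slash.

```typescript
// Hypothetical model of the matching rules described above.
// `parent` is the query parent ('' for the database root, or a document path),
// `collectionId` is the collection name the query was started from.
function matchesDescendantQuery(docPath: string, parent: string, collectionId: string, kindless: boolean): boolean {
    const prefix = parent === '' ? '' : `${parent}/`;
    if (!docPath.startsWith(prefix)) return false;

    // Below the parent, a document path consists of (collection, document) pairs.
    const rest = docPath.slice(prefix.length).split('/');
    if (rest.length < 2 || rest.length % 2 !== 0) return false;

    // kindless: the collection name is ignored completely.
    if (kindless) return true;

    // not kindless: the direct parent collection of the document must match.
    return rest[rest.length - 2] === collectionId;
}

const parent = 'collection/doc';
// kindless=false behaves like /collection/doc/**/subCollection/anyDoc
const deep = matchesDescendantQuery('collection/doc/other/x/subCollection/a', parent, 'subCollection', false);
const wrongName = matchesDescendantQuery('collection/doc/other/x', parent, 'subCollection', false);
// kindless=true behaves like /collection/doc/**/anyDoc
const anyName = matchesDescendantQuery('collection/doc/other/x', parent, 'subCollection', true);
```

Here `deep` and `anyName` come out true and `wrongName` comes out false, which mirrors the two glob patterns above.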
This had us confused for a while, and we think it is an easier mental model to initiate this type of query from a DocumentReference. We now have a utility method that builds an allDescendants query with that DocumentReference as its root. You can supply an optional collection name to that method to restrict the direct parent collection name of the found documents. Whether that optional argument is given determines if the query is kindless or not.
Having that utility method means we have to start from a DocumentReference and cannot use the database root as origin to find all documents in the entire database. If you want to do that with a restriction on the direct parent collection of the found documents, this would just be a collectionGroup query, which already exists. So the one thing we are missing is an allDescendants query without restriction on the collection name for the entire database. But that could easily be added with a similar utility function that does not take a DocumentReference as its input.
So, for the eventual public API design I feel it would be nice to have an allDescendants method on DocumentReference with a single optional collectionName argument. This method would create a DescendantQuery object with allDescendants set to true. When the collectionName is undefined, the DescendantQuery would not have a restriction on the collection name and would have kindless set to true. When the collectionName is specified, the DescendantQuery would not be kindless and would have a restriction on the collection name.
A similar allDescendants method might also be added to the Firestore class itself to build a DescendantQuery that gets all documents of the entire database. The user could then further restrict that query with a where clause, limits, etc.
When querying with allDescendants:true but with kindless left at its default of false, you recursively query all child documents from that root, with the restriction that the direct parent collection of each found document has the same name as the collection from your query. So starting from a collection at /collection/doc/subCollection and then performing an allDescendants query with kindless=false searches for /collection/doc/**/subCollection/anyDoc.
When setting both allDescendants and kindless to true, the name of the collection from the query is completely ignored. So starting this query from the same collection /collection/doc/subCollection would search for /collection/doc/**/anyDoc.
So that gives us a way to query for all anyDocs in:
- /collection/doc/**/subCollection/anyDoc and
- /collection/doc/**/anyDoc
That is, find all docs that live under a certain parent doc (or the root database I guess), with or without a certain collection-name as direct parent. It does not give us a way to find all docs that live under a certain collection (/collection/**/anyDoc).
It turns out there is a way to do that, which you can find in the RecursiveDelete#getAllDescendants method. The source code explains it quite well:
https://github.com/googleapis/nodejs-firestore/blob/f58fe791c7afc59087e2555f7208cdb611470d80/dev/src/recursive-delete.ts#L258-L268
You can look at the complete method for more inspiration if you need anything like that.
We have created a utility function that works in most situations. Perhaps this is helpful to other people looking into this:
import { Query } from '@google-cloud/firestore';
import { QueryOptions } from '@google-cloud/firestore/build/src/reference/query-options';
import assert from 'assert';

/**
 * Builds a `Query` that queries all recursively descendant documents from a given document. When `collection` is given it only returns
 * documents where its immediate parent collection has this name. Please note that this parent collection does not have to be a direct child
 * of the given `DocumentReference` since the query is recursive.
 *
 * Descendant documents will be found even if the given `DocumentReference` itself, or any intermediate documents, do not actually exist.
 *
 * The returned query does not support live queries and `onSnapshot` will throw a runtime error
 *
 * When the query was constructed without a `collection` argument, you cannot use `withConverter` on it as that will re-introduce an (internal)
 * predicate on the `collectionId` with a non existing collection.
 */
export function allDescendants(parent: FirebaseFirestore.DocumentReference, collection?: string) {
    // determine if this will be a "kindless" query meaning without restriction on the name of the direct parent (collection) of the found
    // documents.
    const kindless = collection === undefined;
    // build a query from the document reference and optionally restrict the name of the collection owning the found document(s)
    const query = parent.collection(kindless ? 'unused' : collection);
    assert('_queryOptions' in query, 'Firestore query always has private _queryOptions');
    // cast Query as its constructor is `protected` and we need a public constructor for Typescript to not complain.
    const PublicQuery = Query as unknown as {
        new (
            firestore: FirebaseFirestore.Firestore,
            options: QueryOptions<unknown, FirebaseFirestore.DocumentData>,
        ): FirebaseFirestore.Query;
    };
    // construct a new Query instance with a new QueryOptions instance similar to what Query does internally in the other query building
    // methods.
    return new PublicQuery(
        query.firestore,
        (query._queryOptions as QueryOptions<unknown, FirebaseFirestore.DocumentData>).with({
            allDescendants: true,
            kindless,
        }),
    );
}
Also have a look at the unit tests for the (Rust-based) Firestore Emulator to see how such an allDescendants query behaves when you continue building on that query with additional predicates.
We've since further enhanced our allDescendants function, as we also needed the ability to use the root of the database as starting point. Perhaps the code below is helpful to somebody with the same challenge, or provides some ideas for a future public Firestore API:
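// `Query` and `FieldPath` are used as runtime values below, so the module itself has to be imported (not just the ambient types).
import * as FirebaseFirestore from '@google-cloud/firestore';
import type * as firestoreInternalQuery from '@google-cloud/firestore/build/src/reference/query';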
import assert from 'assert';
import { inspect } from 'util';
/**
 * Builds a `Query` that queries all recursively descendant documents from a given root (the entire database, a CollectionReference, or
 * a DocumentReference). When `options.collectionGroup` is given it only returns documents where its immediate parent collection has
 * this name. Please note that this parent collection does not have to be a direct child of the given `DocumentReference` since the
 * query is recursive.
 *
 * Descendant documents will be found even if the given `DocumentReference` itself, or any intermediate documents, do not actually
 * exist.
 *
 * Examples using glob patterns to explain the effect of the different options:
 * | root type           | root          | collectionGroup | glob                               |
 * | ------------------- | ------------- | --------------- | ---------------------------------- |
 * | Firestore           | /             | undefined       | / ** / *                           |
 * | Firestore           | /             | leaf            | / ** / leaf / *                    |
 * | CollectionReference | /root         | undefined       | / root / ** / *                    |
 * | CollectionReference | /root         | leaf            | / root / ** / leaf / *             |
 * | DocumentReference   | /coll/doc     | undefined       | / coll / doc / ** / *              |
 * | DocumentReference   | /coll/doc     | leaf            | / coll / doc / ** / leaf / *       |
 * | CollectionReference | /coll/doc/sub | undefined       | / coll / doc / sub / ** / *        |
 * | CollectionReference | /coll/doc/sub | leaf            | / coll / doc / sub / ** / leaf / * |
 *
 * Using `Firestore` with `options.collectionGroup` is the same as a normal Firestore CollectionGroup query.
 *
 * The returned query does not support live queries and `onSnapshot` will throw a runtime error.
 *
 * When the query was constructed without an `options.collectionGroup` argument, you can **NOT** use `withConverter` on it as that will
 * re-introduce an (internal) predicate on the `collectionId` with a non-existing collection. Feel free to use `withConverter` when
 * you did supply an `options.collectionGroup`. This also makes more sense, since you are restricting the deepest collection which
 * probably means all documents returned from the query are of the same type.
 *
 * Keep in mind that a documentId in a `.where`, `.startAfter`, `.startAt`, `.endBefore`, `.endAt`, etc. has to be relative to the given
 * `root`, except when the `root` was a `CollectionReference` in which case these values have to be relative to the parent of that
 * `CollectionReference` since that will be the actual root of the underlying query.
 *
 * Note that when `root` is a `CollectionReference` and the given `collectionGroup` is equal to `root.id` then this function will throw.
 * If we ever need that, we need to find a way to make that work. Currently, if we allowed that, we would not only get all documents in:
 * `/coll/doc/sub/** /sub/*`, but also in `/coll/doc/sub/*`.
 */
// inspired by https://github.com/skunkteam/rust-firestore-emulator/blob/master/test-suite/tests/7-descendants-query.test.ts
export function allDescendants(
    ...[root, options = {}]:
        | [root: FirebaseFirestore.Firestore, options?: { collectionGroup?: string }]
        | [root: FirebaseFirestore.DocumentReference, options?: { collectionGroup?: string }]
        | [root: FirebaseFirestore.CollectionReference, options?: { collectionGroup?: string }]
): FirebaseFirestore.Query {
    // build a query from a fake collection, just to get things started. We overrule the collectionId of the query later on. Note that
    // this piece of code is responsible for setting the correct `parent` on the QueryOptions. This is the parent of the "fake
    // collection", i.e.:
    // - when `root` is the top-level Firestore object: the top-level Firestore object itself
    // - when `root` is a DocumentReference: the document (`root`) itself
    // - when `root` is a CollectionReference: the parent-document of `root`
    let query: FirebaseFirestore.Query<FirebaseFirestore.DocumentData> =
        'collection' in root ? root.collection('UNUSED') : (root.parent ?? root.firestore).collection('UNUSED');
    assertInternalQuery(query);
    // cast Query as its constructor is `protected` and we need a public constructor for Typescript to not complain.
    const InternalQuery = FirebaseFirestore.Query as unknown as typeof firestoreInternalQuery.Query;
    // construct a new Query instance with a new QueryOptions instance similar to what Query does internally in the other query
    // building methods.
    query = new InternalQuery(
        query.firestore,
        query._queryOptions.with({
            allDescendants: true,
            // `kindless` means without restriction on the name of the direct parent (collection) of the found documents. This ignores
            // the `collectionId` of the query.
            kindless: options.collectionGroup === undefined,
            // intentionally illegal name that starts and ends with two underscores so using `withConverter` will reintroduce this
            // string to the query and will throw an error
            collectionId: options.collectionGroup ?? '__converter_not_compatible_with_collection_id_query__',
        }),
    );
    if ('doc' in root && 'path' in root) {
        root satisfies FirebaseFirestore.CollectionReference;
        if (options.collectionGroup === root.id) {
            throw new Error(
                '`allDescendants` does not support `collectionGroup` that is equal to the `root.id`. Both `root.id` and ' +
                    `\`collectionGroup\` are equal to ${inspect(root.id)}.`,
            );
        }
        // If `root` is a `CollectionReference`, then the "parent" of the query is set to the parent of `root` (i.e. either the entire
        // database or a specific document). Therefore, all paths from this point on are "rooted" at this parent (including in the where
        // clause that we give to the query object). Our caller asks for a specific collection as root, but that is not possible with
        // the current Firestore API, so we have to account for that here. We need to limit our search to all documents that have the
        // given `root` as parent. Because the parent of `root` is already the parent of the current `query`, we only need to account
        // for the last collection ID in `root`, that is:
        const collectionId = root.id;
        // We want to restrict the query to return only documents at `<collectionId>/**/*`. So, we need a `where` clause that selects
        // only the documents with a document name that starts with `<collectionId>/`. Unfortunately we cannot filter based on the
        // document name starting with `<collectionId>/`. We can only use greater than, greater than or equal, less than, etc. Lucky
        // for us, it is possible to simulate a "starts with" operation using one `>=` and one `<` operator, i.e. by querying a range of
        // values. To explain the used values in that range we need to look at lexicographical ordering first (which is the ordering
        // that is used in the database to compare strings).
        //
        // With lexicographical ordering, everything that is longer than a certain string A, but also starts with that string A, comes
        // after that string A (in a dictionary for example). Take the word `cat`. It will always be the first of all the words that
        // start with `cat`. The following words are ordered lexicographically:
        // - case
        // - cat
        // - catnip
        // - cats
        // - cause
        //
        // So all strings that start with A are "equal to or larger than" the string A itself. This also means that we are able to find
        // the theoretical "next word", i.e. the word that should always come strictly after A itself, nothing can be in between. That
        // theoretical next word is, in the case of `cat`: `cat` + <the lowest possible letter in the alphabet>. Let's limit ourselves
        // to letters of the English alphabet, in which case the first letter is `a`. So the theoretical next word after `cat` is
        // `cata`.
        //
        // We can do the same with collection IDs. In this case the alphabet is not limited to letters of the English alphabet, but to
        // the ASCII character space, so the first character is `\0` (the null byte). This gives us the following theoretical next
        // collection ID:
        const theoreticalNextCollectionId = `${collectionId}\0`;
        // Example: if `collectionId` is 'my-collection', then `theoreticalNextCollectionId` is 'my-collection\0'. This means that there
        // cannot be a collection ID between `collectionId` and `theoreticalNextCollectionId`. So, theoretically, we could use the
        // following WHERE clause: `FieldPath.collectionId() >= collectionId && FieldPath.collectionId() < theoreticalNextCollectionId`.
        // There is one additional complication. `FieldPath.collectionId()` does not exist. We are not allowed to use a collection ID
        // (such as `collectionId` or `theoreticalNextCollectionId`) in a WHERE clause that filters document paths. We *must* use
        // document paths. So we will add an additional component to each collection ID so that it becomes a document path. We will use
        // the null byte trick again to find the theoretical first document ID within the given collection ID.
        const fromDocumentPath = `${collectionId}/\0`;
        const toDocumentPath = `${theoreticalNextCollectionId}/\0`;
        // So now we can use the following WHERE clause:
        // `FieldPath.documentId() >= fromDocumentPath && FieldPath.documentId() < toDocumentPath`.
        query = query
            .where(FirebaseFirestore.FieldPath.documentId(), '>=', fromDocumentPath)
            .where(FirebaseFirestore.FieldPath.documentId(), '<', toDocumentPath);
    }
    return query;

    function assertInternalQuery<T>(query: FirebaseFirestore.Query<T>): asserts query is firestoreInternalQuery.Query<T> {
        assert('_queryOptions' in query, 'FirebaseFirestore.Query is missing _queryOptions property');
    }
}
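For anyone who wants to convince themselves of the "starts with" range trick from the comments above without a database, here is a small stand-alone sketch. It assumes that document names are compared one path segment at a time (which is, as far as we understand, how Firestore orders reference values) and it approximates the per-segment ordering with plain JavaScript string comparison. The helpers `comparePaths`, `descendantRange` and `inRange` are made up for this example.

```typescript
// Compare two slash-separated paths one segment at a time (assumed ordering, see above).
function comparePaths(a: string, b: string): number {
    const as = a.split('/');
    const bs = b.split('/');
    const shared = Math.min(as.length, bs.length);
    for (let i = 0; i < shared; i++) {
        if (as[i] < bs[i]) return -1;
        if (as[i] > bs[i]) return 1;
    }
    // All shared segments are equal: the shorter path sorts first.
    return as.length - bs.length;
}

// The same bounds as `fromDocumentPath` and `toDocumentPath` in the function above.
function descendantRange(collectionId: string): { from: string; to: string } {
    return { from: `${collectionId}/\0`, to: `${collectionId}\0/\0` };
}

// True when `path` satisfies `>= from && < to`.
function inRange(path: string, collectionId: string): boolean {
    const { from, to } = descendantRange(collectionId);
    return comparePaths(path, from) >= 0 && comparePaths(path, to) < 0;
}

const direct = inRange('sub/doc1', 'sub'); // document directly in `sub`
const nested = inRange('sub/doc1/deeper/doc2', 'sub'); // document nested below `sub`
const sibling = inRange('sub2/doc1', 'sub'); // collection that merely starts with `sub`
const before = inRange('su/doc1', 'sub'); // collection that sorts before `sub`
```

With these inputs `direct` and `nested` are true while `sibling` and `before` are false. Note that the segment-wise comparison matters here: comparing the complete paths as single strings would put 'sub/doc1' after 'sub\0/\0', because '/' sorts after the null byte.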