nodejs-firestore
Improve read performance by using stale reads
In the documentation, it is mentioned that stale reads may improve the performance of reading from Firestore as data can be just fetched from the nearest replica without having to reconfirm with the leader replica: https://firebase.google.com/docs/firestore/understand-reads-writes-scale#stale_reads
I'm using the following code to perform a stale read:

export const STALE_READ_STALENESS = 60 * 1000; // 1 minute

const random = Math.random();
const useStaleReads = random < USE_STALE_READ_PERCENTAGE;
logger.profile(`stale-read-${random}`);
let snap: DocumentSnapshot<FirebaseFirestore.DocumentData>;
if (useStaleReads) {
  const maxDataStaleness: Date = new Date(
    new Date().getTime() - STALE_READ_STALENESS
  );
  snap = await firestore.runTransaction(
    async t => {
      return t.get(ref);
    },
    {
      readOnly: true,
      readTime: Timestamp.fromDate(maxDataStaleness),
    }
  );
} else {
  snap = await ref.get();
}
logger.profile(`stale-read-${random}`, {
  level: 'info',
  message: 'Read from Firestore',
  meta: {
    useStaleReads,
  },
});
Since the data does not change very often, serving content that is one minute (or even longer) stale is fine for us.
But what we are seeing is that the strong reads are faster than the stale reads:
Query used for analysing the logs
WITH latencies AS (
  SELECT
    timestamp,
    JSON_VALUE(json_payload.metadata.useStaleReads) AS uses_stale_reads,
    SAFE_CAST(JSON_VALUE(json_payload.metadata.profile.durationMs) AS FLOAT64) AS duration_in_ms
  FROM `simpleclub.global._Default._AllLogs` AS logs
  WHERE NORMALIZE_AND_CASEFOLD(logs.resource.type, NFKC) = "cloud_run_revision"
    AND NORMALIZE_AND_CASEFOLD(SAFE.STRING(logs.resource.labels["revision_name"]), NFKC) = "cloud-run-revision"
    AND NORMALIZE_AND_CASEFOLD(SAFE.STRING(logs.resource.labels["service_name"]), NFKC) = "cloud-run-service"
    AND REGEXP_CONTAINS(SAFE.STRING(logs.json_payload["metadata"]["profile"]["id"]), "stale")
    AND JSON_VALUE(json_payload.metadata.useStaleReads) = "true"
)
SELECT
  STRUCT(
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(5000)] AS percentile_50,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(7500)] AS percentile_75,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9000)] AS percentile_90,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9500)] AS percentile_95,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9900)] AS percentile_99,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9950)] AS percentile_99_5,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9990)] AS percentile_99_9,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9995)] AS percentile_99_95,
    APPROX_QUANTILES(duration_in_ms, 10000)[OFFSET(9999)] AS percentile_99_99
  ) AS duration_in_ms,
  uses_stale_reads,
  COUNT(*) AS request_count
FROM latencies
GROUP BY uses_stale_reads
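For reference, `APPROX_QUANTILES(x, 10000)` returns 10001 quantile boundaries indexed 0 through 10000, so the offset for the p-th percentile is `p / 100 * 10000`. A small sketch of that mapping, matching the offsets used in the query above (the helper name is my own):

```typescript
// Maps a percentile (e.g. 99.9) to the OFFSET used with
// APPROX_QUANTILES(x, 10000), which returns 10001 boundaries
// indexed 0..10000.
function quantileOffset(percentile: number, buckets = 10000): number {
  return Math.round((percentile / 100) * buckets);
}

console.log(quantileOffset(50)); // 5000
console.log(quantileOffset(99.9)); // 9990
```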
I wanted to share this experience with you, and maybe I'm doing something wrong here... I'm not sure whether increasing the staleness to 60s (instead of 15s) breaks it?
Interesting data:
- We are using Firestore via GRPC (not REST)
- @google-cloud/firestore: v6.8.0
- Firestore database is hosted in eur3 (multi-region)
- Deployed on Cloud Run
- Always on CPU
- CPU start-up boost
- max 40 requests / instance
- 1st gen execution environment
- 1 CPU
- 4GiB memory
A quick test with 15s staleness shows very similar numbers.
There is an unfortunate implementation detail: transactions send a BeginTransaction request before your document get requests. Effectively, that means transactions send multiple requests where a regular get sends one.
We are looking to improve this.
The v1 FirestoreClient allows complete access to the communication protocol, including the ability to set readTime on get document requests. With this, you could achieve improved performance. However, it means taking responsibility for many of the things the regular API surface handles for you. Unless you really need this, I suggest you wait until we improve the regular API surface and/or optimize our handling of transactions with readTime.
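A rough sketch of what that could look like. This is a hypothetical illustration, not the official surface: the helper names and the injected client shape are mine; the assumption is that the v1 GetDocument request takes a full document resource path and accepts a readTime consistency selector as a protobuf-style timestamp.

```typescript
// Protobuf-style timestamp shape used by the low-level v1 API (assumption).
type ProtoTimestamp = {seconds: number; nanos: number};

// Convert a JS Date to the protobuf Timestamp shape.
function toProtoTimestamp(date: Date): ProtoTimestamp {
  const millis = date.getTime();
  return {
    seconds: Math.floor(millis / 1000),
    nanos: (millis % 1000) * 1e6,
  };
}

// `client` would be a low-level v1 FirestoreClient; it is injected here so the
// sketch stays self-contained. Document names use the full resource path, e.g.
// projects/<project>/databases/(default)/documents/foo/bar
async function getDocumentAtReadTime(
  client: {getDocument(req: object): Promise<unknown[]>},
  name: string,
  stalenessMillis: number
): Promise<unknown> {
  // Read at "now minus staleness", mirroring maxDataStaleness above.
  const readTime = toProtoTimestamp(new Date(Date.now() - stalenessMillis));
  const [doc] = await client.getDocument({name, readTime});
  return doc;
}
```

The trade-off mentioned above applies: at this layer you lose the convenience types (DocumentSnapshot, converters) and have to handle retries and resource paths yourself.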
Thank you for the question.
Interest in features like this from the developer community helps inform priorities for SDK development. I will be sure to pass this on. Feel free to tell us why this is important.
@tom-andersen Thanks for the provided details 👌
The reason I'm asking is that we are looking into this particular technique for a latency-sensitive service where we want to improve the latency even more.
We have already looked into and adopted techniques like caching, optimizing business logic, etc.
--
I could imagine the following designs for such a native read-time feature:
const firestore = getFirestore();
firestore.settings({
  readTime: Timestamp.fromDate(pointInTime),
});

(For use cases where you'd want all requests to read at a particular point in time. This would be useful for data recovery scripts, so you don't have to redefine the read time for every request.)
and/or:
getFirestore()
  .doc('foo/bar')
  .get({
    readTime: Timestamp.fromDate(maxDataStaleness),
  });

getFirestore()
  .collection('foo')
  .where('bar', '==', true)
  .get({
    readTime: Timestamp.fromDate(maxDataStaleness),
  });
I've quickly implemented a version of this and ran some tests (10k requests) in a Cloud Shell: https://github.com/googleapis/nodejs-firestore/compare/main...simpleclub-extended:nodejs-firestore:feat/support-read-time-on-get
| Metric | With readTime | Without readTime | Improvement |
|---|---|---|---|
| 50th percentile | 16 ⭐ | 17 | -5.88% |
| 75th percentile | 18 | 18 | - |
| 87.5th percentile | 19 ⭐ | 20 | -5% |
| 93.75th percentile | 21 | 21 | - |
| 96.88th percentile | 23 | 23 | - |
| 98.44th percentile | 25 ⭐ | 27 | -7.41% |
| 99.22th percentile | 35 | 32 ⭐ | +8.57% |
| 99.61th percentile | 48 | 45 ⭐ | +6.25% |
| 99.80th percentile | 77 | 70 ⭐ | +9.09% |
| 99.90th percentile | 101 | 86 ⭐ | +14.85% |
| 99.95th percentile | 110 ⭐ | 112 | -1.79% |
| 99.98th percentile | 115 ⭐ | 359 | -67.97% |
| 99.99th percentile | 125 ⭐ | 565 | -77.88% |
| 99.99th percentile | 512 ⭐ | 1326 | -61.39% |
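(Reading the tables: the Improvement column appears to be the latency delta relative to the larger of the two values, negative meaning readTime was faster. That formula is inferred from the numbers above, not stated anywhere, so treat this sketch as an assumption:)

```typescript
// Assumed formula for the Improvement column: the delta between the two
// latencies, relative to the larger one, rounded to two decimals.
// (Inferred from the table values; not stated explicitly in the thread.)
function improvementPercent(
  withReadTime: number,
  withoutReadTime: number
): number {
  const delta = withReadTime - withoutReadTime;
  const pct = (delta / Math.max(withReadTime, withoutReadTime)) * 100;
  return Math.round(pct * 100) / 100;
}

console.log(improvementPercent(16, 17)); // -5.88 (p50 row)
console.log(improvementPercent(35, 32)); // 8.57 (99.22th percentile row)
```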
Test script
import {Firestore, Timestamp} from '@google-cloud/firestore';
import {createHistogram, performance} from 'perf_hooks';

async function run() {
  const firestore = new Firestore({
    projectId: '<project>',
  });
  const histogram = createHistogram();
  for (let i = 0; i < 10000; i++) {
    const start = performance.now();
    const maxDataStaleness: Date = new Date(new Date().getTime() - 15 * 1000);
    await firestore.doc('always/the/same/document').get({
      readTime: Timestamp.fromDate(maxDataStaleness),
    });
    const end = performance.now();
    histogram.record(Math.round(end - start));
  }
  console.log('min', histogram.min);
  console.log('max', histogram.max);
  console.log('mean', histogram.mean);
  console.log('stddev', histogram.stddev);
  console.log('exceeds', histogram.exceeds);
  console.log('percentiles', histogram.percentiles);
}

run();
Okay, I quickly ran another test that randomly picks a document instead of reading the same document every time (as this may result in different behavior).
| Metric | With readTime | Without readTime | Improvement |
|---|---|---|---|
| 50th percentile | 10 ⭐ | 12 | -16.99% |
| 75th percentile | 12 ⭐ | 13 | -7.69% |
| 87.5th percentile | 13 ⭐ | 14 | -7.14% |
| 93.75th percentile | 14 ⭐ | 15 | -6.67% |
| 96.88th percentile | 16 ⭐ | 17 | -5.88% |
| 98.44th percentile | 18 ⭐ | 20 | -10% |
| 99.22th percentile | 20 ⭐ | 26 | -23% |
| 99.61th percentile | 25 ⭐ | 48 | -47.92% |
| 99.80th percentile | 54 ⭐ | 79 | -31.65% |
| 99.90th percentile | 73 ⭐ | 96 | -23.96% |
| 99.95th percentile | 96 ⭐ | 129 | -25.58% |
| 99.98th percentile | 110 ⭐ | 150 | -26.67% |
| 99.99th percentile | 138 ⭐ | 202 | -31.68% |
| 99.99th percentile | 145 ⭐ | 218 | -33.49% |
Test script
import {Firestore, Timestamp} from '@google-cloud/firestore';
import {createHistogram, performance} from 'perf_hooks';

async function run() {
  const firestore = new Firestore({
    projectId: '<project>',
  });
  const documentIds = await firestore
    .collection('the/test/collection')
    .listDocuments();
  console.log(documentIds.length);
  const histogram = createHistogram();
  for (let i = 0; i < 10000; i++) {
    const start = performance.now();
    const maxDataStaleness: Date = new Date(new Date().getTime() - 15 * 1000);
    const randomDocument =
      documentIds[Math.floor(Math.random() * documentIds.length)];
    await randomDocument.get({
      readTime: Timestamp.fromDate(maxDataStaleness),
    });
    const end = performance.now();
    histogram.record(Math.round(end - start));
  }
  console.log('min', histogram.min);
  console.log('max', histogram.max);
  console.log('mean', histogram.mean);
  console.log('stddev', histogram.stddev);
  console.log('exceeds', histogram.exceeds);
  console.log('percentiles', histogram.percentiles);
}

run();
Note: I don't get those numbers consistently 🤔
Looks like you were able to implement the optimization. This is a good test case, where the only difference is readTime.
Understanding why you see these latencies is a little beyond SDK support. I am sure there are other customer-specific factors in play, such as database size, concurrent writes, and warmup.
You may want to use Firebase support to get answers specific to your use case:
https://firebase.google.com/support/troubleshooter/firestore/queries
Can I help you with anything else?
Follow up for @IchordeDionysos. I asked internally, and was given some explanation:
Stale reads have two main values:
1. Avoiding any waits for pending writes. So if they are comparing strong vs. stale reads on a read-only workload, there is likely little difference.
2. Using the non-primary region for reads. If they are using a regional instance, then this one isn't applicable.
In your case, (2) is applicable.
You should run the workload (a) without transactions and (b) from europe-west4 instead of europe-west1.
@IchordeDionysos The next release of the SDK will include an optimization for transactions with readTime. It reduces the number of requests required and thereby reduces latency. Feel free to run your test again with version 7.3.1 or newer.
https://github.com/googleapis/nodejs-firestore/pull/2002