grpc-node icon indicating copy to clipboard operation
grpc-node copied to clipboard

Tracking issue for grpc-js invalid metadata errors

Open murgatroid99 opened this issue 4 years ago • 32 comments

This is a tracking issue for the following error message output by @grpc/grpc-js:

Failed to add metadata entry <key>: <values>. <Extra error information>. For more information see https://github.com/grpc/grpc-node/issues/1173.

This is not a fatal error. Most applications should work as normal after seeing this. This error message indicates that a key/value pair has been omitted from the response metadata object as a result of some kind of data corruption.

This problem was originally reported at googleapis/nodejs-datastore#415. So far, all detailed reports have a timestamp as the value in the error message.

If you see this message with timestamp metadata values when making a request to a Google server from a Google Cloud server using a Google Cloud API client library or Firebase, please add a +1 reaction to this comment to help us track how frequently it happens

If you received this message using a different library or from accessing a different server, please share the following information to help us find the root cause of the error:

  • The library you are using directly
  • Where you are running your code (a cloud provider, a personal computer, etc.)
  • The domain name of the server you received the invalid metadata from (if you know it)
  • Your full error message

In addition, if you can get such a thing it would be very helpful to have a tcpdump of an unencrypted TCP session that triggers this error.

murgatroid99 avatar Nov 08 '19 01:11 murgatroid99

Using:

$ npm ls | grep grpc-js
│ │ ├─┬ @grpc/[email protected]

I don't know the library exactly, I have the following:

$ npm ls | grep google
├─┬ @google-cloud/[email protected]
│ ├─┬ [email protected]
│ │ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
│ ├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├─┬ [email protected]
│ │ │ ├─┬ [email protected]
│ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
│ │ ├── [email protected] deduped
│ └── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected] deduped
│ ├── [email protected] deduped
│ ├── [email protected] deduped
│ ├─┬ @google-cloud/[email protected]
│ │ ├── [email protected] deduped
│ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── [email protected] deduped

Full error (from stackdriver console):

Error: Failed to add metadata entry @���: Wed, 13 Nov 2019 07:13:16 GMT. Metadata key "@���" contains illegal characters at validate (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:35:15) at Metadata.add (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:87:9) at values.split.forEach.v (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:226:63) at Array.forEach (<anonymous>) at Object.keys.forEach.key (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:226:43) at Array.forEach (<anonymous>) at Function.fromHttp2Headers (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:200:30) at ClientHttp2Stream.stream.on (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/call-stream.js:211:56) at ClientHttp2Stream.emit (events.js:193:13) at emit (internal/http2/core.js:244:8) at processTicksAndRejections (internal/process/task_queues.js:81:17)

Related to https://github.com/googleapis/nodejs-pubsub/issues/818 maybe.

npomfret avatar Nov 13 '19 09:11 npomfret

Thank you for the report. That looks consistent with the other reports we have seen: like the others, the key is 4 apparently random characters and the value is a formatted timestamp.

murgatroid99 avatar Nov 13 '19 17:11 murgatroid99

We are experiencing the same issue.

Failed to add metadata entry €��: Sat, 07 Dec 2019 14:56:38 GMT. Metadata key "€��" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

Is there anything we can do to get the fix for this accelerated?

sempi avatar Dec 07 '19 15:12 sempi

As mentioned in the original issue, please say what library you are using, and if possible what the domain name is of the server you are accessing.

Other than that, the most helpful thing you or anyone else could provide is a full tcpdump or similar of an unencrypted TCP session that triggers this error.

murgatroid99 avatar Dec 09 '19 17:12 murgatroid99

We are using @google-cloud/[email protected] and [email protected].

The issue is related to using firestore APIs.

It does not say which server it is connected when the error happens.

node_modules/@google-cloud/firestore/node_modules/@grpc/grpc-js/build/src/metadata.js

Line 31

        const result = new Metadata();
        Object.keys(headers).forEach(key => {
            // Reserved headers (beginning with `:`) are not valid keys.
            if (key.charAt(0) === ':') {
                return;
            }
            const values = headers[key];
            try {
                if (isBinaryKey(key)) {
                    if (Array.isArray(values)) {
                        values.forEach(value => {
                            result.add(key, Buffer.from(value, 'base64'));
                        });
                    }
                    else if (values !== undefined) {
                        values.split(',').forEach(v => {
                            result.add(key, Buffer.from(v.trim(), 'base64'));
                        });
                    }
                }
                else {
                    if (Array.isArray(values)) {
                        values.forEach(value => {
                            result.add(key, value);
                        });
                    }
                    else if (values !== undefined) {
                        values.split(',').forEach(v => result.add(key, v.trim()));
                    }
                }
            }
            catch (error) {
                error.message = `Failed to add metadata entry ${key}: ${values}. ${error.message}`;
                process.emitWarning(error);
            }
        });
        return result;
    }

We are performing several million operations per day and this happens frequently across independent firestore instances. So it's very likely that this is a software issue present on all your servers or in the client library. Here is a sample across two regions with originating region (where the request is made). There is two firestore instances. One in us-central1 and one in europe-west1.

image

sempi avatar Dec 11 '19 19:12 sempi

Also the error is just logged to the console. The consuming firestore layer does not properly subscribe to this error and propagate it. It therefor fails silently.

We also re-read and check the data we wrote. So it's possible that this is an issue that does not affect the data being written or read since we have not detected any corruption on the data and we check every record by reading it back after writing it.

sempi avatar Dec 11 '19 19:12 sempi

Also within firestore it uses @grpc/[email protected]

So I'd assume it actually uses that implementation of grpc not the grpc package we use otherwise.

sempi avatar Dec 11 '19 19:12 sempi

We won't be able to give you a tcpdump due to the volume of data and frequency of occurrence and other operational considerations. But if you have a beta version of firestore that has more info when the error happens then we can give that a try in our staging environment.

sempi avatar Dec 11 '19 19:12 sempi

Thank you for the extra details. From that log image it looks like you are seeing this error probably 20-30 times out of the several million requests you make per day. Would you say that is accurate?

I want to note that so far we believe this corruption only impacts headers, and we have only seen reports so far of it impacting the date header. So you should not worry about your data being corrupted by this bug.

Overall, your report is consistent with the other reports of this bug that we have seen, so unfortunately that does not help us significantly with figuring out what is wrong.

murgatroid99 avatar Dec 11 '19 21:12 murgatroid99

Same issue on several nodejs applications on GKE clusters

image

@grpc/grpc-js "version": "0.6.14"

I am using Google Pub-Sub ("@google-cloud/pubsub": "1.1.6") and Winston for logging on all the container having the issue.

If you have a debug version of the @grpc/grpc-js library that you'd like me to run to gather more information, I would be happy to do.

786 jeromecurlier:Ingestor (master)$ npm ls | grep google
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
│ ├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├─┬ [email protected]
│ │ │ ├─┬ [email protected]
│ ├─┬ [email protected]
│ │ ├── [email protected] deduped

jcurlier avatar Dec 12 '19 19:12 jcurlier

Same issue showing up here https://github.com/googleapis/nodejs-firestore/issues/829

jakeleventhal avatar Dec 19 '19 05:12 jakeleventhal

same issue happening over several million messages

`│ ├─┬ @grpc/[email protected]
│ │ ├─┬ @grpc/[email protected]`
[](`Failed to add metadata entry @: Thu, 02 Jan 2020 19:16:00 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Thu, 02 Jan 2020 20:27:14 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Thu, 02 Jan 2020 20:35:07 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Fri, 03 Jan 2020 03:57:36 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Fri, 03 Jan 2020 04:08:07 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Fri, 03 Jan 2020 07:44:08 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Fri, 03 Jan 2020 07:57:17 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Fri, 03 Jan 2020 17:37:07 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Sat, 04 Jan 2020 07:37:37 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Sat, 04 Jan 2020 22:47:31 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Sun, 05 Jan 2020 10:57:13 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry @: Sun, 05 Jan 2020 17:23:52 GMT. Metadata key "@" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173` )

yatingoswami avatar Jan 06 '20 03:01 yatingoswami

@yatingoswami I just want to make sure I understand: is that the total number of times you observed that warning in several million requests?

murgatroid99 avatar Jan 06 '20 21:01 murgatroid99

We are using @google-cloud/[email protected] and the error we are getting is:

"Failed to add metadata entry A: Tue, 28 Jan 2020 18:17:14 GMT. Metadata key "a" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

"Failed to add metadata entry ¾æÔ: Tue, 28 Jan 2020 20:06:29 GMT. Metadata key "¾æô" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

kbariotis avatar Jan 28 '20 21:01 kbariotis

We're using "@google-cloud/pubsub": "1.1.6"

Failed to add metadata entry A: Tue, 11 Feb 2020 14:22:24 GMT. Metadata key "a" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

Failed to add metadata entry ~ÞÏ: Tue, 11 Feb 2020 23:21:07 GMT. Metadata key "~þï" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

arvinsingla avatar Feb 12 '20 02:02 arvinsingla

using "@google-cloud/[email protected]": here

Failed to add metadata entry A: Wed, 12 Feb 2020 20:34:43 GMT. Metadata key "a" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
Failed to add metadata entry àFp9: Wed, 12 Feb 2020 20:40:59 GMT. Metadata key "àfp9" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

together with (not sure which if any of these are relevant) google-gax-1.14.1 google-auth-library-5.9.2 @grpc/grpc-js-0.6.15 @grpc/proto-loader-0.5.3

genisd avatar Feb 13 '20 07:02 genisd

I think this started happening after we started using Firestore batch updates (NodeJSv10 on CloudRun) "@grpc/grpc-js": {"version": "0.6.9"

2020-02-26 12:57:30.772 PST
(node:14) Error: Failed to add metadata entry @���: Wed, 26 Feb 2020 20:57:30 GMT. Metadata key "@���" contains illegal characters
2020-02-26 10:09:39.183 PST
(node:14) Error: Failed to add metadata entry @���: Wed, 26 Feb 2020 18:09:39 GMT. Metadata key "@���" contains illegal characters
2020-02-26 08:00:20.433 PST
(node:14) Error: Failed to add metadata entry @���: Wed, 26 Feb 2020 16:00:20 GMT. Metadata key "@���" contains illegal characters
2020-02-26 07:03:17.985 PST
(node:14) Error: Failed to add metadata entry @���: Wed, 26 Feb 2020 15:03:17 GMT. Metadata key "@���" contains illegal characters
2020-02-26 06:37:15.100 PST
(node:14) Error: Failed to add metadata entry @���: Wed, 26 Feb 2020 14:37:15 GMT. Metadata key "@���" contains illegal characters
2020-02-25 13:59:37.114 PST
(node:14) Error: Failed to add metadata entry @���: Tue, 25 Feb 2020 21:59:37 GMT. Metadata key "@���" contains illegal characters
2020-02-25 12:17:12.270 PST
(node:14) Error: Failed to add metadata entry @���: Tue, 25 Feb 2020 20:17:12 GMT. Metadata key "@���" contains illegal characters
2020-02-25 09:46:35.209 PST
(node:14) Error: Failed to add metadata entry @���: Tue, 25 Feb 2020 17:46:35 GMT. Metadata key "@���" contains illegal characters
2020-02-25 08:58:09.110 PST
(node:14) Error: Failed to add metadata entry @���: Tue, 25 Feb 2020 16:58:09 GMT. Metadata key "@���" contains illegal characters
2020-02-25 06:09:58.931 PST
(node:14) Error: Failed to add metadata entry @���: Tue, 25 Feb 2020 14:09:58 GMT. Metadata key "@���" contains illegal characters
2020-02-24 14:13:59.617 PST
(node:14) Error: Failed to add metadata entry @���: Mon, 24 Feb 2020 22:13:59 GMT. Metadata key "@���" contains illegal characters
2020-02-24 14:13:38.182 PST
(node:14) Error: Failed to add metadata entry @���: Mon, 24 Feb 2020 22:13:38 GMT. Metadata key "@���" contains illegal characters
2020-02-24 14:13:30.502 PST
(node:14) Error: Failed to add metadata entry @���: Mon, 24 Feb 2020 22:13:30 GMT. Metadata key "@���" contains illegal characters
2020-02-24 10:59:55.654 PST
(node:14) Error: Failed to add metadata entry @���: Mon, 24 Feb 2020 18:59:55 GMT. Metadata key "@���" contains illegal characters
2020-02-24 08:14:54.309 PST
(node:14) Error: Failed to add metadata entry @���: Mon, 24 Feb 2020 16:14:54 GMT. Metadata key "@���" contains illegal characters
2020-02-21 10:52:25.382 PST
(node:14) Error: Failed to add metadata entry @���: Fri, 21 Feb 2020 18:52:25 GMT. Metadata key "@���" contains illegal characters
2020-02-21 05:35:27.945 PST
(node:14) Error: Failed to add metadata entry @���: Fri, 21 Feb 2020 13:35:27 GMT. Metadata key "@���" contains illegal characters
2020-02-21 04:59:22.798 PST
(node:14) Error: Failed to add metadata entry @���: Fri, 21 Feb 2020 12:59:22 GMT. Metadata key "@���" contains illegal characters
2020-02-20 06:08:29.630 PST
(node:14) Error: Failed to add metadata entry @���: Thu, 20 Feb 2020 14:08:29 GMT. Metadata key "@���" contains illegal characters
2020-02-20 02:37:58.947 PST
(node:14) Error: Failed to add metadata entry @���: Thu, 20 Feb 2020 10:37:58 GMT. Metadata key "@���" contains illegal characters
npm ls | grep google
├── @google-cloud/[email protected]
├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├── [email protected] deduped
│ ├─┬ [email protected]
│ │ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├─┬ @google-cloud/[email protected]
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
│ │ ├── [email protected] deduped
│ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ └── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected] deduped
│ ├── [email protected] deduped
│ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected] deduped
│ ├── @google-cloud/[email protected] deduped
│ │ ├── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ └── [email protected] deduped
├─┬ @google-cloud/[email protected]
│ ├── @google-cloud/[email protected] deduped
│ ├─┬ @google-cloud/[email protected]
│ │ ├── [email protected] deduped
│ ├── @google-cloud/[email protected] deduped
├─┬ [email protected]
│ │ ├─┬ [email protected]
├─┬ [email protected]
│ ├── [email protected] deduped
│ └─┬ [email protected]
│   ├── [email protected] deduped
├─┬ [email protected]

rodneycjones avatar Feb 27 '20 04:02 rodneycjones

Failed to add metadata entry A: Sat, 25 Apr 2020 04:18:09 GMT. Metadata key "a" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
$ npm ls | grep google

├─┬ @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├─┬ @google-cloud/[email protected]
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── [email protected] deduped
│ │ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected]
│ │ ├── [email protected] deduped
│ │ ├─┬ [email protected]
│ │ │ ├── [email protected] deduped
│ └─┬ [email protected]
│   │ ├─┬ [email protected]

code is at github.com/icco/validator

icco avatar Apr 25 '20 04:04 icco

We perform cca 2-3 K operations daily

Failed to add metadata entry A: Mon, 04 May 2020 09:46:10 GMT. Metadata key "a" contains illegal characters.
Screenshot 2020-05-04 at 11 51 13
yarn list | grep google
├─ @google-cloud/[email protected]
│  ├─ @google-cloud/projectify@^1.0.4
│  ├─ @google-cloud/promisify@^1.0.4
│  ├─ google-auth-library@^5.10.1
│  ├─ google-gax@^1.14.2
├─ @google-cloud/[email protected]
├─ @google-cloud/[email protected]
│  ├─ google-libphonenumber@^3.1.6
├─ [email protected]
├─ [email protected]
│  ├─ google-auth-library@^5.0.0
├─ [email protected]
├─ [email protected]
│  ├─ google-p12-pem@^2.0.0
yarn list | grep grpc
│  ├─ @grpc/[email protected]
├─ @grpc/[email protected]
├─ @grpc/[email protected]
│  ├─ @grpc/grpc-js@^0.7.4
│  ├─ @grpc/[email protected]
│  ├─ @grpc/proto-loader@^0.5.1

IgorOrsag avatar May 04 '20 10:05 IgorOrsag

To anyone who has already commented here, can you please comment if you encountered this message while running your code somewhere other than a Google Cloud Platform server (GCE, GKE, Cloud Run, Cloud Functions, Firebase Functions, etc.)?

murgatroid99 avatar May 05 '20 16:05 murgatroid99

Encountered in local dev here. Accessing Firestore via Firebase Admin SDK as part of a data migration:

Failed to add metadata entry A: Wed, 06 May 2020 12:35:02 GMT. Metadata key "a" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

nateq314 avatar May 06 '20 12:05 nateq314

@nateq314 Thank you.

murgatroid99 avatar May 06 '20 16:05 murgatroid99

14:14 $ npm list | grep google
│ ├─┬ @google-cloud/[email protected]
│ │ ├─┬ @google-cloud/[email protected]
│ │ │ ├── @google-cloud/[email protected]
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├─┬ [email protected]
│ │ │ │ │ ├─┬ [email protected]
│ │ ├─┬ @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected]
│ ├─┬ @google-cloud/[email protected]
│ │ ├─┬ @google-cloud/[email protected]
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── @google-cloud/[email protected] deduped
│ │ │ ├── [email protected] deduped
│ │ ├── @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected]
│ │ ├── @google-cloud/[email protected]
│ │ ├─┬ [email protected]
│ │ │ │ ├─┬ [email protected]
│ │ ├─┬ [email protected]
│ │ │ │ ├─┬ [email protected]
│ │ │ │ │ │ ├─┬ [email protected]
│ │ │ ├─┬ [email protected]
│ │ │ │ │ ├─┬ [email protected]

We have a library that connects to both Spanner and BigQuery, however we see the following metadata error when making requests to Cloud Spanner. All of our environments are GKE based in different projects and regions. This error at the moment is specific to one region and GKE cluster.

Failed to add metadata entry A : Deadlock with higher priority transaction.. Metadata key "a " contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

The relevant gRPC library versions follow:

npm list | grep grpc
│ │ │ ├─┬ @grpc/[email protected]
│ │ │ │ ├─┬ @grpc/[email protected]
│ │ │ ├─┬ @grpc/[email protected]
│ │ ├─┬ [email protected]
│ │ │ └── @grpc/[email protected] deduped

Eventually our process goes OOM as follows:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
XXXXXXXXXXXXXXXXXXXX <--- Last few GCs --->
XXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX [17:0x55c26b98eee0]   873517 ms: Scavenge 510.2 (512.7) -> 509.5 (513.2) MB, 7.1 / 0.0 ms  (average mu = 0.142, current mu = 0.032) allocation failure
XXXXXXXXXXXXXXXXXXXX [17:0x55c26b98eee0]   875020 ms: Mark-sweep 510.3 (513.2) -> 509.6 (513.2) MB, 1310.8 / 0.0 ms  (+ 140.9 ms in 26 steps since start of marking, biggest step 73.2 ms, walltime since start of marking 1695 ms) (average mu = 0.240, current mu = 0.336) allocat
XXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX <--- JS stacktrace --->
XXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX ==== JS stack trace =========================================
XXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX Security context: 0x35b08fc008d1 <JSObject>
XXXXXXXXXXXXXXXXXXXX     0: builtin exit frame: new Error(aka Error)(this=0x0c7129440591 <the_hole>,0x33a74b8d4cb9 <String[54]: 10 ABORTED: Deadlock with higher priority transaction.>,0x0c7129440591 <the_hole>)
XXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX     1: ConstructFrame [pc: 0x55c268d0b65a]
XXXXXXXXXXXXXXXXXXXX     2: onReceiveStatus [0x1aa7c5a59d19] [/src/node_modules/@grpc/grpc-js/build/src/client.js:~167]

As mentioned this is one environment out of 6.

Crispy1975 avatar Jan 07 '21 14:01 Crispy1975

@Crispy1975 Thank you for the report. How frequently are you seeing that metadata error? And would you mind saying which GKE region is experiencing those errors?

murgatroid99 avatar Jan 07 '21 19:01 murgatroid99

@murgatroid99 apologies for the delayed response. We are seeing the messages every 20 minutes in one environment and less often in another, both on the GCP region us-west1. The Spanner clusters that are being communicated with are in europe-west4. All sorts of interesting characters...

E 2021-01-14T11:44:20.257951991Z Failed to add metadata entry A�������!���: Deadlock with higher priority transaction.. Metadata key "a�������!���" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
 
E 2021-01-14T13:02:01.637458223Z Failed to add metadata entry A������� ��: 10. Metadata key "a������� ��" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173

Edit: I just double-checked some of our other envs and these also have the same errors, albeit less frequent. Other regions are: europe-west3, europe-west4, us-east4.

Crispy1975 avatar Jan 14 '21 15:01 Crispy1975

We encountered this problem while moving from AWS NLB to AWS ALB with grpc support and migrating from grpc-node C++ to grpc-js in the server.

We're using @grpc/[email protected]

warning Resolution field "[email protected]" is incompatible with requested version "[email protected]"                                                                                       
├─ @grpc/[email protected]
├─ @grpc/[email protected]

I enabled the trace and logs in the server and for us, it seems to be failing the request every time it happens. Here's an example of a request from web service to data service.

May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.015Z | call_stream | Starting stream on subchannel 10.39.50.239:30051 with headers
May 04 17:17:25 web-service-b2e2a5ccd02d 		grpc-accept-encoding: identity,deflate,gzip
May 04 17:17:25 web-service-b2e2a5ccd02d 		accept-encoding: identity
May 04 17:17:25 web-service-b2e2a5ccd02d 		:authority: core-25154-internal.staging.resolver.com:30051
May 04 17:17:25 web-service-b2e2a5ccd02d 		user-agent: grpc-node-js/1.3.0
May 04 17:17:25 web-service-b2e2a5ccd02d 		content-type: application/grpc
May 04 17:17:25 web-service-b2e2a5ccd02d 		:method: POST
May 04 17:17:25 web-service-b2e2a5ccd02d 		:path: /DataService.RolePermissionsController/add
May 04 17:17:25 web-service-b2e2a5ccd02d 		te: trailers
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.015Z | call_stream | [5309] attachHttp2Stream from subchannel 10.39.50.239:30051
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.015Z | subchannel_refcount | 10.39.50.239:30051 callRefcount 0 -> 1
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.015Z | call_stream | [5309] sending data chunk of length 594 (deferred)
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.016Z | call_stream | [5309] calling end() on HTTP/2 stream
May 04 17:17:25 data-service-cee57b57d94b 2021-05-04T21:17:25.017Z | server | Received call to method /DataService.RolePermissionsController/add at address 0.0.0.0:30051
May 04 17:17:25 data-service-cee57b57d94b 2021-05-04T21:17:25.064Z | server_call | Request to method /DataService.RolePermissionsController/add ended with status code: OK details: OK
May 04 17:17:25 data-service-cee57b57d94b 2021-05-04T21:17:25.065Z | server_call | Request to method /DataService.RolePermissionsController/add stream closed with rstCode 0
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.065Z | call_stream | [5309] Received server headers:
May 04 17:17:25 web-service-b2e2a5ccd02d 		:status: 200
May 04 17:17:25 web-service-b2e2a5ccd02d 		server: awselb/2.0
May 04 17:17:25 web-service-b2e2a5ccd02d 		date: Tue, 04 May 2021 21:17:25 GMT
May 04 17:17:25 web-service-b2e2a5ccd02d 		content-type: application/grpc+proto
May 04 17:17:25 web-service-b2e2a5ccd02d 		grpc-accept-encoding: identity
May 04 17:17:25 web-service-b2e2a5ccd02d 		grpc-encoding: identity
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.068Z | call_stream | [5309] receive HTTP/2 data frame of length 1106
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.068Z | call_stream | [5309] parsed message of length 1106
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] filterReceivedMessage of length 1106
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] pushing to reader message of length 1101
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] Received server trailers:
May 04 17:17:25 web-service-b2e2a5ccd02d 		 ’b6 V   us: 0
May 04 17:17:25 web-service-b2e2a5ccd02d 		grpc-message: OK
May 04 17:17:25 web-service-b2e2a5ccd02d Failed to add metadata entry  ’b6 V   us: 0. Metadata key " ’b6 v   us" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] received status details string "OK" from server
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] ended with status: code=2 details="OK"
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | subchannel_refcount | 10.39.50.239:30051 callRefcount 1 -> 0
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.069Z | call_stream | [5309] close http2 stream with code 8
May 04 17:17:25 web-service-b2e2a5ccd02d 2021-05-04T21:17:25.070Z | call_stream | [5309] HTTP/2 stream closed with code 8
May 04 17:17:25 web-service-b2e2a5ccd02d error: grpc callMethod failed to call add with error:
May 04 17:17:25 web-service-b2e2a5ccd02d error:  message=2 UNKNOWN: OK, name=Error, code=2, details=OK, , , isOperational=true, stack=Error: 2 UNKNOWN: OK
May 04 17:17:25 web-service-b2e2a5ccd02d     at Object.callErrorFromStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
May 04 17:17:25 web-service-b2e2a5ccd02d     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client.js:176:52)
May 04 17:17:25 web-service-b2e2a5ccd02d     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
May 04 17:17:25 web-service-b2e2a5ccd02d     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
May 04 17:17:25 web-service-b2e2a5ccd02d     at /opt/resolver/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
May 04 17:17:25 web-service-b2e2a5ccd02d     at processTicksAndRejections (internal/process/task_queues.js:79:11), code=2, details=OK, $ref=$["cause"]["metadata"]

Could it be something with AWS or something that should be addressed in the server/client?

felipegcampos avatar May 05 '21 14:05 felipegcampos

That's new. This is the first instance I've seen of the grpc-status header name getting corrupted. That's why the call is failing when that happens.

We don't know why these errors happen. This issue is for information gathering, so that we can try to figure it out.

It looks like that client is talking to your own server through the ALB. Is that correct? Can you trigger that error with any degree of consistency? If so, can you additionally set the environment variable NODE_DEBUG=http2 and share the output from around where that error occurs? I don't actually know if it will provide any useful information, but it might.

murgatroid99 avatar May 05 '21 14:05 murgatroid99

Yes, the server and the client are talking to each other through ALB. The problem is not consistent in the sense that it won't always happen with the same requests. However, it always happens somewhere when we run automation. We are using 12.20.1-alpine3.11 and here are the logs with the extra information you asked for:

May 05 11:51:03 object-service-a2aaac317c46 HTTP2 17: Http2Stream 385 [Http2Session server]: initiating response
May 05 11:51:03 object-service-a2aaac317c46 2021-05-05T15:51:03.889Z | server_call | Request to method /ObjectService.ObjectTypeLifeCycleController/retrieveAll ended with status code: OK details: OK
May 05 11:51:03 object-service-a2aaac317c46 HTTP2 17: Http2Stream 385 [Http2Session server]: sending trailers
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: headers received
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: emitting stream 'response' event
May 05 11:51:03 object-service-a2aaac317c46 HTTP2 17: Http2Stream 385 [Http2Session server]: closed with code 0, closed false, readable false
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.890Z | call_stream | [674] Received server headers:
May 05 11:51:03 object-service-a2aaac317c46 HTTP2 17: Http2Stream 385 [Http2Session server]: destroying stream
May 05 11:51:03 web-service-cd3ec20b0aaf 		:status: 200
May 05 11:51:03 object-service-a2aaac317c46 2021-05-05T15:51:03.890Z | server_call | Request to method /ObjectService.ObjectTypeLifeCycleController/retrieveAll stream closed with rstCode 0
May 05 11:51:03 web-service-cd3ec20b0aaf 		server: awselb/2.0
May 05 11:51:03 web-service-cd3ec20b0aaf 		date: Wed, 05 May 2021 15:51:03 GMT
May 05 11:51:03 web-service-cd3ec20b0aaf 		content-type: application/grpc+proto
May 05 11:51:03 web-service-cd3ec20b0aaf 		grpc-accept-encoding: identity
May 05 11:51:03 web-service-cd3ec20b0aaf 		grpc-encoding: identity
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.890Z | call_stream | [674] receive HTTP/2 data frame of length 440
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.890Z | call_stream | [674] parsed message of length 440
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.890Z | call_stream | [674] filterReceivedMessage of length 440
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.890Z | call_stream | [674] pushing to reader message of length 435
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: headers received
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: emitting stream 'trailers' event
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.891Z | call_stream | [674] Received server trailers:
May 05 11:51:03 web-service-cd3ec20b0aaf 		 ²f¨8V   us: 0
May 05 11:51:03 web-service-cd3ec20b0aaf 		grpc-message: OK
May 05 11:51:03 web-service-cd3ec20b0aaf Failed to add metadata entry  ²f¨8V   us: 0. Metadata key " ²f¨8v   us" contains illegal characters. For more information see https://github.com/grpc/grpc-node/issues/1173
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.892Z | call_stream | [674] received status details string "OK" from server
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.892Z | call_stream | [674] ended with status: code=2 details="OK"
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.892Z | subchannel_refcount | 10.39.50.239:30052 callRefcount 1 -> 0
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.892Z | call_stream | [674] close http2 stream with code 8
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: closed with code 0, closed true, readable true
May 05 11:51:03 web-service-cd3ec20b0aaf HTTP2 17: Http2Stream 477 [Http2Session client]: destroying stream
May 05 11:51:03 web-service-cd3ec20b0aaf 2021-05-05T15:51:03.893Z | call_stream | [674] HTTP/2 stream closed with code 8
May 05 11:51:03 web-service-cd3ec20b0aaf error: grpc callMethod failed to call retrieveAll with error:
May 05 11:51:03 web-service-cd3ec20b0aaf error:  message=2 UNKNOWN: OK, name=Error, code=2, details=OK, , , isOperational=true, stack=Error: 2 UNKNOWN: OK
May 05 11:51:03 web-service-cd3ec20b0aaf     at Object.callErrorFromStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client.js:176:52)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Object.onReceiveStatus (/opt/resolver/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
May 05 11:51:03 web-service-cd3ec20b0aaf     at /opt/resolver/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
May 05 11:51:03 web-service-cd3ec20b0aaf     at processTicksAndRejections (internal/process/task_queues.js:79:11), code=2, details=OK, $ref=$["cause"]["metadata"]
May 05 11:51:03 web-service-cd3ec20b0aaf error: 210505/155103.829, [error] message: 2 UNKNOWN: OK, stack: Error: 2 UNKNOWN: OK
May 05 11:51:03 web-service-cd3ec20b0aaf     at /opt/resolver/packages/grpc/helpers.js:172:33
May 05 11:51:03 web-service-cd3ec20b0aaf     at tryCatcher (/opt/resolver/node_modules/bluebird/js/release/util.js:16:23)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Promise._settlePromiseFromHandler (/opt/resolver/node_modules/bluebird/js/release/promise.js:547:31)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Promise._settlePromise (/opt/resolver/node_modules/bluebird/js/release/promise.js:604:18)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Promise._settlePromise0 (/opt/resolver/node_modules/bluebird/js/release/promise.js:649:10)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Promise._settlePromises (/opt/resolver/node_modules/bluebird/js/release/promise.js:725:18)
May 05 11:51:03 web-service-cd3ec20b0aaf     at _drainQueueStep (/opt/resolver/node_modules/bluebird/js/release/async.js:93:12)
May 05 11:51:03 web-service-cd3ec20b0aaf     at _drainQueue (/opt/resolver/node_modules/bluebird/js/release/async.js:86:9)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Async._drainQueues (/opt/resolver/node_modules/bluebird/js/release/async.js:102:5)
May 05 11:51:03 web-service-cd3ec20b0aaf     at Immediate.Async.drainQueues [as _onImmediate] (/opt/resolver/node_modules/bluebird/js/release/async.js:15:14)
May 05 11:51:03 web-service-cd3ec20b0aaf     at processImmediate **(internal/timers.js:461:21)**

felipegcampos avatar May 05 '21 16:05 felipegcampos

Unfortunately, it doesn't look like those extra logs have any useful information, sorry. I do think your information is strong evidence that that this is an issue in Node itself, as opposed to some other server or proxy, but I don't even know where to start looking for the root cause, and I don't know how to get enough relevant information to file a solid, fixable issue for the Node core.

murgatroid99 avatar May 05 '21 18:05 murgatroid99

Unfortunately, it doesn't look like those extra logs have any useful information, sorry. I do think your information is strong evidence that this is an issue in Node itself, as opposed to some other server or proxy, but I don't even know where to start looking for the root cause, and I don't know how to get enough relevant information to file a solid, fixable issue for the Node core.

We're running more tests to confirm and we had to upgrade a bunch of dependencies but it seems from the preliminary tests that it works with Node v14.16.1.


Update: In case anyone else experiences the same issue, I can confirm it is no longer a problem with Node v14.16.1.

felipegcampos avatar May 06 '21 21:05 felipegcampos