sdk-typescript
[Bug] Jest detects open handles
Getting this warning when running a simple test with the test environment:
$ ./node_modules/.bin/jest --detectOpenHandles
...
Jest has detected the following 1 open handle potentially keeping Jest from exiting:
● neon threadsafe function
...
Code for reproduction:
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker, Runtime, DefaultLogger, LogEntry } from '@temporalio/worker';
import { someWorkflow } from './workflows';
let testEnv: TestWorkflowEnvironment;
beforeAll(async () => {
// Use console.log instead of console.error to avoid red output
// Filter INFO log messages for clearer test output
Runtime.install({
logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
});
testEnv = await TestWorkflowEnvironment.createTimeSkipping();
});
afterAll(async () => {
await testEnv?.teardown();
});
test('some random test', async () => {
const { client, nativeConnection } = testEnv;
const worker = await Worker.create({
connection: nativeConnection,
taskQueue: 'test',
workflowsPath: require.resolve('./workflows'),
activities: {
// ...
},
});
await worker.runUntil(async () => {
const result = await client.workflow.execute(someWorkflow, {
args: [],
workflowId: 'money-transfer-test-workflow',
taskQueue: 'test',
});
expect(result).toEqual('test');
});
});
Val did some investigating in https://github.com/temporalio/samples-typescript/issues/189
The test case can be reduced to this:
test(`core-bridge registerErrors() doesn't leave open handles`, async () => {
const native = require("@temporalio/core-bridge");
native.registerErrors({
IllegalStateError: () => ({}),
ShutdownError: () => ({}),
TransportError: () => ({}),
UnexpectedError: () => ({}),
GracefulShutdownPeriodExpiredError: () => ({}),
});
});
The issue is specifically with registerErrors(). For example, the following test doesn't trigger the error message:
test(`core-bridge getTimeOfDay() doesn't leave open handles`, async () => {
const native = require("@temporalio/core-bridge");
native.getTimeOfDay();
});
The issue relates to usage of the .root() function to keep references to error constructors from Rust land.
pub fn register_errors(mut cx: FunctionContext) -> JsResult<JsUndefined> {
// ...
let mapping = cx.argument::<JsObject>(0)?;
let shutdown_error = mapping
.get::<JsFunction, _, _>(&mut cx, "ShutdownError")?
.root(&mut cx); // <----
Neon's Root object internally registers a ThreadSafeFunction, which is a special type of resource provided by Node's N-API that allows native code to call a JavaScript function from any thread. It is that ThreadSafeFunction resource that Jest detects as being left open.
Opened a ticket in Neon: https://github.com/neon-bindings/neon/issues/948
We hit the same issue in our test setup. Weirdly, it causes issues in CircleCI, as Jest just won't exit despite using the --forceExit flag. We upgraded to the latest version of the SDK and the issue still persists. It is intermittent, so it is hard to pinpoint it to the Temporal tests, but this behaviour started around the same time we added them.
Is there any known workaround?
Since the new version, we are hitting an issue with Jest on CircleCI as well.
Running into: Too long with no output (exceeded 10m0s): context deadline exceeded
Is there any known workaround?
I'm not aware of any workaround regarding Jest's warning about open handles. This warning message can simply be ignored. It may also cause a 10-second delay on process termination, which should generally not be much of an issue.
Since the new version, we are hitting an issue with Jest on CircleCI as well. Running into:
Too long with no output (exceeded 10m0s): context deadline exceeded
We hit the same issue in our test setup. Weirdly, it causes issues in CircleCI, as Jest just won't exit despite using the --forceExit flag. We upgraded to the latest version of the SDK and the issue still persists. It is intermittent, so it is hard to pinpoint it to the Temporal tests, but this behaviour started around the same time we added them.
I don't think this is related to the "detected open handle" bug. Wondering if this might be related, though there's unfortunately very little context to work with.
@mandriv @Irvenae Let's assume this is a different issue... Could you please provide more details on these situations? Here are some ideas of info that might help figure this out, but feel free to add whatever you think could be useful:
- Temporal SDK/Jest/Node versions?
- Collecting coverage metrics?
- Is that behaviour happening consistently or is it intermittent?
- If you run these tests locally using the very same command as when they are run in CircleCI, does execution complete normally and does the process stop by itself?
- Do you ever get long periods of time where the process stops outputting anything? How long?
- Can you still reproduce the error with a reduced test suite?
- Temporal SDK v1.6.0, Jest 29.3.0, node v16.15.0
- No coverage metrics
- intermittent
- Locally tests complete (tried it 10 times)
This is the log we get in CircleCI:
2023-02-28T09:49:27.110Z [WARN] This program is running inside a containerized environment with a memory constraint (eg. docker --memory 70226m). Node itself does not consider this memory constraint in how it manages its heap memory. There is consequently a high probability that the process will crash due to running out of memory. To increase reliability, we recommend adding '--max-old-space-size=52669' to your node arguments. Refer to https://docs.temporal.io/application-development/worker-performance for more advice on tuning your workers.
2023-02-28T09:49:27.145998Z INFO temporal_sdk_core::ephemeral_server: Downloading https://temporal.download/assets/temporalio/sdk-java/releases/download/v1.17.0/temporal-test-server_1.17.0_linux_amd64.tar.gz to /tmp/temporal-test-server-sdk-typescript-1.6.0
2023-02-28T09:49:27.871Z [INFO] Creating worker {
options: {
namespace: 'default',
identity: '454@f5f8c41c3a04',
shutdownGraceTime: '10s',
maxConcurrentActivityTaskExecutions: 100,
maxConcurrentLocalActivityExecutions: 100,
enableNonLocalActivities: true,
maxConcurrentWorkflowTaskExecutions: 100,
stickyQueueScheduleToStartTimeout: '5m',
maxHeartbeatThrottleInterval: '60s',
defaultHeartbeatThrottleInterval: '30s',
isolateExecutionTimeout: '4294967295ms',
workflowThreadPoolSize: 1,
maxCachedWorkflows: 1761,
enableSDKTracing: false,
showStackTraceSources: true,
reuseV8Context: true,
debugMode: true,
interceptors: { activityInbound: [Array], workflowModules: [Array] },
sinks: { logger: [Object] },
workflowsPath: '/root/repo/packages/temporal-features/src/tests/allWorkflows.ts',
bundlerOptions: { webpackConfigHook: [Function: webpackConfigHook] },
activities: {
cancelJob: [Getter],
createItem: [Getter],
createItemWithData: [Getter],
deleteItem: [Getter],
findItems: [Getter],
findItemsInAllVersions: [Getter],
getFileJson: [Getter],
getItem: [Getter],
persistData: [Getter],
setItem: [Getter],
createOrMemoizeJob: [Getter],
getJobResult: [Getter],
getResult: [Getter],
queryWorkflow: [Getter],
signalWithStartWorkflowForItem: [Getter],
signalWorkflow: [Getter],
startAndQueryWorkflowForItem: [Getter]
},
taskQueue: 'test',
connection: NativeConnection {
nativeClient: [External: 7fe4b80045b0],
referenceHolders: Set(0) {}
},
shutdownGraceTimeMs: 10000,
stickyQueueScheduleToStartTimeoutMs: 300000,
isolateExecutionTimeoutMs: 4294967295,
maxHeartbeatThrottleIntervalMs: 60000,
defaultHeartbeatThrottleIntervalMs: 30000,
loadedDataConverter: {
payloadConverter: [DefaultPayloadConverter],
failureConverter: [DefaultFailureConverter],
payloadCodecs: []
}
}
}
2023-02-28T09:49:36.018Z [INFO] asset workflow-bundle-a2ca801512c61451083d.js 22.2 MiB [emitted] [immutable] (name: main)
2023-02-28T09:49:36.018Z [INFO] runtime modules 1.25 KiB 6 modules
2023-02-28T09:49:36.018Z [INFO] modules by path ./packages/ 4.44 MiB 726 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./node_modules/ 3.54 MiB 293 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./tooling/tool-version-sync/lib/*.js 4.99 KiB
2023-02-28T09:49:36.019Z [INFO] ./tooling/tool-version-sync/lib/index.js 1.2 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] ./tooling/tool-version-sync/lib/toolVersions.js 1.54 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] ./tooling/tool-version-sync/lib/toolMemoryLimits.js 2.25 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_payload_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_failure_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] webpack 5.75.0 compiled successfully in 7586 ms
2023-02-28T09:49:36.116Z [INFO] Workflow bundle created { size: '22.12MB' }
2023-02-28T09:49:37.170023Z INFO temporal_sdk_core::worker: Initializing worker task_queue=test namespace=default
2023-02-28T09:49:37.172Z [INFO] Worker state changed { state: 'RUNNING' }
(node:454) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
Too long with no output (exceeded 10m0s): context deadline exceeded
In this run it only runs 1 test file. In this test file we only create 1 timeSkipping test env, which we reuse for 12 tests.
I checked the memory and we only use at most 25% of the instance's memory, so OOM is out of the question.
The MaxListenersExceededWarning can be ignored; it's something in our code that we need to resolve.
A successful run would show Worker state changed { state: 'STOPPING' } and so on about 3 minutes later. We only print info logs; I will try to reproduce with debug logs as well.
I might have been on the wrong track; today I reran this 10 times on CircleCI with no failure. I know for sure this happens when we have multiple tests running, but there it could be OOM as well (so I am setting node options and/or going to a single Jest worker to see if it is resolved). Anyway, I am also investigating further to pinpoint it better.
FWIW, we found that we were running out of either CPU or memory when executing in CircleCI. The only thing that helped us was to reduce the number of Jest workers. We have not had a single failure since then.
I checked the memory and we only use at most 25% of the instance's memory, so OOM is out of the question.
Not sure if this is the issue here, but I personally recommend always explicitly setting Node's --max-old-space-size (see this page for some details).
Based on your log, it appears you are executing inside a containerized environment with a memory constraint. Node itself doesn't play well in such cases, because it doesn't know about those constraints and instead configures its heap allocation and garbage collection based on the machine's total memory.
The opposite is also possible: by default, Node will configure its heap allocation limit and garbage collection algorithms based on 25% of available memory, up to a limit of 4 GB (assuming Node 14 to 18, on a 64-bit CPU). That means that your Node process may not be taking advantage of all the resources available to it, which could explain why it is not performing as it should.
Both cases can be resolved by explicitly setting and properly tuning Node's --max-old-space-size.
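For reference, here is a minimal sketch (not from the thread) to verify which heap limit the Node process actually picked up, assuming the flag is passed through NODE_OPTIONS; the 3072 value is only an example and should be tuned to your container's memory limit:

// check-heap-limit.ts — hypothetical helper; run with e.g. NODE_OPTIONS="--max-old-space-size=3072"
import { getHeapStatistics } from 'v8';

// heap_size_limit reflects --max-old-space-size plus some V8 overhead
const limitMiB = getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log(`V8 heap size limit: ${limitMiB.toFixed(0)} MiB`);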
Any progress regarding this issue? This is not just a warning: in our CI/CD it breaks the Jest tests and they don't pass.
We have resolved it by using a bigger machine, reducing the number of Jest workers, and setting --max-old-space-size (taking into account the number of workers).
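For illustration, a minimal sketch of the Jest side of that setup; the maxWorkers value here is an assumption, not a recommendation, and --max-old-space-size itself would be passed via NODE_OPTIONS in the CI environment:

// jest.config.ts — sketch only; tune maxWorkers to your CI machine and memory budget
import type { Config } from 'jest';

const config: Config = {
  // Fewer parallel workers lowers peak memory usage and CPU contention in CI.
  maxWorkers: 2,
};

export default config;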
I am now evaluating AVA tbh, because attaching the debugger is spotty with Jest.
I am not sure if this is relevant, but I will add it here just to get others' opinions as well. When I run the test with createLocal it works, but when I use createTimeSkipping the test times out even after 60s. The example test is:
import { DefaultLogger, LogEntry, Runtime, Worker } from '@temporalio/worker';
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { WorkflowCoverage } from '@temporalio/nyc-test-coverage';
import { example } from './workflow';
describe('Example workflow with mocks', () => {
let testEnv: TestWorkflowEnvironment;
const workflowCoverage = new WorkflowCoverage();
beforeAll(async () => {
// Use console.log instead of console.error to avoid red output
// Filter INFO log messages for clearer test output
Runtime.install({
logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
});
testEnv = await TestWorkflowEnvironment.createLocal();
});
afterAll(async () => {
await testEnv?.teardown();
});
afterAll(() => {
workflowCoverage.mergeIntoGlobalCoverage();
});
it('successfully completes the Workflow with a mocked Activity', async () => {
const { client, nativeConnection } = testEnv;
const taskQueue = 'test';
const worker = await Worker.create(
workflowCoverage.augmentWorkerOptions({
connection: nativeConnection,
taskQueue: 'test',
workflowsPath: require.resolve('../workflows'),
activities: {
greet: async () => 'Hello, Temporal!',
},
})
);
await worker.runUntil(async () => {
const result = await client.workflow.execute(example, {
args: ['Temporal'],
workflowId: 'test',
taskQueue,
});
expect(result).toEqual('Hello, Temporal!');
});
});
});
Why is this behaviour happening?
When I run the test with createLocal it works, but when I use createTimeSkipping the test times out even after 60s. The example test is
Thanks for the feedback.
We haven't had the capacity to investigate this further for now; we'll need more work to understand what's going on here.
We've found a workaround for this when running Workers using the TestWorkflowEnvironment in Jest with --detectOpenHandles. This does not appear to work for activities run using the MockActivityEnvironment, where the open handles still persist.
The summary of the worker workaround is:
- when you create your TestWorkflowEnvironment, store it on global, e.g. global.TemporalEnvironment: TestWorkflowEnvironment
- as you Worker.create each of your workers, store the Worker (await the call to create) into something on global, e.g. global.workers: Worker[]
- as you worker.run() each of your workers, store the run promise as well into something on global, e.g. global.workerPromises: Promise<void>[]
- set up a Jest globalTeardown to clean up all of this stuff
- on each of the global.workers, call .shutdown(), which will cause the workers to (eventually) shut down. Do not try to await this - instead, wait for all of the worker run promises, e.g. await Promise.all(global.workerPromises)
- finally, tear down the TWE, e.g. await global.TemporalEnvironment.teardown()
e.g. teardown.ts
export default async function teardown() {
// ask each worker to start shutting down
global.workers.forEach((worker) => {
// do not wait for this, it is non-blocking
worker.shutdown();
});
// the promises created by `Worker.run` will resolve once the worker has actually shutdown
await Promise.all(global.workerPromises);
// fyi - teardown the environment won't work until after the workers disconnect
await global.TemporalEnvironment?.teardown();
}
To get your typing correct, you can declare global, but note that you have to var your variables; let and const won't work.
e.g.
declare global {
var TemporalEnvironment: TestWorkflowEnvironment;
var workers: Worker[];
var workerPromises: Promise<void>[];
}
We have found that the most "sane" configuration of this is to use a Jest globalSetup to start the TestWorkflowEnvironment and all of the workers involved, because it more closely mirrors what will be happening in production (the server and workers will be running before a workflow payload arrives). But you don't have to do that part.
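For illustration, a minimal sketch of such a globalSetup; the task queue, workflows path, and use of createTimeSkipping are assumptions to adapt to your project:

// globalSetup.ts — sketch of the setup half of the workaround described above
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';

export default async function setup() {
  // Store the environment on global so globalTeardown can tear it down later.
  global.TemporalEnvironment = await TestWorkflowEnvironment.createTimeSkipping();
  global.workers = [];
  global.workerPromises = [];

  const worker = await Worker.create({
    connection: global.TemporalEnvironment.nativeConnection,
    taskQueue: 'test', // assumption: use your real task queue
    workflowsPath: require.resolve('./workflows'), // assumption: point at your workflows file
  });
  global.workers.push(worker);
  // Keep the run promise so globalTeardown can await it after calling shutdown().
  global.workerPromises.push(worker.run());
}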
In our worker repos, using Node 18 and temporalio/* @ 1.9.0, this closes all of our handles and lets jest exit gracefully.
We're still unable to work around the open handles from the MockActivityEnvironment, though, which is disappointing.
It turns out that the handles opened by MockActivityEnvironment are in the Runtime. That means that you can shut the Runtime down in a Jest globalTeardown:
jest config:
{
...,
"globalTeardown": "tests/teardown.ts"
}
teardown.ts
import { Runtime } from '@temporalio/worker';
export default async function() {
await Runtime.instance().shutdown();
}
With that in place, your open handles get cleaned up. There is some delay (a few seconds) after the tests finish while the Runtime shuts down, but it stops Jest from complaining! Hurray!
Thanks a lot @jbsil for sharing your findings on this issue. It will certainly help a few of our users.
I have to admit that I'm very intrigued as to why this solution actually resolves the symptom, as I'm pretty sure that this will not force unloading of Neon's global ThreadSafeFunction, but I'm glad to know that it works. We're planning to dig more into Jest-related issues soon; I will experiment more with your workaround at that point.
This is still an ongoing issue; sometimes it also takes some time before everything is terminated, according to the Jest runner.
We are seeing this for MockActivityEnvironment:
Jest has detected the following 1 open handle potentially keeping Jest from exiting:
● neon threadsafe function
34 |
35 | beforeEach(async () => {
> 36 | env = new MockActivityEnvironment({ attempt: 2 });
| ^
37 |
38 | const compiledSubject = await compileTemplate({
39 | mergeFields: allMergeFields,
at ../../../../node_modules/@temporalio/core-bridge/index.js:16:14
at Function.create (../../../../node_modules/@temporalio/worker/src/runtime.ts:202:31)
at Function.instance (../../../../node_modules/@temporalio/worker/src/runtime.ts:194:17)
at Activity.makeActivityLogger (../../../../node_modules/@temporalio/worker/src/activity.ts:82:34)
at new Activity (../../../../node_modules/@temporalio/worker/src/activity.ts:61:12)
at new MockActivityEnvironment (../../../../node_modules/@temporalio/testing/src/index.ts:416:21)
at Object.<anonymous> (src/lib/actions/emails/prepareEmailAndMessageData.spec.ts:36:11)