pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[BUG] Custom Java Sink Connector: Logging statement in 'close' method not executed on deletion or stop

Open NurramoX opened this issue 1 year ago • 4 comments

When adding a logging statement inside the "close" method of a custom sink connector, uploading it to a standalone cluster, and then deleting or stopping the connector, the logging statement is not executed as expected.

Version

OS: Arch Linux x86_64 Kernel: 6.6.40-1-lts Java:

  • openjdk 17.0.12 2024-07-16
  • OpenJDK Runtime Environment (build 17.0.12+7)
  • OpenJDK 64-Bit Server VM (build 17.0.12+7, mixed mode, sharing)

Apache Pulsar: 3.3.0 (using apachepulsar/pulsar:3.3.0 Docker image)

Java Client Libraries:

  • org.apache.pulsar:pulsar-client-original:3.3.0
  • org.apache.pulsar:pulsar-client-admin-original:3.3.0

Minimal reproduce step

Steps to reproduce:

  1. Set up a standalone Apache Pulsar cluster using Docker Compose. Docker Compose configuration:
services:
  pulsar:
    container_name: pulsar
    image: apachepulsar/pulsar:3.3.0
    ports:
      - "6650:6650"
      - "8080:8080"
      - "8081:8081"
    environment:
      - PULSAR_STANDALONE_USE_ZOOKEEPER=1
    command: ["bin/pulsar", "standalone"]
  1. Implement a custom sink connector with logging statements in "open" and "close" methods. Sink Connector:
public class SimpleSink implements Sink<String> {
    private Logger LOG = null;
    @Override
    public void open(Map<String, Object> config, SinkContext sinkContext) {
        LOG = sinkContext.getLogger();
        LOG.info("Hello world");
    }
    @Override
    public void write(Record<String> record) {
        String value = record.getValue();
        LOG.info("Recorded {}.", value);
        record.ack();
    }
    @Override
    public void close() {
        LOG.info("Bye world");
    }
}
  1. Upload the sink connector to the Pulsar cluster. Sink Connector Configuration:
tenant: "simple-tenant"
namespace: "simple-ns"
name: "DataSink"
className: "dev.nurramo.sink.SimpleSink"
inputs: ["persistent://simple-tenant/simple-ns/output"]
sourceSubscriptionName: "sub-1"

  1. Stop or delete the connector
  2. Check logs for the expected logging statements

What did you expect to see?

The logging statements in both the "open" and "close" methods should be executed and visible in the logs when the connector is started and then stopped or deleted.

What did you see instead?

Only the logging statement from the "open" method is visible in the logs. The "close" method's logging statement is not executed or logged when the connector is stopped or deleted.

Here is how I confirmed that. :

/pulsar/logs/functions/simple-tenant/simple-ns/DataSink $ cat *.log | grep world
2024-07-22T21:42:01,615+0000 [simple-tenant/simple-ns/DataSink-0] INFO  function-DataSink - Hello world

"Bye world" was not listed anywhere

Anything else?

I haven't encountered any issues running the sink via the Localrunner. Both "Hello world" and "Bye world" pop up in the log output.

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

Search before asking

  • [X] I searched in the issues and found nothing similar.

Read release policy

  • [X] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

NurramoX avatar Jul 22 '24 22:07 NurramoX

I did a quick check and didn't find an obvious reason. I guess one possibility is that the logger doesn't get flushed after the log message gets written or the logger is shutdown before that. That's a problem that has been fixed in some other Pulsar components. For example, there's #20689 which handled proper shutdown of the logger in case of exceptions in the startup sequence.

lhotari avatar Aug 09 '24 12:08 lhotari

I think I found the problem. The logging in shutdown was fixed in #10304 for the broker. It's necessary to add -Dlog4j.shutdownHookEnabled=false system property and manage the shutdown of the log4j LogManager.

lhotari avatar Aug 09 '24 12:08 lhotari

I believe this option was already set when I encountered the issue

When Pulsar is run in standalone mode, -Dlog4j.shutdownHookEnabled=false is automatically set because the Pulsar shell script starts it with the $LOG4J2_SHUTDOWN_HOOK_DISABLED variable already configured.

NurramoX avatar Aug 27 '24 13:08 NurramoX

I believe this option was already set when I encountered the issue

When Pulsar is run in standalone mode, -Dlog4j.shutdownHookEnabled=false is automatically set because the Pulsar shell script starts it with the $LOG4J2_SHUTDOWN_HOOK_DISABLED variable already configured.

@NurramoX It's also necessary to manage LogManager shutdown. Please check the details in the previous comment. This requires a code change in Pulsar Functions library code.

lhotari avatar Aug 27 '24 15:08 lhotari

It's also necessary to manage LogManager shutdown. Please check the details in the previous comment. This requires a code change in Pulsar Functions library code.

To clarify, are you suggesting that we need to make changes somewhere in this library to manually manage the shutdown of the log4j LogManager, similar to what was done for the broker in PR #10304?

NurramoX avatar Aug 29 '24 14:08 NurramoX

It's also necessary to manage LogManager shutdown. Please check the details in the previous comment. This requires a code change in Pulsar Functions library code.

To clarify, are you suggesting that we need to make changes somewhere in this library to manually manage the shutdown of the log4j LogManager, similar to what was done for the broker in PR #10304?

Yes, that's my current assumption.

lhotari avatar Aug 29 '24 18:08 lhotari