pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[fix][broker] Added check for invisible characters for subscription name

Open lordcheng10 opened this issue 3 years ago • 25 comments

Motivation

When the subscription name contains some invisible characters, such as \u0000, the subscription cannot be deleted: image

image

So when creating a subscription name, we should check the subscription name: create subscription group fails when subscription name contains invisible characters

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • [ ] doc-required (Your PR needs to update docs and you will update later)

  • [ ] doc-not-needed (Please explain why)

  • [x] doc (Your PR contains doc changes)

  • [ ] doc-complete (Docs have been already added)

lordcheng10 avatar Aug 18 '22 02:08 lordcheng10

@codelipenghui @Technoboy- PTAL,thanks!

lordcheng10 avatar Aug 18 '22 02:08 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Aug 18 '22 04:08 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Aug 18 '22 06:08 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Aug 18 '22 15:08 lordcheng10

I think we can't resolve the issue("the subscription cannot be deleted") entirely by this approach. Because we can't unsubscribe existing subscriptions, also this approach can only affect REST API.

Moreover, I think the current restriction is too strict for me. For example, I want to use dots for subscription names. Could you modify this restriction to be able to enable/disable it?

% ./bin/pulsar-admin topics create-subscription -s api.sub persistent://public/default/topic
2022-08-24T18:00:31,752+0900 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/topic/subscription/api.sub?replicated=false] Failed to perform http put request: javax.ws.rs.BadRequestException: HTTP 400 subscriptionName contains special characters
subscriptionName contains special characters

Reason: subscriptionName contains special characters

equanz avatar Aug 24 '22 09:08 equanz

I think we can't resolve the issue("the subscription cannot be deleted") entirely by this approach. Because we can't unsubscribe existing subscriptions, also this approach can only affect REST API.

Moreover, I think the current restriction is too strict for me. For example, I want to use dots for subscription names. Could you modify this restriction to be able to enable/disable it?

% ./bin/pulsar-admin topics create-subscription -s api.sub persistent://public/default/topic
2022-08-24T18:00:31,752+0900 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/topic/subscription/api.sub?replicated=false] Failed to perform http put request: javax.ws.rs.BadRequestException: HTTP 400 subscriptionName contains special characters
subscriptionName contains special characters

Reason: subscriptionName contains special characters

IMO,

  1. First, the subscription name should not contain special characters, and the specific naming rules can be discussed;
  2. For the existing illegal subscription name, it can be deleted by dynamically opening the subscription name expiration cleanup, but it cannot be deleted by command, because invisible characters (eg: /u0000) cannot be input by command;

@equanz @codelipenghui @HQebupt @Technoboy- PTAL,thanks!

lordcheng10 avatar Aug 26 '22 03:08 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Aug 26 '22 06:08 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 01 '22 10:09 lordcheng10

@Jason918 PTAL,thanks!

lordcheng10 avatar Sep 01 '22 11:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 02 '22 04:09 lordcheng10

This is a breaking change and may prevent existing users from upgrading Pulsar. For example, using host names as subscriptions is a very common use case for our company. Host names commonly contain multiple . characters, but this PR makes it impossible to create subscriptions containing .. Therefore, this PR is unacceptable for us.

At least, a flag should be added to the broker's configuration to enable/disable this check. Moreover, the current check is too strict. For example, even if a subscription contains ., we can remove it without any problem. So there is no need to prohibit the creation of such subscriptions.

massakam avatar Sep 02 '22 06:09 massakam

This is a breaking change and may prevent existing users from upgrading Pulsar. For example, using host names as subscriptions is a very common use case for our company. Host names commonly contain multiple . characters, but this PR makes it impossible to create subscriptions containing .. Therefore, this PR is unacceptable for us.

At least, a flag should be added to the broker's configuration to enable/disable this check. Moreover, the current check is too strict. For example, even if a subscription contains ., we can remove it without any problem. So there is no need to prohibit the creation of such subscriptions.

OK, I agree to only check for invisible characters

lordcheng10 avatar Sep 05 '22 11:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 15 '22 09:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 16 '22 04:09 lordcheng10

@eolivelli @codelipenghui @hangc0276 @Technoboy- PTAL,thanks!

lordcheng10 avatar Sep 16 '22 07:09 lordcheng10

Can you add a test for the admin API ?

Fixed. PTAL,thanks! @poorbarcode

lordcheng10 avatar Sep 16 '22 08:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 16 '22 15:09 lordcheng10

@Technoboy- PTAL,thanks!

lordcheng10 avatar Sep 17 '22 02:09 lordcheng10

I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?

We should add the validation. It's better to keep consistent with the REST API and Pulsar Protocol.

codelipenghui avatar Sep 19 '22 01:09 codelipenghui

I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?

We should add the validation. It's better to keep consistent with the REST API and Pulsar Protocol.

OK, I will add this validation for subscriptions created via auto subscription creation in the Pulsar Protocol.

@codelipenghui @michaeljmarshall

lordcheng10 avatar Sep 19 '22 03:09 lordcheng10

I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?

Fixed. Add this validation when automatically creating a subscription. @michaeljmarshall

lordcheng10 avatar Sep 19 '22 05:09 lordcheng10

I think that containsInvisibleCharacters is too limited. There might be other character classes that aren't suitable, so calling them "invisible" would be wrong.

Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.

Pulsar already contains org.apache.pulsar.common.naming.NamedEntity#checkName method and a similar solution could be followed in this case. I guess switching to use NamedEntity.checkName could be a breaking change for existing applications. Adding a specific solution for Subscription name validation would be justified.

It should also be discussed whether there's a need for a feature flag for enforcing a stricter validation of the subscription name with NamedEntity.checkName so that there would be consistency across other named entitiies in Pulsar.

@lhotari

  1. Use org.apache.pulsar.common.naming.NamedEntity#checkName instead of containsInvisibleCharacters?
  2. Add a configuration flag to control whether to verify the subscription name?

lordcheng10 avatar Sep 19 '22 08:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 19 '22 13:09 lordcheng10

CI PR: https://github.com/lordcheng10/pulsar/pull/3

lordcheng10 avatar Sep 20 '22 03:09 lordcheng10

/pulsarbot run-failure-checks

lordcheng10 avatar Sep 20 '22 06:09 lordcheng10

Is the root of the problem here that the character sequences in question are not being properly encoded when forming the admin URLs? From the example given, I am seeing what looks to be a URL encoding of a Java[Script] escaped form of the original name, rather than a direct encoding of the original name. There are also some pretty odd character codes in the formed URL — the backslash of \u.... seems to have been encoded as %255C:

Practically speaking, it should be possible to correctly transport any characters (including invisible ones) via a correctly encoded URL. I suspect there is a character encoding issue somewhere. If this is the case, I'd recommend:

  1. Preventing new names from being created with invisible/indistinguishable characters
  2. Fix name encoding / URL handling so that any existing names with such characters can still be managed

teabot avatar Sep 30 '22 13:09 teabot

Is the root of the problem here that the character sequences in question are not being properly encoded when forming the admin URLs? From the example given, I am seeing what looks to be a URL encoding of a Java[Script] escaped form of the original name, rather than a direct encoding of the original name. There are also some pretty odd character codes in the formed URL — the backslash of \u.... seems to have been encoded as %255C:

Practically speaking, it should be possible to correctly transport any characters (including invisible ones) via a correctly encoded URL. I suspect there is a character encoding issue somewhere. If this is the case, I'd recommend:

  1. Preventing new names from being created with invisible/indistinguishable characters
  2. Fix name encoding / URL handling so that any existing names with such characters can still be managed
  1. There is no problem with encoding;
  2. The problem is that the subscription name contains invisible characters, and invisible characters cannot be entered on the command line, resulting in the command reporting an error: Subscription not found

lordcheng10 avatar Oct 01 '22 02:10 lordcheng10

Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.

I'll start a discussion about naming conventions for subscriptions. @lhotari

lordcheng10 avatar Oct 01 '22 02:10 lordcheng10

Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.

I'll start a discussion about naming conventions for subscriptions. @lhotari

discuss thread: https://lists.apache.org/thread/mbm8p3wh7k0bc3nxg3qq6tdbqfotckr5

lordcheng10 avatar Oct 01 '22 03:10 lordcheng10

CI all passed: https://github.com/lordcheng10/pulsar/pull/3

lordcheng10 avatar Oct 01 '22 03:10 lordcheng10