[fix][broker] Added check for invisible characters for subscription name
Motivation
When the subscription name contains some invisible characters, such as \u0000, the subscription cannot be deleted:

So when creating a subscription name, we should check the subscription name: create subscription group fails when subscription name contains invisible characters
Documentation
Check the box below or label this PR directly.
Need to update docs?
-
[ ]
doc-required(Your PR needs to update docs and you will update later) -
[ ]
doc-not-needed(Please explain why) -
[x]
doc(Your PR contains doc changes) -
[ ]
doc-complete(Docs have been already added)
@codelipenghui @Technoboy- PTAL,thanks!
/pulsarbot run-failure-checks
/pulsarbot run-failure-checks
/pulsarbot run-failure-checks
I think we can't resolve the issue("the subscription cannot be deleted") entirely by this approach. Because we can't unsubscribe existing subscriptions, also this approach can only affect REST API.
Moreover, I think the current restriction is too strict for me. For example, I want to use dots for subscription names. Could you modify this restriction to be able to enable/disable it?
% ./bin/pulsar-admin topics create-subscription -s api.sub persistent://public/default/topic
2022-08-24T18:00:31,752+0900 [AsyncHttpClient-7-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/topic/subscription/api.sub?replicated=false] Failed to perform http put request: javax.ws.rs.BadRequestException: HTTP 400 subscriptionName contains special characters
subscriptionName contains special characters
Reason: subscriptionName contains special characters
I think we can't resolve the issue("the subscription cannot be deleted") entirely by this approach. Because we can't unsubscribe existing subscriptions, also this approach can only affect REST API.
Moreover, I think the current restriction is too strict for me. For example, I want to use dots for subscription names. Could you modify this restriction to be able to enable/disable it?
% ./bin/pulsar-admin topics create-subscription -s api.sub persistent://public/default/topic 2022-08-24T18:00:31,752+0900 [AsyncHttpClient-7-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/topic/subscription/api.sub?replicated=false] Failed to perform http put request: javax.ws.rs.BadRequestException: HTTP 400 subscriptionName contains special characters subscriptionName contains special characters Reason: subscriptionName contains special characters
IMO,
- First, the subscription name should not contain special characters, and the specific naming rules can be discussed;
- For the existing illegal subscription name, it can be deleted by dynamically opening the subscription name expiration cleanup, but it cannot be deleted by command, because invisible characters (eg: /u0000) cannot be input by command;
@equanz @codelipenghui @HQebupt @Technoboy- PTAL,thanks!
/pulsarbot run-failure-checks
/pulsarbot run-failure-checks
@Jason918 PTAL,thanks!
/pulsarbot run-failure-checks
This is a breaking change and may prevent existing users from upgrading Pulsar. For example, using host names as subscriptions is a very common use case for our company. Host names commonly contain multiple . characters, but this PR makes it impossible to create subscriptions containing .. Therefore, this PR is unacceptable for us.
At least, a flag should be added to the broker's configuration to enable/disable this check. Moreover, the current check is too strict. For example, even if a subscription contains ., we can remove it without any problem. So there is no need to prohibit the creation of such subscriptions.
This is a breaking change and may prevent existing users from upgrading Pulsar. For example, using host names as subscriptions is a very common use case for our company. Host names commonly contain multiple
.characters, but this PR makes it impossible to create subscriptions containing.. Therefore, this PR is unacceptable for us.At least, a flag should be added to the broker's configuration to enable/disable this check. Moreover, the current check is too strict. For example, even if a subscription contains
., we can remove it without any problem. So there is no need to prohibit the creation of such subscriptions.
OK, I agree to only check for invisible characters
/pulsarbot run-failure-checks
/pulsarbot run-failure-checks
@eolivelli @codelipenghui @hangc0276 @Technoboy- PTAL,thanks!
Can you add a test for the admin API ?
Fixed. PTAL,thanks! @poorbarcode
/pulsarbot run-failure-checks
@Technoboy- PTAL,thanks!
I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?
We should add the validation. It's better to keep consistent with the REST API and Pulsar Protocol.
I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?
We should add the validation. It's better to keep consistent with the REST API and Pulsar Protocol.
OK, I will add this validation for subscriptions created via auto subscription creation in the Pulsar Protocol.
@codelipenghui @michaeljmarshall
I think the change makes sense. Is there a reason we don't do this validation for subscriptions created via auto subscription creation in the Pulsar Protocol?
Fixed. Add this validation when automatically creating a subscription. @michaeljmarshall
I think that
containsInvisibleCharactersis too limited. There might be other character classes that aren't suitable, so calling them "invisible" would be wrong.Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.
Pulsar already contains org.apache.pulsar.common.naming.NamedEntity#checkName method and a similar solution could be followed in this case. I guess switching to use NamedEntity.checkName could be a breaking change for existing applications. Adding a specific solution for Subscription name validation would be justified.
It should also be discussed whether there's a need for a feature flag for enforcing a stricter validation of the subscription name with NamedEntity.checkName so that there would be consistency across other named entitiies in Pulsar.
@lhotari
- Use org.apache.pulsar.common.naming.NamedEntity#checkName instead of containsInvisibleCharacters?
- Add a configuration flag to control whether to verify the subscription name?
/pulsarbot run-failure-checks
CI PR: https://github.com/lordcheng10/pulsar/pull/3
/pulsarbot run-failure-checks
Is the root of the problem here that the character sequences in question are not being properly encoded when forming the admin URLs? From the example given, I am seeing what looks to be a URL encoding of a Java[Script] escaped form of the original name, rather than a direct encoding of the original name. There are also some pretty odd character codes in the formed URL — the backslash of \u.... seems to have been encoded as %255C:
Practically speaking, it should be possible to correctly transport any characters (including invisible ones) via a correctly encoded URL. I suspect there is a character encoding issue somewhere. If this is the case, I'd recommend:
- Preventing new names from being created with invisible/indistinguishable characters
- Fix name encoding / URL handling so that any existing names with such characters can still be managed
Is the root of the problem here that the character sequences in question are not being properly encoded when forming the admin URLs? From the example given, I am seeing what looks to be a URL encoding of a Java[Script] escaped form of the original name, rather than a direct encoding of the original name. There are also some pretty odd character codes in the formed URL — the backslash of
\u....seems to have been encoded as%255C:Practically speaking, it should be possible to correctly transport any characters (including invisible ones) via a correctly encoded URL. I suspect there is a character encoding issue somewhere. If this is the case, I'd recommend:
- Preventing new names from being created with invisible/indistinguishable characters
- Fix name encoding / URL handling so that any existing names with such characters can still be managed
- There is no problem with encoding;
- The problem is that the subscription name contains invisible characters, and invisible characters cannot be entered on the command line, resulting in the command reporting an error: Subscription not found
Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.
I'll start a discussion about naming conventions for subscriptions. @lhotari
Instead of checking for invalid characters, it would be better to have the check the validity of the given subscription name explicitly.
I'll start a discussion about naming conventions for subscriptions. @lhotari
discuss thread: https://lists.apache.org/thread/mbm8p3wh7k0bc3nxg3qq6tdbqfotckr5
CI all passed: https://github.com/lordcheng10/pulsar/pull/3