ballerina-lang icon indicating copy to clipboard operation
ballerina-lang copied to clipboard

Need an API to Unescape Characters in Identifiers

Open ThisaruGuruge opened this issue 2 years ago • 12 comments

Description: When the Semantic API returns values of identifiers, the escape characters are not unescaped. For example:

import ballerina/graphql;

service on new graphql:Listener(4000) {
    resource function get name(string 'version) returns string {
        return 'version;
    }
}

In the above code, the parameter name of the resource method is returned as 'version from the getName() API. There should be a way to remove the leading single quote and also unescape the escaped characters.

There is an existing API for this, but it is not exposed to external users. It is better to expose this, as some compiler plugins need this functionality.

ThisaruGuruge avatar Jul 18 '22 11:07 ThisaruGuruge

@dulajdilshan is getName() returning 'version, the expected behavior? Don't we expose both the original value and the unescaped value? if we expose unescaped value don't we remove the quote?

lochana-chathura avatar Jul 18 '22 11:07 lochana-chathura

+1 for adding a public API for this. Currently there's no way to compare identifiers when it's a mix of escaped and unescaped identifiers.

pubudu91 avatar Oct 12 '22 10:10 pubudu91

@dulajdilshan is getName() returning 'version, the expected behavior? Don't we expose both the original value and the unescaped value? if we expose unescaped value don't we remove the quote?

This is the expected behaviour. The behaviour in the Semantic API is that we return the names exactly as they were written in the source code. e.g., if the user wrote 'name in the code, getName() will also return 'name.

pubudu91 avatar Oct 12 '22 10:10 pubudu91

I remember keeping both values internally. Was wondering if both of them were exposed via separete methods.

lochana-chathura avatar Oct 12 '22 10:10 lochana-chathura

I remember keeping both values internally. Was wondering if both of them were exposed via separete methods.

Yeah. Internally we keep both of the values. But we only expose the name that the user has defined. I'm also +1 to have another API to get the unescaped string value of the name

dulajdilshan avatar Oct 12 '22 10:10 dulajdilshan

There are different variations when it comes to escaping and unescaping, We can't expose all these variations as part of the syntax or semantic API. We have this identifier-util which is used by many other components including runtime. I guess the best option is to expose this Util as a proper API.

hasithaa avatar Oct 17 '22 03:10 hasithaa

I'm not sure to which team this change belongs. Most of the methods are written by the runtime team.

We have this identifier-util which is used by many other components including runtime. I guess the best option is to expose this Util as a proper API.

looping @HindujaB @warunalakshitha. would we be able to expose this as a proper API?

lochana-chathura avatar Oct 17 '22 09:10 lochana-chathura

I'm not sure to which team this change belongs. Most of the methods are written by the runtime team.

In that case, will the compiler plugins be able to use them in the compiler plugins?

ThisaruGuruge avatar Oct 17 '22 09:10 ThisaruGuruge

In that case, will the compiler plugins be able to use them in the compiler plugins?

Yes of course. New API should be visible everywhere.

Currently, it is exported as follows.

exports io.ballerina.identifier to io.ballerina.lang, io.ballerina.runtime, io.ballerina.shell,
            io.ballerina.testerina.runtime, io.ballerina.lang.runtime, io.ballerina.lang.error,
            ballerina.debug.adapter.core, io.ballerina.jsonmapper;

lochana-chathura avatar Oct 17 '22 09:10 lochana-chathura

I also noticed that most of the compiler APIs return the escaped value, and some are not. For example,

PathSegment pathSegment = // get path segment
String signature = pathSegment.signature();

In the above code, the signature will return the value without the ' character. The typeSymbol.getName().get() also returns the value without the ' character. But the input parameters do not behave this way.

ThisaruGuruge avatar Oct 17 '22 09:10 ThisaruGuruge

In the above code, the signature will return the value without the ' character. The typeSymbol.getName().get() also returns the value without the ' character. But the input parameters do not behave this way.

This seems to be a bug. @dulajdilshan thoughts?

lochana-chathura avatar Oct 17 '22 09:10 lochana-chathura

New API should be visible everywhere.

The identifier-util module contains the APIs that are for internal usages - like jvm encoding etc. So, it is only exported to specific modules. In this use-case, we will need to export only the escape/unescape APIs to public. The others should be restricted.

HindujaB avatar Oct 17 '22 13:10 HindujaB

Yes. We should not expose internal encoding/decoding methods as apis. Lets expose unescape method only in Identifier-utils module as a separate api package method.

warunalakshitha avatar Oct 19 '22 19:10 warunalakshitha

Yes. We should not expose internal encoding/decoding methods as apis. Lets expose unescape method only in Identifier-utils module as a separate api package method.

Need a method for escaping as well. That's needed for cases where we generate Ballerina code.

pubudu91 avatar Oct 20 '22 05:10 pubudu91

In the above code, the signature will return the value without the ' character. The typeSymbol.getName().get() also returns the value without the ' character. But the input parameters do not behave this way.

This seems to be a bug. @dulajdilshan thoughts?

Any thoughts on this? We are using these APIs in our packages. What is the bug here? Including the ' character or not including it? Is this fixed now? We are using these APIs in compiler plugins. If the bug is fixed, how should we update our code?

Also, when we do introduce new APIs like this, can we please at least include some examples in the PR? We've mentioned this previously as well. It is really hard to find APIs and their usage without any proper documentation.

ThisaruGuruge avatar Nov 10 '22 03:11 ThisaruGuruge

Any thoughts on this? We are using these APIs in our packages. What is the bug here? Including the ' character or not including it? Is this fixed now? We are using these APIs in compiler plugins. If the bug is fixed, how should we update our code?

@dulajdilshan this is regarding the semantic API behavior. Shall we create a separate issue to check this and fix it?

Also, when we do introduce new APIs like this, can we please at least include some examples in the PR? We've mentioned this previously as well. It is really hard to find APIs and their usage without any proper documentation.

+1

lochana-chathura avatar Nov 10 '22 07:11 lochana-chathura

Any thoughts on this? We are using these APIs in our packages. What is the bug here? Including the ' character or not including it? Is this fixed now? We are using these APIs in compiler plugins. If the bug is fixed, how should we update our code? We are working on it right now.

Also, when we do introduce new APIs like this, can we please at least include some examples in the PR? We've mentioned this previously as well. It is really hard to find APIs and their usage without any proper documentation.

Sure, we will include examples in future PRs

dulajdilshan avatar Nov 10 '22 08:11 dulajdilshan

Any thoughts on this? We are using these APIs in our packages. What is the bug here? Including the ' character or not including it? Is this fixed now? We are using these APIs in compiler plugins. If the bug is fixed, how should we update our code?

@dulajdilshan this is regarding the semantic API behavior. Shall we create a separate issue to check this and fix it?

Also, when we do introduce new APIs like this, can we please at least include some examples in the PR? We've mentioned this previously as well. It is really hard to find APIs and their usage without any proper documentation.

+1

Sure I'll create a one

dulajdilshan avatar Nov 10 '22 08:11 dulajdilshan