OrchardCoreContrib.PoExtractor Remove some Limitations / Pitfalls

Hi There

I have a proposal to improve the PoExtractor and would like to get some feedback from you whether you the proposed solution is agreeable with the project and whether you would accept a PR.

Motivation

My team discussed using this tool and we anticipate the limitations regarding name convention and msgctxt to lead to some friction:

the fact that the member name (T, S, H) is used to identity usage of IStringLocalizer
- it's brittle. Spelling it differently means localization just stops working, without any errors.
- our coding style says field members need to start with a lower case character. We'd have a choice of:
  - ignoring "warnings" in the IDE --> this lessens the values of IDE warnings overall
  - each and every time add a // ReSharper disable once InconsistentNaming required directive --> cumbersome.
deriving the msgctxt from the class the IStringLocalizer is used in creates a few scenarios where the PoExtractor uses a different msgctxt than OrchardCore.Localization will use. These can happen by accident. The code will still compile, and we'll only find out once someone reports localizations as missing - creating friction in the process. This can happen accidentally, for example, when you move some code using IStringLocalizer<T> to another class.
- theoretically we could write an Analyzer ourselves to ensure code is conformant with the conventions used in the PoExtractor. However, that would duplicate some of what PoExtractor is doing - so not better to maintain. Furthermore, if we can write an analyzer which can judge conformity (correct name for IStringLocalizer<T>, T is correctly chosen), we can most likely also just remove the limitations with similar code. And we think that's worth more.

Solution Idea

What's missing in the current implementation is, instead of just reading the AST, also resolving types. From some quick research it seems to me that for this we need a roslyn SemanticModel besides the SyntaxTree. And the SemanticModel requires a Compilation.

I've done a very dirty proof of concept: See here and here

The result of this is, that the field/property the IStringLocalizer<T> is assigned to can have any name. The part of generating a different msgctxt depending on the type parameter T is not addressed, but from what I see can be done with the information already made available in this change.

I'm not really experienced with Roslyn, so there's a few questions-marks / risks:

maybe there's a better way to achieve the same
haven't tested how it works when the code doesn't compile
haven't tested whether/how it works when the source references another version of the Microsoft.Extensions.Localization, which defines the IStringLocalizer<T> interface
maybe a Compilation doesn't really mean compiling it... I just don't understand this well enough, yet ;-)

Having said that, I think it's not unlikely that it would only work on compiling (error-free) code (warnings would probably be fine). If that limitation is fine, maybe the PoExtractor could be simplified by removing explicit support for razor (and even liquid?) templates by opening + building the whole project. My expectation here is that this would deal with generating the classes from .razor files automatically and the tool could then work on any class and not care for whether it's razor/liquid or whatnot. I would, however, expect this to slow down processing time.

For us, both would be acceptable.

Alternative Solution

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil. From past experience, I would say, for the problem at hand, Mono.Cecil is simpler to work with than SyntaxTree + SemanticModel but:

only ever works on compiled code. The current solution does not need a compiled result, it doesn't even need code without errors / compiling code.
Cecil development seems to be on the backburner, and I'm not sure about continued support/development
don't know how well this integrates with AoT / native compilation
consider me biased because I have experience with Mono.CeCil but almost none with Roslyn

Apr 22 '24 14:04 BrunoJuchli

I appreciate your input @BrunoJuchli, all the feedback is welcome:

Regarding localization field names, I remember that we suggested in the past to make optional, the user can provide his/her names

deriving the msgctxt from the class the IStringLocalizer is used in creates a few scenarios where the PoExtractor uses a different msgctxt than OrchardCore.Localization will use.

Could you please elaborate?

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil.

I don't think it's something that can we go for unless there are some good reasons - regardless of your experience in Cecil :) -

I'm glad to collaborate with you guys to make the PO Extractor works as expected with your project(s)

Apr 22 '24 20:04 hishamco

I suggest that you write a failing test so that we can start to fix the issues that you are facing

Apr 22 '24 20:04 hishamco

I appreciate your input @BrunoJuchli, all the feedback is welcome:

I appreciate your quick feedback and willingness to hear what I have to say :-) Thanks!

Regarding localization field names, I remember that we suggested in the past to make optional, the user can provide his/her names

This can address the problem of the convention being in violation of the "users" coding style guide.

However, it does not address the problem where these are accidentally misnamed. These cases are likely to fall through safety nets and make it to production. In our case, our release cycle works with late-localization. That is, we first deploy a release where not all texts are localized. Localization then happens some time after (few days). That is, it's not unexpected to see "developer texts". On the translation side, we have tooling support to tell translators that there's translations missing. Of course, it doesn't pick up what isn't extracted to the PO, so probably no-one will notice these untranslated texts for quite a while. The result is:

degradation in the quality of our software. Instead of taking days, it could take months.
customers need to reach out to us to inform us about a bug. This costs the customer, our support team, and ultimately binds development resources. All involved parties of course have better use for their time ;-)

Therefore my goal is to get rid of the conventions in the PO Extraction tool (you can still use them in your SW, it doesn't matter). Instead, it should follow the rule: If it compiles, extraction works.

deriving the msgctxt from the class the IStringLocalizer is used in creates a few scenarios where the PoExtractor uses a different msgctxt than OrchardCore.Localization will use.

Could you please elaborate?

namespace ExampleApp;

public class Foo(
    IStringLocalizer<Bar> S)
{
      public void DoSomething()
      {
           Write(
                S["this is some translatable text"];
      }
}

The po extractor will create:

msgctxt "ExampleApp.Foo"
msgid "this is some translatable text"
msgstr ""

And OrchardCore.Localization will search for:

msgctxt "ExampleApp.Bar" msgid "this is some translatable text"

Note: I originally wrote "a few scenarios". For code-examples, it's just this one. I was thinking in scenarios how you could accidentally get to this state of violating the convention and expressed myself poorly. But here's how you can accidentally achieve this scenario:

copy + paste or move the IStringLocalizer<T> definition to another class.
duplicate a class, but forget to adapt the T of IStringLocalizer<T>.
(automated) extract class refactoring

Instead of using Roslyn, and analyze source code, one could also analyze the IL with Mono.Cecil.

I don't think it's something that can we go for unless there are some good reasons - regardless of your experience in Cecil :) -

I understand and concur with your assessment.

I suggest that you write a failing test so that we can start to fix the issues that you are facing

I'll do that.

If you've got some idea/knowledge about the compilation / SemanticModel aspect I'd appreciate it very much if you could give me some feedback whether you think it would be acceptable to use it.

Apr 24 '24 14:04 BrunoJuchli

In your example above, there's an issue that we are facing frequently, the generic type SHOULD be the same as the class name, we suggested creating an analyzer instead

Apr 25 '24 05:04 hishamco

@sebastienros @Skrypt your thoughts

Apr 25 '24 05:04 hishamco

In your example above, there's an issue that we are facing frequently, the generic type SHOULD be the same as the class name, we suggested creating an analyzer instead

Writing an analyzer is more work than removing these conventions from PO extractor.

The reason for this is, that the analyzer would use the same tools (roslyn) that PO extractor is already using. It requires the same code to determine whether the correct type is used as the PO extractor would use to get rid of the convention.

From a usability perspective, I would also argue that the analyzer is worse.

Do you agree?

Apr 25 '24 05:04 BrunoJuchli

I'm writing an analyzer in the client app that uses this library.

I think we could fix the generated localization resource here but the issue no one will tell the app the types should match, right?

Apr 25 '24 05:04 hishamco

I'm writing an analyzer in the client app that uses this library.

I think we could fix the generated localization resource here but the issue no one will tell the app the types should match, right?

When the PoExtractor is changed, why should the types match? It'll be up to consumer to decide whether they should match. Each developer/team can decide for their own conventions (or lack thereof). If they decide for a convention, then an analyzer would be useful --> so I see why offering an Analyzer makes sense.

Is there a reason why the convention would need to be followed by everyone using Orchard? And the PoExtractor?

(In my case, I would certainly prefer it to be opt-in, but I'm not using orchard itself).

Apr 25 '24 06:04 BrunoJuchli

Even if you are not using OC if you check ASP.NET Core samples, you will notice generic types for the logger, string localizer .. etc should match the class, so this is not OC specific. I agree for naming the localizer fields they should be framework agnostic

Apr 25 '24 06:04 hishamco

Even if you are not using OC if you check ASP.NET Core samples, you will notice generic types for the logger, string localizer .. etc should match the class, so this is not OC specific. I agree for naming the localizer fields they should be framework agnostic

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity. I do understand that as component authors it makes sense to follow "established practice". Even though arguably there exists better practice, and so I would say it would be nice to support both ;-)

Instead of using a type parameter, pass the info into the ctor, say like:

public class Logger : ILogger
{
     public Logger(
         Type contextType)
     {
     ...
     }
}

And when injecting the ILogger into FooService, the DI container would pass typeof(FooService) as the type parameter to the Logger implementation. There's plenty of DI containers which are capable of doing so. For some examples, see here. That's less typing for developers, no chance for mistakes, and it doesn't require writing an analyzer... ;-)

What do you think about supporting both patterns? The PoExtractor could also emit an error if the type parameter T of IStringLocalizer<T> doesn't match the type of the class it's used in.

Apr 25 '24 06:04 BrunoJuchli

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity.

lol, could you please elaborate on why you are using a different type in the generic type?

The PoExtractor could also emit an error if the type parameter T of IStringLocalizer<T> doesn't match the type of the class it's used in.

Could be, but I think this is not a tool responsibility, right?

Apr 25 '24 08:04 hishamco

Ah yes. I blame microsoft for establishing this practice 🙄 - a pity.

lol, could you please elaborate on why you are using a different type in the generic type?

Because I make mistakes, and for our localization tooling it doesn't matter much ;-)

But, after our discussion, I understand see that there's no value in deliberately choosing a different T. And therefore, the analyzer makes sense, because it's better to fail fast in the IDE than later when executing extractpo. So thanks for being that patient with me ;-)

(Like I said, it would be even nicer if the type would be left out instead, and would be deduced reliably per convention (so, basically, what PoExtractor is doing, but not what OrchardCore.Localization is doing)).

The PoExtractor could also emit an error if the type parameter T of IStringLocalizer doesn't match the type of the class it's used in.

Could be, but I think this is not a tool responsibility, right?

I think this depends on intended usage of the tool. If you pass it invalid input, the tool should give back an error. That is:

if extractpo respect the T argument of IStringLocalizer<T> (that is, deduce the msgctxt from it), then it should not warn in case T doesn't equal the class IStringLocalizer<T> is used in.
if extractpo deduces msgtxt from the class IStringLocalizer<T> is used in, then using IStringLocalizer<Bar> inside of Foo is invalid data. There is no point in creating POT entries for these, instead a warning/error should be generated.

Vision

So, to cover all cases optimally, I think it should work as follows:

Analyzer

Searches for usages of IStringLocalizer<T>. When the T doesn't match the type it's used in (for example, using IStringLocalizer<Bar> inside of Foo) the analyzer generates an error.

Optionally, it could also offer a fix-up, which would be to change the T from Bar to Foo.

ExtractPo

Generates POT files from usages of IStringLocalizer[...] and IStringLocalizer<T>[...]. The name of fields and properties the IStringLocalizer/IStringLocalizer<T> is assigned to does not matter.

The following rules would be used to define msgctxt:

IStringLocalizer<T> --> full name of T.
IStringLocalizer --> full name of the type the IStringLocalizer is used in

Notes

The Analyzer would not necessarily be part of this Repo because the PoExtractor does not care if you follow this convention or not. Also, OrchardCore.Localization doesn't care either, so it could be something completely separate?

(sidenote: here's an anlyzer doing the same for ILogger<T>: https://github.com/PavelStefanov/MicrosoftLogger.Analyzer)

Apr 25 '24 08:04 BrunoJuchli

To make it clear I didn't say that the analyzer should be a part of this repo :)
For T, S, and H I agree with you, I will try to make it optional, there's a related issue in the repo
For generic parameters, we could use the passed type and log a warning during the extraction process

Hope the above suggestions are fine for you

Apr 25 '24 09:04 hishamco

OrchardCoreContrib.PoExtractor OrchardCoreContrib.PoExtractor copied to clipboard

Remove some Limitations / Pitfalls

Motivation

Solution Idea

Alternative Solution

Vision

Analyzer

ExtractPo

Notes

OrchardCoreContrib.PoExtractor
OrchardCoreContrib.PoExtractor copied to clipboard