esprima-dotnet icon indicating copy to clipboard operation
esprima-dotnet copied to clipboard

A few additional rough edges to polish for v3

Open adams85 opened this issue 1 year ago • 9 comments

IMO these are some features/improvements which would be nice to include in v3 (to make it complete, so to speak):

  • [x] Support for comments. (The parsing part is already there, comments just need to be collected and exposed to the consumer. There's ongoing work related this, just need to be finalized AFAICS. Also, ability of writing comments should be added to JavascriptTextWriter even if comments were not implemented in AstToJavascript for now.)
  • [x] Revising and probably changing the design choice of Node.Data. It looks more and more like we'd be better off with some dictionary-like field instead of a plain object. The reason behind this is that not only the consumer but also the parser may want to attach additional information to specific nodes (for example, if we happen to implement attachComments). Using normal properties for such extra features would unnecessary increase the memory footprint of AST in normal cases. However, we could kill two birds with one stone using a dictionary.
  • [x] Writing back JSX to code. (This is more or less done, I'll just need to finalize it.)
  • [ ] Revising the public API of the parser. (This work is basically done, just need to be rebased and updated.)

Also, I'd appreciate any feedback on these thoughts. If you agree, I can work on these as time allows.

adams85 avatar Jul 26 '22 19:07 adams85

On the second point, here's what I'm using for data:

    public static object? GetData(this Node node, string type)
    {
        if (node.Data is null) return null;
        if (node.Data is not Dictionary<string, object> data) throw new FormatException("The Data field is polluted");

        data.TryGetValue(type, out var value);
        return value;
    }

    public static Node SetData(this Node node, string type, object value)
    {
        node.Data ??= new Dictionary<string, object>();

        if (node.Data is Dictionary<string, object> dictData)
            dictData[type] = value;
        return node;
    }

So while I clearly like the option, I'm wondering if there's any risk of it being a slippery slope?

CharlieEriksen avatar Jul 27 '22 08:07 CharlieEriksen

The fact you've ended up with such a use of Node.Data just confirms that this change would be desirable. I have an API like this in mind:

public abstract class Node
{
    // ...

    // removed: public object? Data; 

    public object? GetAdditionalData(object key) { /* ... */ } // could as well be generic to save a cast for consumers
    public void SetAdditionalData(object key, object? value) { /* ... */ }
}

These new methods would be backed by a dictionary and implemented roughly the same as your GetData/SetData methods (i.e. the dictionary would be allocated only when necessary).

So while I clearly like the option, I'm wondering if there's any risk of it being a slippery slope?

Apart from a slightly increased memory usage when additional data is attached, I'm aware of only one potential problem: using a lazy-initialized, plain dictionary won't be thread-safe. But I think it's ok to document this and leave it up to consumers to do the synchronization necessary for their use case.

Do you see other risks maybe?

adams85 avatar Jul 27 '22 12:07 adams85

Apart from a slightly increased memory usage when additional data is attached, I'm aware of only one potential problem: using a lazy-initialized, plain dictionary won't be thread-safe. But I think it's ok to document this and leave it up to consumers to do the synchronization necessary for their use case.

I'm no performance guru, so I have little actual feeling about whether the memory issue is a big problem. But is there any reason that we couldn't just use a ConcurrentDictionary?

Do you see other risks maybe?

I was mostly thinking about if it could end up being a bit of a crutch, where things like comments would just get thrown in there because it's easy/convenient, and suddenly the API becomes a mess. But that's definitely not a reason not to do this :)

CharlieEriksen avatar Jul 27 '22 19:07 CharlieEriksen

I'm no performance guru, so I have little actual feeling about whether the memory issue is a big problem. But is there any reason that we couldn't just use a ConcurrentDictionary?

We could but even that's not enough if we do lazy initialization (i.e. allocation only when the dictionary is actually needed). And I feel we should keep it that way so main use cases (Jint, primarily) don't have to pay for features which are irrelevant to them.

I was mostly thinking about if it could end up being a bit of a crutch, where things like comments would just get thrown in there because it's easy/convenient, and suddenly the API becomes a mess. But that's definitely not a reason not to do this :)

As for the public API, luckily OOP allows us to make it an implementation detail where things are actually stored. That is, for e.g. comments we're free to introduce normal properties/extension methods/whatever. If we did it as follows, we can make sure that it won't interfere with the data stored by consumers:

public abstract class Node
{
    // ...

    private static readonly s_leadingCommentsKey = new object();
    public IReadOnlyList<Comment>? LeadingComments
    {
        get => (IReadOnlyList<Comment>?) GetAdditionalData(s_leadingCommentsKey);
        set => SetAdditionalData(s_leadingCommentsKey, value);
    }
}

How about this? Would this be ok to you?

adams85 avatar Jul 27 '22 20:07 adams85

Funny thing is that this is very close to my very own solution. Here's how I use, as an example, the extension method I shared above:

public static class NodeTagExtensions
{
    private const string Tags = "tags";

    public static TagList? AddTags(this Node? node, TagList tagList)
    {
        if (node?.GetData(Tags) == null)
            node?.SetData(Tags, TagList.Empty);

        var data = node?.GetData(Tags) as TagList;
        data?.AddTags(tagList);
        return data;
    }

    public static TagList GetTags(this Node node)
    {
        return node.GetData(Tags) as TagList ?? TagList.Empty;
    }
}

It's extension methods all the way down! And it works rather well :)

CharlieEriksen avatar Jul 28 '22 05:07 CharlieEriksen

Funny thing is that this is very close to my very own solution.

Check out the PR I've submitted. Hopefully, this will make use cases like yours easier to implement.

adams85 avatar Jul 28 '22 22:07 adams85

Great improvement! Much appreciated :)

CharlieEriksen avatar Jul 30 '22 16:07 CharlieEriksen

Small status update on my side, this is more about parsing, not the public API.

  • still working on class improvements
  • they lead me to path of fixing scope validation (var/let, blocks) which should be done on parse time, but previous task on hold as branched to new work item
  • which lead me to improving parsing Context, now changing bool flags to flags enum that makes it easier to record old state (just take the flags value) and then restore it later

So probably trying to dig myself out of this hole in some reverse order.. family etc has kept me a bit busy so I tend always start fresh as I have no idea where I left off...

lahma avatar Aug 05 '22 09:08 lahma

You have some nice improvements in the works! Variable scope validation in JS (if we're talking about the same thing) is a deep, dark rabbit hole in my experience though... :)

The question is how much I'd interfere with your work if I went on with the changes I planned around the parser (i.e. resurrecting https://github.com/sebastienros/esprima-dotnet/pull/175). Probably we'd better postpone that until you finish your work.

adams85 avatar Aug 05 '22 14:08 adams85

I tagged intermediate release v3.0.0-beta-4which should appear on NuGet soon. Allows easier dogfooding.

lahma avatar Aug 09 '22 16:08 lahma

Hey @adams85,

I think in df506369d77c5e0db56ccda7293bcdc2e6fd3062, the ToString changes for Location was a bit unfortunate. I have code that persists Locations using ToString. I understand that ToString is not strictly meant as a serialization method, but the new format (Which is nicer to read but more complex to parse) really needs a way to parse a string back into a Location to not make this a big breaking change IMO :)

CharlieEriksen avatar Aug 19 '22 09:08 CharlieEriksen