julia icon indicating copy to clipboard operation
julia copied to clipboard

Add meaningful names to AnnotatedString annotations to make their usage understandable

Open aplavin opened this issue 1 year ago • 17 comments

This is IMO quite unintuitive and confusing to read:

julia> s = Base.AnnotatedString("this is an example annotated string", [(1:18, :A => 111), (12:28, :B => 222), (18:35, :C => 333)])

julia> Base.annotations(s)[2][1]
12:28

julia> Base.annotations(s)[2][2][1]  # or Base.annotations(s)[2][2].first
:B

julia> Base.annotations(s)[2][2][2]  # or Base.annotations(s)[2][2].second
222

Would be much nicer and self-explanatory to use meaningful names instead of numbers:

julia> Base.annotations(s)[2].region
12:28

julia> Base.annotations(s)[2].label
:B

julia> Base.annotations(s)[2].value
222

aplavin avatar Jul 25 '24 13:07 aplavin

i want to work on this issue.

SarthakPaandey avatar Jul 25 '24 17:07 SarthakPaandey

I see the point here, and I think a NamedTuple{(:region, :label, :value)} could make sense, but this would be a breaking change somewhat late in the development cycle.

tecosaur avatar Jul 25 '24 18:07 tecosaur

but this would be a breaking change

Before 1.11 release is the only time to make such "breaking" changes (not really breaking because feature wasn't released). Otherwise Julia and all users are stuck with these decisions "forever".

somewhat late in the development cycle

Not sure what exactly you mean here... Now appears to be a common reasonable time to try using 1.11 in realistic scenarios, given that it's becoming more stable and regressions are getting fixed. In comparison, nightly is much less stable, so trying 1.12 features now and giving feedback based on actual experience is more challenging, only for the most dedicated users.

aplavin avatar Jul 25 '24 20:07 aplavin

Before 1.11 release is the only time to make such "breaking" changes (not really breaking because feature wasn't released)

Right, I should clarify that this is breaking with Julia v1.11-rc1, not a currently released Julia. However, my understanding of the alpha, beta, and rc stages is that non-bugfix changes are more appropriate for the alpha and maybe beta releases.

On that note, with

somewhat late in the development cycle

I specifically mean this in the sense that this feature has been part of Julia 1.11 for a while now, and we're now just before the release of 1.11.

In this sense "let's change an aspect of the API of a new 1.11 feature" is rather late in the development cycle.

I do sympathise with only just trying a new feature now and having thoughts, but it places a fair bit of time pressure on me to make such changes and inconvenience other people involved in the release process if we decide this change should be part of 1.11.

I think this could be a nice API tweak, it complicates construction but makes the intent in the values more obvious. Not 100% sold though, so I'd be interested in hearing other thoughts, and also on the matter of whether such a change is even potentially appropriate for an rc-stage release.

tecosaur avatar Jul 26 '24 03:07 tecosaur

@fatteneder I'm not sure quite what opinion you're trying to express via :-1:, care to elaborate?

tecosaur avatar Jul 27 '24 04:07 tecosaur

So what if it's late in the cycle?

nsajko avatar Jul 27 '24 09:07 nsajko

I dislike that this is quality-of-life improvement might not happen, and that we might be stuck now with this until a hypothetical v2.

Right, I should clarify that this is breaking with Julia v1.11-rc1, not a currently released Julia. However, my understanding of the alpha, beta, and rc stages is that non-bugfix changes are more appropriate for the alpha and maybe beta releases.

It would be nice to have this confirmed from someone in charge.

IMO breaking RC should be ok, after all its the last chance to fix things before you are bound to the backwards compatibility guarantee that definitely applies to full release.

fatteneder avatar Jul 27 '24 10:07 fatteneder

I dislike that this is quality-of-life improvement might not happen, and that we might be stuck now with this until a hypothetical v2.

I fully appreciate that, I just don't know how that jibes with the way that release candidates are handled (I don't think we have any docs anywhere on this?).

It also makes annotations a bit more of a pain to write if you're writing a lot of them, but maybe that can be resolved with a constructor?

tecosaur avatar Jul 27 '24 10:07 tecosaur

It also makes annotations a bit more of a pain to write if you're writing a lot of them

The constructor can support both annotations formats: one convenient for manual writing (is it expected to be common?) and another convenient for any programmatic processing (and returned from annotations()). The latter should have names to make any processing code readable, while the former can be anything suitable for explicitly listing them in the code, eg the current version.

Of course, if hand-writing annotations in code isn't supposed to be very common, then the corresponding format may not be needed at all. Extra verbosity doesn't depend on the number of annotations btw:

julia> NamedTuple{(:range,:label,:value)}[
       (1:5, :x, 123),
       (3:6, :y, 456),
      # and more ...
       ]
2-element Vector{NamedTuple{(:range, :label, :value)}}:
 (range = 1:5, label = :x, value = 123)
 (range = 3:6, label = :y, value = 456)

aplavin avatar Jul 27 '24 22:07 aplavin

Why would this have to be a named tuple instead of a struct?

nsajko avatar Jul 28 '24 06:07 nsajko

Mmm, so I'd say the contenders for the form are:

  • A Tuple{UnitRange{Int}, Pair{Symbol, Any}}, as currently used
  • A NamedTuple{(:range, :label, :value), Tuple{UnitRange{Int}, Symbol, Any}}
  • Defining:
struct Annotation
    range::UnitRange{Int}
    label::Symbol
    value::Any
end

or similar.

I do just wish this came up in #49586 rather than once it's part of a release candidate 🫠.

tecosaur avatar Jul 28 '24 07:07 tecosaur

Mmm, so I'd say the contenders for the form are:

IMO anything with reasonable names is better than opaque numbers, don't have a preference between 2 and 3.

I do just wish this came up in https://github.com/JuliaLang/julia/pull/49586 rather than once it's part of a release candidate 🫠.

With base Julia, one cannot try out a speicifc new feature – it's all or nothing. There are still known rough edges and regressions, but RC at least tend to have fewer of them, making it feasible to actually try using the upcoming version in real scenarios. I only noticed this issue when actually doing processing/transformations/aggregations of annotations in a string. Using and evaluating individual new features is much more straightforward when they are packages vs julia base.

aplavin avatar Jul 28 '24 11:07 aplavin

Triage things this is something definitely worth doing. @tecosaur will try to do it within a fortnight and would be happy to receive help.

LilithHafner avatar Aug 01 '24 15:08 LilithHafner

To clarify on the release process issue, yes it is ok to break things in a release candidate in a new feature that is not yet in a final release.

JeffBezanson avatar Aug 06 '24 21:08 JeffBezanson

A good approach is to define const Annotation = @NamedTuple{range::UnitRange{Int}, label::Symbol, value::Any}, so it has the functionality of a NamedTuple but the value::Any declaration is there to avoid dealing with heterogeneous named tuple types. Type alias printing can handle it too.

JeffBezanson avatar Aug 14 '24 21:08 JeffBezanson

I'm still looking for the time to make this change. In the meantime, I'll just ask if anyone has thoughts on what other functions that have the label-value pair without the range (e.g. AnnotatedChar, StyledStrings.eachregion) should return.

Using @NamedTuple{label::Symbol, value::Any} would be the obvious choice, but I may as well see what people's thoughts are.

tecosaur avatar Aug 18 '24 12:08 tecosaur

I agree that @NamedTuple{label::Symbol, value::Any} is good. I can't think of anything better.

LilithHafner avatar Aug 18 '24 16:08 LilithHafner

Well, here we are, the last item on the 1.11 milestone. This is a good example of how Base is not a good venue for designing novel, complex APIs. Instead of having both AnnotatedString and julia 1.11, we have neither.

JeffBezanson avatar Sep 06 '24 21:09 JeffBezanson

I think more than anything else this illustrates the problem with:

  • Changes asked for late in a release cycle
  • When there is a lone person who is responsible/actively working on the subsystem in question
  • And said person is just about to enter a very busy period

As alluded to in my last message, and stated on Slack, over the past ~month I simply haven't had the time for this. The good news is that this busy period started winding down a few days ago, and so I can actually work on this now.

tecosaur avatar Sep 07 '24 05:09 tecosaur

Update: I think I have this implemented now, along with the required changes to StyledStrings and JuliaSyntaxHighlighting. I now need to do a few sanity checks and update the tests, then I'll put a PR up.

tecosaur avatar Sep 08 '24 18:09 tecosaur

  • Changes asked for late in a release cycle

But the whole point is that new APIs need time to try out and revise the design. That inherently makes it harder to work within Base. Is it workable to mark the new AnnotatedString APIs as experimental? I don't have a good handle on whether that would be realistic. Of course it wouldn't necessarily stop people from using it, but might be better than nothing.

JeffBezanson avatar Sep 11 '24 21:09 JeffBezanson

I was even a bit surprised this went into Base without trying out the interface in a package first... https://github.com/JuliaLang/julia/pull/49586#issuecomment-1704076289 But now – nice if at least readily apparent and reported api improvements get in before the release :)

aplavin avatar Sep 11 '24 23:09 aplavin

Is it workable to mark the new AnnotatedString APIs as experimental

That approach worked quite well with the multithreading API, I think?

oschulz avatar Sep 12 '24 14:09 oschulz

It should be moved out of Base (into StyledStrings or its own stdlib). join and * will have to be figured out later. This + https://github.com/JuliaLang/StyledStrings.jl/issues/61 + all other places where these invalidations pop up shows over and over together with the non-generic _is_annotated checks in generic methods shows that this split and design is not right.

KristofferC avatar Sep 13 '24 12:09 KristofferC

As alluded to in my last message, and stated on Slack, over the past ~month I simply haven't had the time for this.

I think this highlights the danger in bringing in someone's more or less personal project into Base / stdlibs and all of a sudden it is on the critical path for releases. Especially, if that project is kind of new and "fun" and not old and boring.

KristofferC avatar Sep 19 '24 10:09 KristofferC

Anyway, I don't understand why this is on the milestone when the existing API would still work if using NamedTuples instead of tuples so adding that will not break anything. So I will take this off the milestone.

KristofferC avatar Sep 19 '24 10:09 KristofferC

From the linked PR

the type of annotations is to be changed from a Tuple{UnitRange{Int}, Pair{Symbol, Any}} to a NamedTuple{(:region, :label, :value), Tuple{UnitRange{Int}, Symbol, Any}}.

It's not just tuple -> named tuple, it's nested tuple-pair to flat named tuple. This changes (region, label => value) to (;region, label, value) and breaks access in the form annotation[2][1] and annotation[2].first.

LilithHafner avatar Sep 19 '24 11:09 LilithHafner

Ok, then we declare the whole thing experimental. This has been on the milestone for two months for some new color highlighting thingy. Insanity.

KristofferC avatar Sep 19 '24 12:09 KristofferC

Removing from milestone since StyledStrings + AnnotatedString will be considered experimental

KristofferC avatar Sep 19 '24 12:09 KristofferC

https://github.com/JuliaLang/StyledStrings.jl/pull/92

KristofferC avatar Sep 19 '24 12:09 KristofferC