Implement Chromium/WICG’s Text Fragment specification

Open tilgovi opened this issue 4 years ago • 4 comments

Spec: https://wicg.github.io/ScrollToTextFragment/#parsing-the-fragment-directive
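For orientation, here is a minimal sketch of what parsing a fragment directive could look like, assuming the syntax `#:~:text=[prefix-,]textStart[,textEnd][,-suffix]` with multiple directives separated by `&`. The names `TextDirective` and `parseTextDirectives` are hypothetical (not from the spec or any library), and the sketch skips the spec's finer validation rules; `decodeURIComponent` will throw on malformed percent-encoding.

```typescript
// Parsed form of one text directive. Field names mirror the terms in the
// syntax text=[prefix-,]textStart[,textEnd][,-suffix].
interface TextDirective {
  prefix?: string;
  textStart: string;
  textEnd?: string;
  suffix?: string;
}

// Hypothetical helper: extracts all text directives from a URL string.
function parseTextDirectives(url: string): TextDirective[] {
  const delimiter = ':~:';
  const hashIndex = url.indexOf('#');
  if (hashIndex === -1) return [];
  const fragment = url.slice(hashIndex + 1);
  const delimiterIndex = fragment.indexOf(delimiter);
  if (delimiterIndex === -1) return [];
  const directive = fragment.slice(delimiterIndex + delimiter.length);

  const results: TextDirective[] = [];
  for (const part of directive.split('&')) {
    if (!part.startsWith('text=')) continue; // ignore unknown directives
    const terms = part.slice('text='.length).split(',');
    let prefix: string | undefined;
    let suffix: string | undefined;
    // A first term ending in '-' is the prefix ...
    if (terms.length > 1 && terms[0].endsWith('-')) {
      prefix = decodeURIComponent(terms.shift()!.slice(0, -1));
    }
    // ... and a last term starting with '-' is the suffix.
    if (terms.length > 1 && terms[terms.length - 1].startsWith('-')) {
      suffix = decodeURIComponent(terms.pop()!.slice(1));
    }
    // What remains must be textStart, optionally followed by textEnd.
    if (terms.length < 1 || terms.length > 2 || terms[0] === '') continue;
    const [textStart, textEnd] = terms.map((t) => decodeURIComponent(t));
    results.push({ prefix, textStart, textEnd, suffix });
  }
  return results;
}
```

Calling `parseTextDirectives(window.location.href)` may not work in browsers that strip the directive from the URL before scripts can see it, so this is mainly useful for URLs received as plain strings.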

tilgovi avatar Feb 27 '20 21:02 tilgovi

We might also polyfill just the window.location.fragmentDirective as a building block for this: https://wicg.github.io/ScrollToTextFragment/#feature-detectability
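A rough sketch of the feature check such a polyfill would start from. It assumes the property is exposed as `fragmentDirective` on `window.location` (as in the linked section) and, as a further assumption on my part, also looks at `document`, since the location of this property has not been stable across spec drafts. The function name `supportsTextFragments` is made up for illustration.

```typescript
// Accept loosely typed hosts so the check can also be exercised outside a browser.
type MaybeHost = object | null | undefined;

// Hypothetical helper: true if either host object exposes `fragmentDirective`.
function supportsTextFragments(loc: MaybeHost, doc: MaybeHost): boolean {
  const has = (host: MaybeHost): boolean =>
    host != null && 'fragmentDirective' in host;
  return has(loc) || has(doc);
}
```

In a page this would be called as `supportsTextFragments(window.location, document)`; a polyfill would only install itself when the result is `false`.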

tilgovi avatar Feb 27 '20 21:02 tilgovi

Is this still a plan? We have ditched our other fragment identifier effort (see PR #71).

Besides making a parser for the fragment identifier syntax itself, I suppose we would want to implement the algorithm for resolving it. May well be worthwhile if others are not already doing this!

Also, I added this topic to the new Tech Radar page on the wiki.

Treora avatar Jun 18 '20 17:06 Treora

> Besides making a parser for the fragment identifier syntax itself, I suppose we would want to implement the algorithm for resolving it. May well be worthwhile if others are not already doing this!

Update: last month, I decided to take a stab at this and implemented the algorithm in TypeScript: https://code.treora.com/gerben/text-fragments-ts

See also: https://github.com/WICG/scroll-to-text-fragment/issues/135

Perhaps adopting this implementation in Annotator could be considered some day, but I suppose it is a bit early, as the spec is still in flux and so far my impression is that few people adopt it (or want to).

I expect that a significant disadvantage for many annotation-ish tools would be that, as it is currently defined, the expressivity of the text fragment identifier is limited by only being able to point at whole words. See my issue #37 on the spec’s repo.
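To illustrate the whole-word limitation, here is a small sketch that widens a selection (given as character offsets into a plain string) until both ends fall on word boundaries. This is a simplification under stated assumptions: a "word character" is taken to be any Unicode letter or number, whereas the spec relies on locale-aware word segmentation; characters outside the Basic Multilingual Plane are not handled; and the names `isWordBoundary` and `expandToWordBoundaries` are hypothetical.

```typescript
// Simplified notion of a word character: any Unicode letter or number.
// (Built with the RegExp constructor so it compiles under older TS targets.)
const WORD_CHAR = new RegExp('[\\p{L}\\p{N}]', 'u');

// True if `offset` does not fall between two word characters.
function isWordBoundary(text: string, offset: number): boolean {
  if (offset <= 0 || offset >= text.length) return true;
  return !(WORD_CHAR.test(text[offset - 1]) && WORD_CHAR.test(text[offset]));
}

// Widen [start, end) outwards until both ends sit on a word boundary.
function expandToWordBoundaries(
  text: string,
  start: number,
  end: number
): [number, number] {
  while (!isWordBoundary(text, start)) start--;
  while (!isWordBoundary(text, end)) end++;
  return [start, end];
}
```

For example, selecting "notat" inside "annotation tools" cannot be expressed as a text fragment; the closest expressible target is the whole word "annotation", which is what the expansion returns.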

To use the fragment syntax as a standard within, and for exchange between, annotation software, one could of course choose to interpret it differently (except when activating a ‘browser compat mode’), e.g. as a WA RangeSelector containing two TextQuoteSelectors. But that might lead to a misleading situation of half-interoperability, which seems better to avoid.

Nevertheless, for the goal of making annotations (ex)portable from annotation tools, it could be valuable to have a tool that helps convert (where possible) annotation targets to browser-compatible text-fragment URLs. And vice versa to import them.

Treora avatar Sep 03 '20 17:09 Treora

We briefly discussed this topic in today’s call while looking over the open issues. We agreed that just parsing the syntax is not much use, as it comes together with a specific algorithm for finding the target text, which differs from the Web Annotation model.

A quick overview of things we could provide:

  • anchoring of a fragment directive: I implemented the essence of this already (see my comment above); we could provide a function that simply wraps my implementation. (We even discussed the option of importing my whole implementation into this repo, though to me it feels cleaner to keep these as separate projects.)

  • describing a selection (a Range or perhaps a list of Ranges) as a fragment directive: this would need a custom adaptation of describeTextQuote, modified to ensure that the total quote (including prefix and suffix) ends at word boundaries (note that this is at least possible now, since a recent change in the spec). Also, it should use a textStart,textEnd pair (again cut at word boundaries) instead of an exact quote when the selection crosses block elements. And perhaps there are more hurdles.

  • convert fragment directive ⇒ Selector: If the document is available, we could simply anchor it and describe it in the other format. Without the document at hand, we could also convert it, although with a (hopefully small) risk that the differences in specifications will make it fail to anchor or (worse) point at something else. I think the conversion could, after syntax parsing, be done with more or less this simple code:

      ({ prefix, textStart, textEnd, suffix }) => textEnd
          ? {
              type: 'RangeSelector',
              start: { type: 'TextQuoteSelector', prefix, exact: textStart },
              end: { type: 'TextQuoteSelector', prefix: textEnd, exact: '', suffix }
          }
          : { type: 'TextQuoteSelector', prefix, exact: textStart, suffix }
    

    (note the little hack of using prefix: textEnd, exact: '' because RangeSelector’s end is exclusive and textEnd should nevertheless be included in the target)

  • convert Selector ⇒ fragment directive: the reverse of the above. Again, if the document is available, we could simply anchor it and describe it in the other format. But if the document is not available, conversion in this direction would only be possible if the selector is of the type/shape shown in the above example code.
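The document-less case of that last conversion could look roughly like the sketch below. It only accepts the two selector shapes produced by the example code above and returns `null` for anything else. The interfaces are trimmed-down stand-ins for the Web Annotation types, and `encodeTerm` and `selectorToTextDirective` are hypothetical names. One detail worth noting: `encodeURIComponent` leaves `-` untouched, but a dash is syntax inside a text directive, so it is encoded explicitly.

```typescript
// Minimal stand-ins for the Web Annotation selector types used here.
interface TextQuoteSelector {
  type: 'TextQuoteSelector';
  exact: string;
  prefix?: string;
  suffix?: string;
}
interface RangeSelector {
  type: 'RangeSelector';
  start: TextQuoteSelector;
  end: TextQuoteSelector;
}

// Percent-encode one term, including '-', which encodeURIComponent skips.
function encodeTerm(term: string): string {
  return encodeURIComponent(term).replace(/-/g, '%2D');
}

// Returns e.g. "text=prefix-,start,end,-suffix", or null if not convertible.
function selectorToTextDirective(
  selector: TextQuoteSelector | RangeSelector
): string | null {
  let prefix: string | undefined;
  let textStart: string;
  let textEnd: string | undefined;
  let suffix: string | undefined;

  if (selector.type === 'TextQuoteSelector') {
    prefix = selector.prefix;
    textStart = selector.exact;
    suffix = selector.suffix;
  } else {
    const { start, end } = selector;
    // Only the shape from the example above is reversible: textEnd lives in
    // the end selector's prefix, and its exact quote is empty.
    if (end.exact !== '' || !end.prefix || start.suffix) return null;
    prefix = start.prefix;
    textStart = start.exact;
    textEnd = end.prefix;
    suffix = end.suffix;
  }
  if (!textStart) return null;

  const terms: string[] = [];
  if (prefix) terms.push(encodeTerm(prefix) + '-');
  terms.push(encodeTerm(textStart));
  if (textEnd) terms.push(encodeTerm(textEnd));
  if (suffix) terms.push('-' + encodeTerm(suffix));
  return 'text=' + terms.join(',');
}
```

A full URL would then be the page URL followed by `#:~:` and the returned directive. Whether the result anchors to the same text in a browser still depends on the word-boundary and whitespace differences discussed above.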

I suppose it is mainly a matter of demand and priority whether we’ll implement any of these. I might actually try to tackle some of these points soon, as I would like to use these features myself.

Treora avatar Nov 05 '20 21:11 Treora