lychee icon indicating copy to clipboard operation
lychee copied to clipboard

Add support for Text fragment feature (#1545)

Open thiru-appitap opened this issue 1 year ago • 16 comments

Text Fragment feature implementation pull request. The feature follows the published URL Fragment Text Directives specification (https://wicg.github.io/scroll-to-text-fragment/).

lychee -vv --include-text-fragments  https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
[DEBUG] tdirective: "text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] status: Completed
[DEBUG] result: "From the foregoing remarks we may gather an idea of the importance"
[200] https://mdn.github.io/css-examples/target-text/index.html#:~:text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] tdirective: "text=linked%20URL,-'s%20format"
[DEBUG] status: Completed
[DEBUG] result: "linked URL"
[DEBUG] tdirective: "text=Deprecated-,attributes,attribute"
[DEBUG] status: Completed
[DEBUG] result: "attributes     charset Deprecated   Hinted at the character encoding of the linked URL.   Note:This attribute"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute
[DEBUG] tdirective: "text=downgrade:-,The%20Referer,be%20sent,-to%20origins"
[DEBUG] status: Completed
[DEBUG] result: "The Referer header will not be sent"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=downgrade:-,The%20Referer,be%20sent,-to%20origins
[DEBUG] tdirective: "text=linked%20URL,defining%20a%20value"
[DEBUG] status: Completed
[DEBUG] result: "linked URL as a download. Can be used with or without a filename value:    Without a value, the browser will suggest a filename/extension, generated from various sources:   The Content-Disposition HTTP header  The final segment in the URL path  The media type (from the Content-Type header, the start of a data: URL, or Blob.type for a blob: URL)     filename: defining a value"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,defining%20a%20value

If the fragment directive is not found, a TextDirectiveNotFound error will be returned.

Below changes are completed:

  1. Fragment Directive parser uses fancy-regex
  • this package was added as a dependency
  1. a new flag, include-text-fragments is added to support the feature
  • this is a deviation from the original feature request (which asked for using the text-fragments flag itself)
  1. Fragment (Text) Directive feature is tested on LTR sites only
  2. new UrlExt trait is implemented to enhance Url's to support Fragment Directive
  3. Support for multiple text fragment directives (for example, #:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute)
  4. tests are added for validating the feature
  5. cargo clippy & cargo tests were executed

thiru-appitap avatar Dec 25 '24 08:12 thiru-appitap

I missed to run the clippy across the test modules - the related lint failure issues are now fixed and ready for review!

thiru-appitap avatar Dec 25 '24 09:12 thiru-appitap

Left some comments. I like the overall structure. Good work so far!

@mre I'll take a look at each of the comments and work to address it - thank you!

thiru-appitap avatar Jan 12 '25 21:01 thiru-appitap

@Thiru, any updates? Let me know in case you need any help. 😃

mre avatar Feb 05 '25 09:02 mre

@thiru, any updates? Let me know in case you need any help. 😃

@mre, I will submit the fixes by this weekend for your review. Earlier I started moving the feature into a separate crate, while addressing your review comments, but got pulled into few other tasks and so couldn't get back earlier :-(.

thiru-appitap avatar Feb 05 '25 22:02 thiru-appitap

Thanks, sounds good!

mre avatar Feb 06 '25 01:02 mre

Thanks, sounds good!

@mre my apologies for the delay - while refactoring, into a separate crate, I encountered few corner case issues and it took more time than planned - running the tests now and hoping to re-raise the pull-request by tomorrow!

thiru-appitap avatar Feb 12 '25 23:02 thiru-appitap

I am committing the changes into my fork and will initiate a pull-request shortly!

thiru-appitap avatar Feb 14 '25 04:02 thiru-appitap

@mre request your help in addressing the CI / publish-check failure - please recommend if I've to make any changes on my end - thank you!

thiru-appitap avatar Feb 14 '25 04:02 thiru-appitap

I've added a few more comments. Most of them are minor. I think it should be fairly easy to go through them and accept/reject my suggestions from the GitHub UI.

As for the error, it's currently failing because there is no textfrag crate on crates.io. That is to be expected, because we haven't published it yet. However, the question is if we do want to publish it at all. cargo expects all dependencies to be published on crates.io. We have three options:

  1. Publish the crate under lychee-textfrag. This would require renaming the crate to fit our naming scheme.
  2. Publish the crate as a separate project, e.g. textfrag. Then we'd have to move the code out into a separate repo and perhaps add a CI/CD process for releases. We could keep it in the lycheeverse namespace, or you publish it under your own name. The license has to be compatible (Apache/MIT).
  3. Don't publish the crate. In that case, we'd have to move the textfrag crate as a submodule into the lychee-lib crate. The advantage is that we don't have to set up a separate repo, which will make future changes easier.

I'd vote for option 3, which is the easiest right now and allows us to keep maintaining the code inside lychee. It's still reasonably well encapsuled in its own module and we can always make it a separate crate later once the code is mature enough. The downside is longer compile-times because it would be in the same codegen unit as the rest of the library code.

Detailed instructions for option 3

To merge the textfrag code directly into lychee-lib, here's what you could do:

  1. Move the textfrag code into lychee-lib, perhaps in a submodule like lychee-lib/src/textfrag/
  2. Remove textfrag from your workspace members in the root Cargo.toml
  3. Update any imports in lychee-lib to reference the new module location instead of the external crate

This approach has several benefits:

  • Simplifies your publishing process - no need to maintain and publish a separate crate
  • Makes it clear that this is internal implementation code
  • Gives you more flexibility to change the code without worrying about breaking other potential users

Let me know what you think.

mre avatar Feb 14 '25 21:02 mre

I've added a few more comments. Most of them are minor. I think it should be fairly easy to go through them and accept/reject my suggestions from the GitHub UI.

As for the error, it's currently failing because there is no textfrag crate on crates.io. That is to be expected, because we haven't published it yet. However, the question is if we do want to publish it at all. cargo expects all dependencies to be published on crates.io. We have three options:

  1. Publish the crate under lychee-textfrag. This would require renaming the crate to fit our naming scheme.
  2. Publish the crate as a separate project, e.g. textfrag. Then we'd have to move the code out into a separate repo and perhaps add a CI/CD process for releases. We could keep it in the lycheeverse namespace, or you publish it under your own name. The license has to be compatible (Apache/MIT).
  3. Don't publish the crate. In that case, we'd have to move the textfrag crate as a submodule into the lychee-lib crate. The advantage is that we don't have to set up a separate repo, which will make future changes easier.

I'd vote for option 3, which is the easiest right now and allows us to keep maintaining the code inside lychee. It's still reasonably well encapsuled in its own module and we can always make it a separate crate later once the code is mature enough. The downside is longer compile-times because it would be in the same codegen unit as the rest of the library code.

Detailed instructions for option 3 To merge the textfrag code directly into lychee-lib, here's what you could do:

  1. Move the textfrag code into lychee-lib, perhaps in a submodule like lychee-lib/src/textfrag/
  2. Remove textfrag from your workspace members in the root Cargo.toml
  3. Update any imports in lychee-lib to reference the new module location instead of the external crate

This approach has several benefits:

  • Simplifies your publishing process - no need to maintain and publish a separate crate
  • Makes it clear that this is internal implementation code
  • Gives you more flexibility to change the code without worrying about breaking other potential users

Let me know what you think.

@mre I understand and agree with your recommendation - I am addressing this and, as well, the other review comments and republish for review - thanks for your patience!

thiru-appitap avatar Feb 15 '25 01:02 thiru-appitap

@thiru-appitap, I saw that you did some work. Can you close the resolved conversations already? It becomes a bit hard to keep track of the open TODOs. 😉 Also, did you use the "commit suggestion" feature from GitHub? For some of the changes, I made suggestions, which are easy to merge with a single click. This way, changes should be much easier to handle on your end. Of course, you have to git pull the changes locally to be up-to-date.

mre avatar Feb 20 '25 09:02 mre

@thiru-appitap, I saw that you did some work. Can you close the resolved conversations already? It becomes a bit hard to keep track of the open TODOs. 😉 Also, did you use the "commit suggestion" feature from GitHub? For some of the changes, I made suggestions, which are easy to merge with a single click. This way, changes should be much easier to handle on your end. Of course, you have to git pull the changes locally to be up-to-date.

@mre was juggling between couple of priorities, along with local travels and so couldn't resolve & commit the changes earlier itself - now on I should be able to get back with much faster turnaround on the feedback now, the changes are all done (fingerscrossed) based on the feedback; along with, I have tried to adopt the idiomatic way as much as I could - willing to learn more and please keep helping me with more feedback!

thiru-appitap avatar Feb 21 '25 19:02 thiru-appitap

This branch kindly asks for a rebase.

Running cargo install --branch text-fragment --git https://github.com/thiru-appitap/lychee.git and then using lychee --include-text-fragments shows that this works in principle.

Would be nice to see this arrive.

almereyda avatar Apr 08 '25 12:04 almereyda

This branch kindly asks for a rebase.

Running cargo install --branch text-fragment --git https://github.com/thiru-appitap/lychee.git and then using lychee --include-text-fragments shows that this works in principle.

Would be nice to see this arrive.

I have rebased the branch to the latest but am facing lint failures - working on to fix it and recommit by this weekend.

thiru-appitap avatar Apr 10 '25 14:04 thiru-appitap

I guess you can just do another rebase. We fixed the lint errors in master now.

mre avatar May 09 '25 16:05 mre

@thiru-appitap, would you like to continue working on this? Alternatively, we can close the PR. I still believe that it would be great to have this feature, but I'm not sure if anyone has the bandwidth to look into it at the moment.

mre avatar Sep 23 '25 14:09 mre