Add support for Text fragment feature (#1545)
Text Fragment feature implementation pull request. The feature follows the published URL Fragment Text Directives specification (https://wicg.github.io/scroll-to-text-fragment/).
lychee -vv --include-text-fragments https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
[DEBUG] tdirective: "text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] status: Completed
[DEBUG] result: "From the foregoing remarks we may gather an idea of the importance"
[200] https://mdn.github.io/css-examples/target-text/index.html#:~:text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] tdirective: "text=linked%20URL,-'s%20format"
[DEBUG] status: Completed
[DEBUG] result: "linked URL"
[DEBUG] tdirective: "text=Deprecated-,attributes,attribute"
[DEBUG] status: Completed
[DEBUG] result: "attributes charset Deprecated Hinted at the character encoding of the linked URL. Note:This attribute"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute
[DEBUG] tdirective: "text=downgrade:-,The%20Referer,be%20sent,-to%20origins"
[DEBUG] status: Completed
[DEBUG] result: "The Referer header will not be sent"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=downgrade:-,The%20Referer,be%20sent,-to%20origins
[DEBUG] tdirective: "text=linked%20URL,defining%20a%20value"
[DEBUG] status: Completed
[DEBUG] result: "linked URL as a download. Can be used with or without a filename value: Without a value, the browser will suggest a filename/extension, generated from various sources: The Content-Disposition HTTP header The final segment in the URL path The media type (from the Content-Type header, the start of a data: URL, or Blob.type for a blob: URL) filename: defining a value"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,defining%20a%20value
If the fragment directive is not found, a TextDirectiveNotFound error will be returned.
Below changes are completed:
- Fragment Directive parser uses
fancy-regex
- this package was added as a dependency
- a new flag,
include-text-fragmentsis added to support the feature
- this is a deviation from the original feature request (which asked for using the text-fragments flag itself)
- Fragment (Text) Directive feature is tested on LTR sites only
- new
UrlExttrait is implemented to enhance Url's to support Fragment Directive - Support for multiple text fragment directives (for example,
#:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute) - tests are added for validating the feature
- cargo clippy & cargo tests were executed
I missed to run the clippy across the test modules - the related lint failure issues are now fixed and ready for review!
Left some comments. I like the overall structure. Good work so far!
@mre I'll take a look at each of the comments and work to address it - thank you!
@Thiru, any updates? Let me know in case you need any help. 😃
@thiru, any updates? Let me know in case you need any help. 😃
@mre, I will submit the fixes by this weekend for your review. Earlier I started moving the feature into a separate crate, while addressing your review comments, but got pulled into few other tasks and so couldn't get back earlier :-(.
Thanks, sounds good!
Thanks, sounds good!
@mre my apologies for the delay - while refactoring, into a separate crate, I encountered few corner case issues and it took more time than planned - running the tests now and hoping to re-raise the pull-request by tomorrow!
I am committing the changes into my fork and will initiate a pull-request shortly!
@mre request your help in addressing the CI / publish-check failure - please recommend if I've to make any changes on my end - thank you!
I've added a few more comments. Most of them are minor. I think it should be fairly easy to go through them and accept/reject my suggestions from the GitHub UI.
As for the error, it's currently failing because there is no textfrag crate on crates.io. That is to be expected, because we haven't published it yet. However, the question is if we do want to publish it at all.
cargo expects all dependencies to be published on crates.io. We have three options:
- Publish the crate under
lychee-textfrag. This would require renaming the crate to fit our naming scheme. - Publish the crate as a separate project, e.g.
textfrag. Then we'd have to move the code out into a separate repo and perhaps add a CI/CD process for releases. We could keep it in thelycheeversenamespace, or you publish it under your own name. The license has to be compatible (Apache/MIT). - Don't publish the crate. In that case, we'd have to move the
textfragcrate as a submodule into thelychee-libcrate. The advantage is that we don't have to set up a separate repo, which will make future changes easier.
I'd vote for option 3, which is the easiest right now and allows us to keep maintaining the code inside lychee. It's still reasonably well encapsuled in its own module and we can always make it a separate crate later once the code is mature enough. The downside is longer compile-times because it would be in the same codegen unit as the rest of the library code.
Detailed instructions for option 3
To merge the textfrag code directly into lychee-lib, here's what you could do:
- Move the
textfragcode intolychee-lib, perhaps in a submodule likelychee-lib/src/textfrag/ - Remove
textfragfrom your workspace members in the rootCargo.toml - Update any imports in
lychee-libto reference the new module location instead of the external crate
This approach has several benefits:
- Simplifies your publishing process - no need to maintain and publish a separate crate
- Makes it clear that this is internal implementation code
- Gives you more flexibility to change the code without worrying about breaking other potential users
Let me know what you think.
I've added a few more comments. Most of them are minor. I think it should be fairly easy to go through them and accept/reject my suggestions from the GitHub UI.
As for the error, it's currently failing because there is no
textfragcrate on crates.io. That is to be expected, because we haven't published it yet. However, the question is if we do want to publish it at all.cargoexpects all dependencies to be published on crates.io. We have three options:
- Publish the crate under
lychee-textfrag. This would require renaming the crate to fit our naming scheme.- Publish the crate as a separate project, e.g.
textfrag. Then we'd have to move the code out into a separate repo and perhaps add a CI/CD process for releases. We could keep it in thelycheeversenamespace, or you publish it under your own name. The license has to be compatible (Apache/MIT).- Don't publish the crate. In that case, we'd have to move the
textfragcrate as a submodule into thelychee-libcrate. The advantage is that we don't have to set up a separate repo, which will make future changes easier.I'd vote for option 3, which is the easiest right now and allows us to keep maintaining the code inside lychee. It's still reasonably well encapsuled in its own module and we can always make it a separate crate later once the code is mature enough. The downside is longer compile-times because it would be in the same codegen unit as the rest of the library code.
Detailed instructions for option 3 To merge the
textfragcode directly intolychee-lib, here's what you could do:
- Move the
textfragcode intolychee-lib, perhaps in a submodule likelychee-lib/src/textfrag/- Remove
textfragfrom your workspace members in the rootCargo.toml- Update any imports in
lychee-libto reference the new module location instead of the external crateThis approach has several benefits:
- Simplifies your publishing process - no need to maintain and publish a separate crate
- Makes it clear that this is internal implementation code
- Gives you more flexibility to change the code without worrying about breaking other potential users
Let me know what you think.
@mre I understand and agree with your recommendation - I am addressing this and, as well, the other review comments and republish for review - thanks for your patience!
@thiru-appitap, I saw that you did some work. Can you close the resolved conversations already? It becomes a bit hard to keep track of the open TODOs. 😉
Also, did you use the "commit suggestion" feature from GitHub? For some of the changes, I made suggestions, which are easy to merge with a single click. This way, changes should be much easier to handle on your end. Of course, you have to git pull the changes locally to be up-to-date.
@thiru-appitap, I saw that you did some work. Can you close the resolved conversations already? It becomes a bit hard to keep track of the open TODOs. 😉 Also, did you use the "commit suggestion" feature from GitHub? For some of the changes, I made suggestions, which are easy to merge with a single click. This way, changes should be much easier to handle on your end. Of course, you have to
git pullthe changes locally to be up-to-date.
@mre was juggling between couple of priorities, along with local travels and so couldn't resolve & commit the changes earlier itself - now on I should be able to get back with much faster turnaround on the feedback now, the changes are all done (fingerscrossed) based on the feedback; along with, I have tried to adopt the idiomatic way as much as I could - willing to learn more and please keep helping me with more feedback!
This branch kindly asks for a rebase.
Running cargo install --branch text-fragment --git https://github.com/thiru-appitap/lychee.git and then using lychee --include-text-fragments shows that this works in principle.
Would be nice to see this arrive.
This branch kindly asks for a rebase.
Running
cargo install --branch text-fragment --git https://github.com/thiru-appitap/lychee.gitand then usinglychee --include-text-fragmentsshows that this works in principle.Would be nice to see this arrive.
I have rebased the branch to the latest but am facing lint failures - working on to fix it and recommit by this weekend.
I guess you can just do another rebase. We fixed the lint errors in master now.
@thiru-appitap, would you like to continue working on this? Alternatively, we can close the PR. I still believe that it would be great to have this feature, but I'm not sure if anyone has the bandwidth to look into it at the moment.