lychee icon indicating copy to clipboard operation
lychee copied to clipboard

`--accept` doesn't accept the same values as `--cache-exclude-status`

Open katexochen opened this issue 5 months ago • 8 comments

  --cache-exclude-status <CACHE_EXCLUDE_STATUS>
      A list of status codes that will be ignored from the cache

      The following exclude range syntax is supported: [start]..[[=]end]|code. Some valid
      examples are:

      - 429 (excludes the 429 status code only)
      - 500.. (excludes any status code >= 500)

looks like this should work

$ lychee -- --config tools/lychee/config.toml --cache --cache-exclude-status '500..' .
warning: Git tree '/home/katexochen/edgelesssys/contrast' is dirty
error: invalid value '500..' for '--cache-exclude-status <CACHE_EXCLUDE_STATUS>': failed to parse accept range: no range pattern found

For more information, try '--help'.
error: Recipe `check-links` failed on line 385 with exit code 2

$ lychee -- --version
lychee 0.18.1

katexochen avatar Jul 16 '25 15:07 katexochen

Thank for reporting this. You are right, there is a mismatch between the docs and the implementation. ~~The tests in lychee_lib::types::status_code::selector::StatusCodeSelector do not contain the expressions mentioned in the docs and the implementation is missing them as well.~~ (actually does in lychee_lib::types::accept::range::AcceptRange)

This seems to affect the following forms: 100.., ..100, ..=100


For now you can use `500..=999` as a workaround to represent `500..`

thomas-zahner avatar Jul 17 '25 08:07 thomas-zahner

..100

This actually seem to work as intended.

katexochen avatar Jul 17 '25 08:07 katexochen

Huh interesting. I've tested it with the --accept flag then. This flag seems to behave differently than --cache-exclude-status but I think it shouldn't. I've now had a closer look and found with git bisect that in e13f7c9fef32923982d8eaac7111b199b5ce2196 (see https://github.com/lycheeverse/lychee/pull/1668) your problem has been fixed. ~~Since your might be on Nix you are probably using v0.18.1 via nixpkgs which doesn't include the fix yet.~~ (Ah just realised that you updated lychee in nixpkgs, thanks a lot 👍)

So in summary, updating to lychee v0.19.0 will fix this issue. However, --accept still doesn't work as advertised and behaves behaves differently compared to --cache-exclude-status. @HadrienG2 Do you know why this is the case?

thomas-zahner avatar Jul 17 '25 13:07 thomas-zahner

Hmm, that's interesting. What I can quickly tell is that...

So far, this suggests that there is something special about StatusCodeSelector::new_from(). But to my eye they look identical. So I'm not 100% sure what's going on here...

HadrienG2 avatar Jul 17 '25 18:07 HadrienG2

Ok, I got it. The actual error is on current master is...

$ cargo run -- --accept=300.. https://github.com/lycheeverse/lychee/issues/1769#issuecomment-3084022958
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/lychee --accept=300.. 'https://github.com/lycheeverse/lychee/issues/1769#issuecomment-3084022958'`
Error: invalid status code

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /home/hadrien/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/anyhow-1.0.98/src/backtrace.rs:27:14
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /home/hadrien/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:2050:27
   2: lychee::client::create
             at ./lychee-bin/src/client.rs:20:41

...where the matching source code line is...

    let accepted: HashSet<StatusCode> = cfg.accept.clone().try_into()?;

As it turns out, the logic for converting AcceptRange to HashSet is not a pure copy-paste between StatusCodeSelector and StatusCodeExcluder, as is the case for most code in these structs (this begs for some refactor someday...). For StatusCodeSelector, that logic is:

impl<S: BuildHasher + Default> TryFrom<StatusCodeSelector> for HashSet<StatusCode, S> {
    type Error = http::status::InvalidStatusCode;

    fn try_from(value: StatusCodeSelector) -> Result<Self, Self::Error> {
        <HashSet<u16>>::from(value)
            .into_iter()
            .map(StatusCode::from_u16)
            .collect()
    }
}

...and this looping logic will eventually encounter an u16 value that is not a valid HTTP status code while iterating over the unbounded AcceptRange.

HadrienG2 avatar Jul 17 '25 18:07 HadrienG2

I can think of a few ways to fix this:

  • Make AcceptRange only produce valid HTTP status codes. Those are in the 100..1000 range, whereas right now an unbounded AcceptRange starts at 0 if no left bound is specified, and goes up to u16::MAX if no right bound is specified.
  • Make the conversion from StatusCodeSelector to HashSet<StatusCode> exclude the u16 produced by the conversion to HashSet<u16> that do not belong to the valid HTTP status code range.
  • Don't use a HashSet<StatusCode> internally, juste manipulate the ranges directly. For a typical small amount of ranges, this should actually be faster (two integer comparisons per range vs a full hash map lookup).

HadrienG2 avatar Jul 17 '25 18:07 HadrienG2

@HadrienG2 Thanks for the analysis. I might try to fix it if I find the time 👍

thomas-zahner avatar Jul 18 '25 06:07 thomas-zahner

Make AcceptRange only produce valid HTTP status codes.

My vote goes to this. If we make invalid state unrepresentable, we can avoid it propagating across the codebase. We tackle the issue at the root.

mre avatar Nov 18 '25 11:11 mre