cabal icon indicating copy to clipboard operation
cabal copied to clipboard

Use a local secure repositories in the test-suite.

Open andreabedini opened this issue 1 year ago • 17 comments

The test-suite prelude provides the function withRemoteRepo to set up a secure remote repository using hackage-repo-tool. The created repository is then served using python3 internal http server.

Creating the http server and waiting for it to be ready turns out to be very fragile. Moreover what we really need in the test-suite it not to check any remote connectivity but to test features related to hackage-security and index-state. This can be more reliably achieved using a local secure repository (e.g. file://).

This change:

  • Renames withRemoteRepo to withSecureRepo, to better indicate the main difference from withRepo.
  • Configures the repository with a file:// url rather than http://
  • Removes the need for python3 and any retry logic.
  • At variance with withRepo, withSecureRepo should work on Windows.
  • Invokes hackage-repo-tool in such a way index timestamps can be made predictable and controllable from the test.

This solves some recent failures we have seen in CI.

Please read Github PR Conventions and then fill in one of these two templates.


Template Α: This PR modifies cabal behaviour

Include the following checklist in your PR:

  • [ ] Patches conform to the coding conventions.
  • [ ] Any changes that could be relevant to users have been recorded in the changelog.
  • [ ] The documentation has been updated, if necessary.
  • [ ] Manual QA notes have been included.
  • [ ] Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template Β: This PR does not modify cabal behaviour (documentation, tests, refactoring, etc.)

Include the following checklist in your PR:

andreabedini avatar Dec 19 '23 10:12 andreabedini

This sounds awesome! I’ll try to have a look this week. I wonder if this will bring any performance benefit…

ulysses4ever avatar Dec 19 '23 11:12 ulysses4ever

Is this a fix for #9530? I can't find any reference to that issue in the description. I guess we'll know when CI runs for a couple of days and the curl timeouts no longer show up?

Mikolaj avatar Dec 21 '23 12:12 Mikolaj

@andreabedini there's an issue with the new time dependency, which I tried to quickly fix (the branch is updated) but failed. The Data.Time.Format.ISO8601, which you use, only appeared in time-1.9 but we can't use "such new" version in the CI. The testsuite is build-type: Custom, and that's why (I think) you're limited in the selection of versions for boot libraries to whatever was shipped with GHC at the time. And we support pretty old GHCs... CUrrently, the oldest is GHC 8.4, which shipped time-1.8.0.2.

ulysses4ever avatar Dec 21 '23 20:12 ulysses4ever

The time issue is solved via time-compat (thanks, Andrea, for the idea).

But we have a new challenge now. Windows strikes back…

stderr:
*** Exception: Command "C:\Windows\SYSTEM32\tar.exe" "-czf" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "--force-local" "-C" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar.exe: Option --force-local is not supported

ulysses4ever avatar Dec 25 '23 13:12 ulysses4ever

I tried to remove --force-local (https://github.com/haskell/cabal/pull/9540/commits/e9ea7178a21746928507549ecbd5f9505e241398) but now we get another error on Windows that doesn't seem to be related to tar...

-----BEGIN CABAL OUTPUT-----

Error: [Cabal-7085]

-----END CABAL OUTPUT-----

-----BEGIN CABAL OUTPUT-----

Error parsing config file D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\home\.cabal\config:3:

Parse of field 'url' failed.

https://github.com/haskell/cabal/actions/runs/7322918701/job/19944966081?pr=9540

ulysses4ever avatar Dec 25 '23 19:12 ulysses4ever

I cannot reproduce the error on my MSYS2 because of this failing command:

+ "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0"
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now
update-index-state.test.hs: Command "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now

Not sure what is going on here.

Note that:

cabal (e9ea717) [$] via λ lts-18.28
❯ tar -czf $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\repo\package\pkg-1.0.tar.gz") -C $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\.\repo\\") pkg-1.0

cabal (e9ea717) [$?] via λ lts-18.28
➜ tar -czf C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\cabal.dist\\repo\\package\\pkg-1.0.tar.gz -C C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\.\\repo\\ pkg-1.0
tar (child): Cannot connect to C: resolve failed
tar: Child returned status 128
tar: Error is not recoverable: exiting now

So it seems GNU tar doesn't understand C:\\ paths. However sanitized *nix paths work properly. Perhaps we should account for this?

jasagredo avatar Dec 28 '23 00:12 jasagredo

What's happening there is that standard tar recognizes tarball names with a colon in them and treats them as hostname:filename, handing them off to rmt. That's what --force-local was disabling.

geekosaur avatar Dec 28 '23 00:12 geekosaur

I replaced my tar.exe with the Windows one for now (just to test this) and I found what is going on with the url:

repository test-local-repo
  url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
  secure: True
  root-keys: ...

This is what is being written to the configuration. While uris to files in windows seem to use (ref https://en.wikipedia.org/wiki/File_URI_scheme):

repository test-local-repo
  url: file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
  secure: True
  root-keys: ...

Although that still doesn't work for me...

➜ cabal update --verbose
Including the following directories in PATH:
Project settings changed, reconfiguring...
Trying to locate mirrors via DNS for initial bootstrap of secure repository
'file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo'
...
Warning: Caught exception during _mirrors lookup:user error (DnsQuery_A failed
with 9003)
Warning: No mirrors found for
file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
fdOpen: does not exist (The system cannot find the path specified.)
❯ ls /c/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
01-index.tar  01-index.tar.gz  index  mirrors.json  package  root.json  snapshot.json  timestamp.json

Also if I ctrl+click that file:///... URI in the Windows terminal it opens a explorer.exe in the expected folder, where the contents do exist.

jasagredo avatar Dec 28 '23 01:12 jasagredo

So I think I have a veredict on what could be going wrong here. TL;DR: see below the horizontal rule at the end of this comment.

First of all, local secure repos are parsed in this chunk of code:

    withRepo _ callback | uriScheme remoteRepoURI == "file:" = do
      dir <- Sec.makeAbsolute $ Sec.fromFilePath (uriPath remoteRepoURI)

Now looking into those functions from Hackage.Security.Util.Paths, I see the following when playing with GHCi:

λ: u = "file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"
λ: Just uri = parseURI nu
λ: makeAbsolute  $ fromFilePath $ Network.URI.uriPath uri
Path "C:\\/Users/Javier/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"

So it is interpreting the path as relative (see the duplicated Users/Javier/Users/Javier). Why is this happening? I dug a bit more:

λ: unPathNative (Path fp) = FP.Native.joinPath . FP.Posix.splitDirectories $ fp
λ: (\(FsPath p) -> putStrLn $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
C:Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist

So at this point it is already wrong.

λ: (\(FsPath p) -> FP.Native.isAbsolute  $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
False

But this looks fine, except for the first /:

λ: (\(FsPath (Path p)) -> putStrLn p) $ fromFilePath $ Network.URI.uriPath uri
/C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist

Which can be fixed with a call to FP.Native.normalise:

λ: (\(FsPath (Path p)) -> putStrLn $ FP.Native.normalise  p) $ fromFilePath $ Network.URI.uriPath uri
C:/Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Native.normalise p) $ fromFilePath $ Network.URI.uriPath uri
True
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute p) $ fromFilePath $ Network.URI.uriPath uri
False

Note however that we should use the FP.Native.normalise, otherwise it doesn't work:

λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Posix.normalise  p) $ fromFilePath $ Network.URI.uriPath uri
False

I'm unsure on how to follow here. The code in hackage-security says explicitly:

newtype Path a = Path FilePath -- always a Posix style path internally

So I think the issue is more general. Filepaths in windows will have a preceding / that makes the whole logic break, in particular this line:

fromFilePath :: FilePath -> FsPath
fromFilePath fp
    | FP.Native.isAbsolute fp = FsPath (mkPathNative fp  :: Path Absolute)

will return False. How should we handle this? It seems all RFC3986 parsers prepend this / at the beginning (https://timothygu.me/urltester/): image

jasagredo avatar Dec 28 '23 09:12 jasagredo

Another possible outcome of this would be: "hackage-security and in turn local repositories do not work on Windows systems", which I think would be suboptimal.

jasagredo avatar Dec 28 '23 09:12 jasagredo

@gbaz, do you have thoughts on this issue related to hackage-security? (See just above.)

ulysses4ever avatar Dec 28 '23 11:12 ulysses4ever

I was pinged to comment here and I'm not sure I understand the issue.

In file://C:/foo/bar C: is parsed as the authority:

      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

We will lose :, because it's interpreted as an empty port:

      authority   = [ userinfo "@" ] host [ ":" port ]

      port        = *DIGIT

file:///C:/foo/bar is similar to file:/C:/foo/bar afaiu. In the first instance, // denotes a following authority component, but that is empty (indeed, the host component can be):

      authority   = [ userinfo "@" ] host [ ":" port ]

      host        = IP-literal / IPv4address / reg-name

      reg-name    = *( unreserved / pct-encoded / sub-delims )

Also see:

When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

file:/C:/foo/bar on the other hand follows path-absolute afaiu, which must also start with /:

      path          = path-abempty    ; begins with "/" or is empty
                    / path-absolute   ; begins with "/" but not "//"
                    / path-noscheme   ; begins with a non-colon segment
                    / path-rootless   ; begins with a segment
                    / path-empty      ; zero characters

      path-abempty  = *( "/" segment )
      path-absolute = "/" [ segment-nz *( "/" segment ) ]
      path-noscheme = segment-nz-nc *( "/" segment )
      path-rootless = segment-nz *( "/" segment )
      path-empty    = 0<pchar>


      segment       = *pchar
      segment-nz    = 1*pchar
      segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                    ; non-zero-length segment without any colon ":"

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

file:C:/foo/bar is likely also valid and parses under path-rootless.

The meaning of all these paths depends on the scheme, which is file here. RFC 3986 does not specify semantics or additional restrictions of the schemes.

URI schemes are registered with the IANA: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

The relevant RFC for file is 8089.

The ABNF here is:

      file-URI       = file-scheme ":" file-hier-part

      file-scheme    = "file"

      file-hier-part = ( "//" auth-path )
                     / local-path

      auth-path      = [ file-auth ] path-absolute

      local-path     = path-absolute

      file-auth      = "localhost"
                     / host

From what I understand here, all paths are considered "absolute":

The path component represents the absolute path to the file in the file system. See Appendix D for some discussion of system-specific concerns including absolute file paths and file system roots.

There is an appendix for windows/DOS paths: https://www.rfc-editor.org/rfc/rfc8089.html#appendix-E.2

So the updated ABNF is

      local-path     = [ drive-letter ] path-absolute

      drive-letter   = ALPHA ":"

And the spec gives two examples of absolute windows filepaths:

file:c:/path/to/file

file:///c:/path/to/file

These are equivalent. It doesn't say so, but I believe the third form is file:/c:/path/to/file as described above.

So the spec clearly shows what the semantics are for windows filepaths encoded in URI. I'm not sure what your question is. The filepath package does not deal with URIs. All paths must be assumed to be absolute according to the spec, so you could just normalize:

  • drop leading slashes and try to parse as absolute windows path

How you distinguish between windows and unix, I don't really know.

Edit: yes, leading slash is a relative path on windows.

hasufell avatar Dec 29 '23 07:12 hasufell

@jasagredo wow, thanks for the great invetigation. I will need some time to understand what to do.

andreabedini avatar Jan 17 '24 06:01 andreabedini

@jasagredo You have done many experiments with file:/// but I am unsure what happens with

repository test-local-repo
  url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
  secure: True
  root-keys: ...

as we are producing now.

This seems to be valid as per RFC 8089, and parsed correctly by network-uri (the path is C:\Users\...). So what is failing? If I understand correctly the failure you report in https://github.com/haskell/cabal/pull/9540#issuecomment-1870717960 is related to gnutar?

andreabedini avatar Jan 19 '24 10:01 andreabedini

This seems to be valid as per RFC 8089

No, backslashes are not permitted as per RFC 3986 and neither per RFC 8089. Quoting the latter:

Historically, some usages have copied entire file paths into the path components of file URIs. Where DOS or Windows file paths were thus copied, the resulting URI strings contained unencoded backslash "" characters, which are forbidden by both [RFC1738] and [RFC3986].

It may be possible to translate or update such an invalid file URI by replacing all backslashes "" with slashes "/" if it can be determined with reasonable certainty that the backslashes are intended as path separators.

None of the proposed ABNFs in RFC 8089 include backslashes. So you'll have to go with a normalization function before attempting proper parsing.

hasufell avatar Jan 19 '24 11:01 hasufell

I've written a library that parses correctly, with the spec-deviation that when requesting to parse windows paths it would reject posix paths (otherwise it's hard to disambiguate and the ABNF is actually ambiguous): https://hackage.haskell.org/package/file-uri-0.1.0.0/docs/System-URI-File.html#v:parseFileURI

This means when you get a windows path, it either starts with a drive letter or is a UNC path.

hasufell avatar Jan 20 '24 06:01 hasufell