cabal
cabal copied to clipboard
Use a local secure repositories in the test-suite.
The test-suite prelude provides the function withRemoteRepo to set up
a secure remote repository using hackage-repo-tool. The created
repository is then served using python3 internal http server.
Creating the http server and waiting for it to be ready turns out to be very fragile. Moreover what we really need in the test-suite it not to check any remote connectivity but to test features related to hackage-security and index-state. This can be more reliably achieved using a local secure repository (e.g. file://).
This change:
- Renames withRemoteRepo to withSecureRepo, to better indicate the main difference from withRepo.
- Configures the repository with a file:// url rather than http://
- Removes the need for python3 and any retry logic.
- At variance with withRepo, withSecureRepo should work on Windows.
- Invokes hackage-repo-tool in such a way index timestamps can be made predictable and controllable from the test.
This solves some recent failures we have seen in CI.
Please read Github PR Conventions and then fill in one of these two templates.
Template Α: This PR modifies cabal behaviour
Include the following checklist in your PR:
- [ ] Patches conform to the coding conventions.
- [ ] Any changes that could be relevant to users have been recorded in the changelog.
- [ ] The documentation has been updated, if necessary.
- [ ] Manual QA notes have been included.
- [ ] Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)
Template Β: This PR does not modify cabal behaviour (documentation, tests, refactoring, etc.)
Include the following checklist in your PR:
- [ ] Patches conform to the coding conventions.
This sounds awesome! I’ll try to have a look this week. I wonder if this will bring any performance benefit…
Is this a fix for #9530? I can't find any reference to that issue in the description. I guess we'll know when CI runs for a couple of days and the curl timeouts no longer show up?
@andreabedini there's an issue with the new time dependency, which I tried to quickly fix (the branch is updated) but failed. The Data.Time.Format.ISO8601, which you use, only appeared in time-1.9 but we can't use "such new" version in the CI. The testsuite is build-type: Custom, and that's why (I think) you're limited in the selection of versions for boot libraries to whatever was shipped with GHC at the time. And we support pretty old GHCs... CUrrently, the oldest is GHC 8.4, which shipped time-1.8.0.2.
The time issue is solved via time-compat (thanks, Andrea, for the idea).
But we have a new challenge now. Windows strikes back…
stderr:
*** Exception: Command "C:\Windows\SYSTEM32\tar.exe" "-czf" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "--force-local" "-C" "D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar.exe: Option --force-local is not supported
I tried to remove --force-local (https://github.com/haskell/cabal/pull/9540/commits/e9ea7178a21746928507549ecbd5f9505e241398) but now we get another error on Windows that doesn't seem to be related to tar...
-----BEGIN CABAL OUTPUT-----
Error: [Cabal-7085]
-----END CABAL OUTPUT-----
-----BEGIN CABAL OUTPUT-----
Error parsing config file D:\a\cabal\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\home\.cabal\config:3:
Parse of field 'url' failed.
https://github.com/haskell/cabal/actions/runs/7322918701/job/19944966081?pr=9540
I cannot reproduce the error on my MSYS2 because of this failing command:
+ "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0"
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now
update-index-state.test.hs: Command "C:\msys64\usr\bin\tar.exe" "-czf" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo\package\pkg-1.0.tar.gz" "-C" "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\.\repo\\" "pkg-1.0" failed.
Output:
tar (child): Cannot connect to C: resolve failed
/usr/bin/tar: Child returned status 128
/usr/bin/tar: Error is not recoverable: exiting now
Not sure what is going on here.
Note that:
cabal (e9ea717) [$] via λ lts-18.28
❯ tar -czf $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\cabal.dist\repo\package\pkg-1.0.tar.gz") -C $(cygpath -u "C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\RejectFutureIndexStates\.\repo\\") pkg-1.0
cabal (e9ea717) [$?] via λ lts-18.28
➜ tar -czf C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\cabal.dist\\repo\\package\\pkg-1.0.tar.gz -C C:\\Users\\Javier\\code\\cabal\\cabal-testsuite\\PackageTests\\NewUpdate\\RejectFutureIndexStates\\.\\repo\\ pkg-1.0
tar (child): Cannot connect to C: resolve failed
tar: Child returned status 128
tar: Error is not recoverable: exiting now
So it seems GNU tar doesn't understand C:\\ paths. However sanitized *nix paths work properly. Perhaps we should account for this?
What's happening there is that standard tar recognizes tarball names with a colon in them and treats them as hostname:filename, handing them off to rmt. That's what --force-local was disabling.
I replaced my tar.exe with the Windows one for now (just to test this) and I found what is going on with the url:
repository test-local-repo
url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
secure: True
root-keys: ...
This is what is being written to the configuration. While uris to files in windows seem to use (ref https://en.wikipedia.org/wiki/File_URI_scheme):
repository test-local-repo
url: file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
secure: True
root-keys: ...
Although that still doesn't work for me...
➜ cabal update --verbose
Including the following directories in PATH:
Project settings changed, reconfiguring...
Trying to locate mirrors via DNS for initial bootstrap of secure repository
'file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo'
...
Warning: Caught exception during _mirrors lookup:user error (DnsQuery_A failed
with 9003)
Warning: No mirrors found for
file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
fdOpen: does not exist (The system cannot find the path specified.)
❯ ls /c/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist/repo
01-index.tar 01-index.tar.gz index mirrors.json package root.json snapshot.json timestamp.json
Also if I ctrl+click that file:///... URI in the Windows terminal it opens a explorer.exe in the expected folder, where the contents do exist.
So I think I have a veredict on what could be going wrong here. TL;DR: see below the horizontal rule at the end of this comment.
First of all, local secure repos are parsed in this chunk of code:
withRepo _ callback | uriScheme remoteRepoURI == "file:" = do
dir <- Sec.makeAbsolute $ Sec.fromFilePath (uriPath remoteRepoURI)
Now looking into those functions from Hackage.Security.Util.Paths, I see the following when playing with GHCi:
λ: u = "file:///C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"
λ: Just uri = parseURI nu
λ: makeAbsolute $ fromFilePath $ Network.URI.uriPath uri
Path "C:\\/Users/Javier/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist"
So it is interpreting the path as relative (see the duplicated Users/Javier/Users/Javier). Why is this happening? I dug a bit more:
λ: unPathNative (Path fp) = FP.Native.joinPath . FP.Posix.splitDirectories $ fp
λ: (\(FsPath p) -> putStrLn $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
C:Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist
So at this point it is already wrong.
λ: (\(FsPath p) -> FP.Native.isAbsolute $ unPathNative p) $ fromFilePath $ Network.URI.uriPath uri
False
But this looks fine, except for the first /:
λ: (\(FsPath (Path p)) -> putStrLn p) $ fromFilePath $ Network.URI.uriPath uri
/C:/Users/Javier/code/cabal/cabal-testsuite/PackageTests/NewUpdate/UpdateIndexState/update-index-state.dist
Which can be fixed with a call to FP.Native.normalise:
λ: (\(FsPath (Path p)) -> putStrLn $ FP.Native.normalise p) $ fromFilePath $ Network.URI.uriPath uri
C:/Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Native.normalise p) $ fromFilePath $ Network.URI.uriPath uri
True
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute p) $ fromFilePath $ Network.URI.uriPath uri
False
Note however that we should use the FP.Native.normalise, otherwise it doesn't work:
λ: (\(FsPath (Path p)) -> FP.Native.isAbsolute $ FP.Posix.normalise p) $ fromFilePath $ Network.URI.uriPath uri
False
I'm unsure on how to follow here. The code in hackage-security says explicitly:
newtype Path a = Path FilePath -- always a Posix style path internally
So I think the issue is more general. Filepaths in windows will have a preceding / that makes the whole logic break, in particular this line:
fromFilePath :: FilePath -> FsPath
fromFilePath fp
| FP.Native.isAbsolute fp = FsPath (mkPathNative fp :: Path Absolute)
will return False. How should we handle this? It seems all RFC3986 parsers prepend this / at the beginning (https://timothygu.me/urltester/):
Another possible outcome of this would be: "hackage-security and in turn local repositories do not work on Windows systems", which I think would be suboptimal.
@gbaz, do you have thoughts on this issue related to hackage-security? (See just above.)
I was pinged to comment here and I'm not sure I understand the issue.
In file://C:/foo/bar C: is parsed as the authority:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
We will lose :, because it's interpreted as an empty port:
authority = [ userinfo "@" ] host [ ":" port ]
port = *DIGIT
file:///C:/foo/bar is similar to file:/C:/foo/bar afaiu. In the first instance, // denotes a following authority component, but that is empty (indeed, the host component can be):
authority = [ userinfo "@" ] host [ ":" port ]
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )
Also see:
When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.
file:/C:/foo/bar on the other hand follows path-absolute afaiu, which must also start with /:
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
segment = *pchar
segment-nz = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
; non-zero-length segment without any colon ":"
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
file:C:/foo/bar is likely also valid and parses under path-rootless.
The meaning of all these paths depends on the scheme, which is file here. RFC 3986 does not specify semantics or additional restrictions of the schemes.
URI schemes are registered with the IANA: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
The relevant RFC for file is 8089.
The ABNF here is:
file-URI = file-scheme ":" file-hier-part
file-scheme = "file"
file-hier-part = ( "//" auth-path )
/ local-path
auth-path = [ file-auth ] path-absolute
local-path = path-absolute
file-auth = "localhost"
/ host
From what I understand here, all paths are considered "absolute":
The path component represents the absolute path to the file in the file system. See Appendix D for some discussion of system-specific concerns including absolute file paths and file system roots.
There is an appendix for windows/DOS paths: https://www.rfc-editor.org/rfc/rfc8089.html#appendix-E.2
So the updated ABNF is
local-path = [ drive-letter ] path-absolute
drive-letter = ALPHA ":"
And the spec gives two examples of absolute windows filepaths:
file:c:/path/to/file
file:///c:/path/to/file
These are equivalent. It doesn't say so, but I believe the third form is file:/c:/path/to/file as described above.
So the spec clearly shows what the semantics are for windows filepaths encoded in URI. I'm not sure what your question is. The filepath package does not deal with URIs. All paths must be assumed to be absolute according to the spec, so you could just normalize:
- drop leading slashes and try to parse as absolute windows path
How you distinguish between windows and unix, I don't really know.
Edit: yes, leading slash is a relative path on windows.
@jasagredo wow, thanks for the great invetigation. I will need some time to understand what to do.
@jasagredo You have done many experiments with file:/// but I am unsure what happens with
repository test-local-repo
url: file:C:\Users\Javier\code\cabal\cabal-testsuite\PackageTests\NewUpdate\UpdateIndexState\update-index-state.dist\repo
secure: True
root-keys: ...
as we are producing now.
This seems to be valid as per RFC 8089, and parsed correctly by network-uri (the path is C:\Users\...). So what is failing? If I understand correctly the failure you report in https://github.com/haskell/cabal/pull/9540#issuecomment-1870717960 is related to gnutar?
This seems to be valid as per RFC 8089
No, backslashes are not permitted as per RFC 3986 and neither per RFC 8089. Quoting the latter:
Historically, some usages have copied entire file paths into the path components of file URIs. Where DOS or Windows file paths were thus copied, the resulting URI strings contained unencoded backslash "" characters, which are forbidden by both [RFC1738] and [RFC3986].
It may be possible to translate or update such an invalid file URI by replacing all backslashes "" with slashes "/" if it can be determined with reasonable certainty that the backslashes are intended as path separators.
None of the proposed ABNFs in RFC 8089 include backslashes. So you'll have to go with a normalization function before attempting proper parsing.
I've written a library that parses correctly, with the spec-deviation that when requesting to parse windows paths it would reject posix paths (otherwise it's hard to disambiguate and the ABNF is actually ambiguous): https://hackage.haskell.org/package/file-uri-0.1.0.0/docs/System-URI-File.html#v:parseFileURI
This means when you get a windows path, it either starts with a drive letter or is a UNC path.