taplo icon indicating copy to clipboard operation
taplo copied to clipboard

Handle Windows paths

Open tamasfe opened this issue 3 years ago • 19 comments

We need to account for windows path quirks:

  • decode all percent-encoded URLs in the LSP
  • use / as a path separator so that glob patterns defined in config files work as expected

Follow-up from #280.

tamasfe avatar Jul 17 '22 15:07 tamasfe

I am coding in Rust, but it sounds like this issue is in the JS side (which I don't really know, nor TypeScript).

Can you guide me/point me where to look for the paths issues I can look into it.

Any idea on which side it is ?

GilShoshan94 avatar Jul 17 '22 16:07 GilShoshan94

It is in Rust, fortunately the url library supports parsing and storing paths that are not URL-encoded, so we can simply transform all URLs in the c:/some/nested/directory form, the issue is that file paths and URLs are pretty much scattered everywhere.

We'll probably have to transform all paths returned by the glob_files function, and all document and workspace URLs that are passed by the language client in the LSP handlers.

Alternatively we can just transform the paths only when we actually use them (e.g. reading the config file), which is slightly less work but still tedious and probably won't solve all cases.

Both options seem tedious and error-prone, but I don't have better ideas right now.

tamasfe avatar Jul 17 '22 16:07 tamasfe

Hi,

I looked into glob_files and found 2 implementations, one in native.rs from taplo-common crate and one in environement.rs from taplo-wasm crate

I tried the implementation from the native one on my system (windows) in my working directory, here is the code in main.rs:

use std::fs;

fn glob_files(pattern: &str) -> Result<Vec<std::path::PathBuf>, anyhow::Error> {
    let paths = glob::glob_with(
        pattern,
        glob::MatchOptions { case_sensitive: true, ..Default::default() },
    )?;
    Ok(paths.filter_map(Result::ok).collect())
}

fn main() {
    println!("Hello, world!");
    let res = glob_files("**/*.toml").unwrap();
    println!("{:?}", res);
    println!("---");
    for path in res {
        println!("{}", fs::canonicalize(&path).unwrap().display());
        println!("{}", dunce::canonicalize(&path).unwrap().display());
        println!("---");
    }
}

And here is the printed result when I ran it:

Hello, world!
["Cargo.toml", "rustfmt.toml", "taplo.toml"]
---
\\?\C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\Cargo.toml  
C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\Cargo.toml      
---
\\?\C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\rustfmt.toml
C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\rustfmt.toml    
---
\\?\C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\taplo.toml  
C:\Users\PC-Gil\Documents\Work\RustWorkspace\nova\taplo.toml      
---

So it seems that the native glob_files function works fine. I don't see where the weird "/c%3A/" comes from yet.

When using fs::canonicalize, on Windows you get an Windows NT UNC paths (\\?\C:\foo), I think that even if ugly, it should work but otherwise there is this crate called dunce that replace canonicalize to de UNC (here the name) the path in Windows and stay the same on the others platform.

You can read more about it here:

  • https://github.com/rust-lang/rust/issues/42869
  • https://doc.rust-lang.org/std/fs/fn.canonicalize.html#:~:text=On%20Windows%2C%20this,application%20may%20read).

GilShoshan94 avatar Jul 18 '22 08:07 GilShoshan94

I looked more around that you set up the config path the world.rs from taplo-lsp crate. I see that in the struct WorkspaceState you are holding the root as an Url struct from the lsp_types crate.

It's written in its documentation for the from_file_path method:

This returns Err if the given path is not absolute or, on Windows, if the prefix is not a disk prefix (e.g. C:) or a UNC prefix (\).

I don't know yet where in the code it goes wrong. But I am more and more convinced that glob_files is finem that the paths are handled correctly, even if there are the UNC version in Windows, otherwise, taplo would not find any file to format and wouldn't work I think.

It seems to me that the bug is just in this world.rs and the way you compose the path. I couldn't find a single place where you use the Url' method from_file_path nor from_directory_path.

I see that you compose the Url with a join.

I did a test a gain on my machine with this Url struct to test it:

use std::fs;

use lsp_types::Url;

fn glob_files(pattern: &str) -> Result<Vec<std::path::PathBuf>, anyhow::Error> {
    let paths = glob::glob_with(
        pattern,
        glob::MatchOptions { case_sensitive: true, ..Default::default() },
    )?;
    Ok(paths.filter_map(Result::ok).collect())
}

fn main() {
    println!("Hello, world!");
    let res = glob_files("**/*.toml").unwrap();
    println!("{:?}", res);
    println!("---");
    for path in res {
        // println!("{}", fs::canonicalize(&path).unwrap().display());
        // println!("{}", dunce::canonicalize(&path).unwrap().display());
        let absolute_path = fs::canonicalize(&path).unwrap();
        let p = Url::from_file_path(absolute_path).unwrap();
        println!("{:?}", p);
        let p2 = p.to_file_path().unwrap();
        println!("{:?}", p2);
        println!("---");
    }
}

And the result works

Hello, world!
["Cargo.toml", "rustfmt.toml", "taplo.toml"]
---
Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/C:/Users/PC-Gil/Documents/Work/RustWorkspace/nova/Cargo.toml", query: None, fragment: None }  
"C:\\Users\\PC-Gil\\Documents\\Work\\RustWorkspace\\nova\\Cargo.toml"
---
Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/C:/Users/PC-Gil/Documents/Work/RustWorkspace/nova/rustfmt.toml", query: None, fragment: None }
"C:\\Users\\PC-Gil\\Documents\\Work\\RustWorkspace\\nova\\rustfmt.toml"
---
Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/C:/Users/PC-Gil/Documents/Work/RustWorkspace/nova/taplo.toml", query: None, fragment: None }  
"C:\\Users\\PC-Gil\\Documents\\Work\\RustWorkspace\\nova\\taplo.toml"
---

GilShoshan94 avatar Jul 18 '22 09:07 GilShoshan94

So it seems that the native glob_files function works fine. I don't see where the weird "/c%3A/" comes from yet.

I forgot to mention, maybe it comes from the other implementation in taplo-wasm ??? are you even using it for the desktop extension ?

GilShoshan94 avatar Jul 18 '22 09:07 GilShoshan94

I just found out something else interesting. To get the rule to work, the only way to do it now is with an extension file (until you add the option to set it up in settings.json #281).

So I had in settings.json "evenBetterToml.taplo.configFile.path" set to the explicit path ("C:/Users/PC-Gil/.../nova/taplo.toml") and it works. OUTPUT:

 INFO update_configuration:initialize: using config file path="C:/Users/PC-Gil/Documents/Work/RustWorkspace/nova/taplo.toml" self.root=file:///c%3A/Users/PC-Gil/Documents/Work/RustWorkspace/nova

But I wanted to make it a bit more general for my other projects and to take advantage of VsCode Variables Reference and set the path to "${workspaceFolder}/taplo.toml", so it would use the taplo.toml file from each project. It does not work.

And this is the OUTPUT I can see:

 INFO update_configuration:initialize: using config file path="/c%3A/Users/PC-Gil/Documents/Work/RustWorkspace/nova/${workspaceFolder}/taplo.toml" self.root=file:///c%3A/Users/PC-Gil/Documents/Work/RustWorkspace/nova
 WARN update_configuration:initialize: failed to load workspace configuration error=JsValue(Error: ENOENT: no such file or directory, open 'c:\c%3A\Users\PC-Gil\Documents\Work\RustWorkspace\nova\${workspaceFolder}\taplo.toml'
	Error: ENOENT: no such file or directory, open 'c:\c%3A\Users\PC-Gil\Documents\Work\RustWorkspace\nova\${workspaceFolder}\taplo.toml') self.root=file:///c%3A/Users/PC-Gil/Documents/Work/RustWorkspace/nova

Hope it makes sense to you.

GilShoshan94 avatar Jul 18 '22 09:07 GilShoshan94

Thank you again. Sorry, I probably should've included more details in the OP to spare you this investigation.

The current unhandled issues with paths:

  • URIs and (and it seems like all paths) that are given by the vscode language client are all percent-encoded, that's where C%3A comes from, we need to decode pretty much everything that is passed by the language client via handlers or settings.
  • We need to use forward slashes (/) as separators due to the patterns in configuration files. Forward slashes are a requirement due to portability (and \\ is generally ugly anyway).
  • The above is the reason why we simply cannot use the paths from glob_files as-is, not that they are invalid, but they won't match any patterns.
  • We cannot use fs::canonicalize because it requires access to the OS that we don't have in WASM, but it also keeps separators as \.
  • Another thing to keep in mind is that currently we use a simple prefix check to see whether a document is in a given workspace, so whatever we do to document URIs, we must do to workspace URIs as well.

tamasfe avatar Jul 18 '22 14:07 tamasfe

I don't understand all what you wrote to be honest.

But if you are using Url from lsp-types, the path method does return a percent-encoded ASCII string with forward slashes (/) as separators.

I tried it and this is what it prints: "/C:/Users/PC-Gil/Documents/Work/RustWorkspace/nova/Cargo.toml"

GilShoshan94 avatar Jul 18 '22 14:07 GilShoshan94

I decided to keep most URIs as-is, we receive a lot of URIs from the language client, some of them are even returned to it at some point. They are also percent encoded for a reason, so it's probably the wisest decision is to just let them be.

Instead we decode the percent-encoded characters when we work with the URIs as strings or paths, we additionally replace all \s with /.

We generally need to do this when we:

  • match glob or regex rules for schema associations
  • read files
  • match include/exclude or rule patterns in the config

I don't understand all what you wrote to be honest.

If you elaborate, I can probably help.

But if you are using Url from lsp-types, the path method does return a percent-encoded ASCII string with forward slashes (/) as separators.

As you mentioned, it's still percent encoded so the URI file://c%3A/directory%20with%20spaces/foo.toml won't match the glob pattern **/directory with spaces/foo.toml, it's also not a valid file path.

tamasfe avatar Jul 18 '22 18:07 tamasfe

As you mentioned, it's still percent encoded so the URI file://c%3A/directory%20with%20spaces/foo.toml won't match the glob pattern **/directory with spaces/foo.toml, it's also not a valid file path.

When I tried it, I still got /C:/.../... and no weird /c%3A/.../.... I don't really understand where it comes from....

Anyway, if you fixed it it's great! Thank you.

I still have a lot to learn XD. I was happy to look around the code and try to help.

GilShoshan94 avatar Jul 18 '22 21:07 GilShoshan94

You did help a lot, thanks! I hadn't realized that all things discussed were unusable on Windows before.

I don't really understand where it comes from....

There are two parts to the LSP architecture, the server and the client, and we only control the server.

Every time a workspace or TOML document is opened, the language client (vscode) gives its URI to the language server (taplo-lsp). Every URI in vscode is percent-encoded, so that's where they come from. If you put an eprintln!("{}", p.text_document.uri) or tracing::debug here, you can observe the URI that is passed from vscode is in fact percent-encoded.

The url library in Rust seems to happily accept both percent-encoded and "regular" URIs, so if you just do Url::parse("file://c:/foo.toml"), it will leave it unaltered.

tamasfe avatar Jul 18 '22 22:07 tamasfe

Is #286 shipped in the v0.17.1 version of the VSCode extension? Because I still have a problems.

Bludator avatar Aug 10 '22 14:08 Bludator

Ok, so the absolute paths works fine, but the relative ones does not. When I specify VSCode settings.json:

{
	"evenBetterToml.taplo.configFile.enabled": true,
	"evenBetterToml.taplo.configFile.path": "taplo.toml",
	"evenBetterToml.taplo.environment": {
		"RUST_LOG": "info,taplo=debug"
	 }
}

it does not find the config, in logs I get :

DEBUG update_configuration:initialize: using config file at specified path path="taplo.toml" self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
 INFO update_configuration:initialize: using config file path="/c:/Users/bludator/codes/-Temp/taplo-test/taplo.toml" self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
 WARN update_configuration:initialize: failed to load workspace configuration error=JsValue(Error: ENOENT: no such file or directory, open 'c:\c:\Users\bludator\codes\-Temp\taplo-test\taplo.toml'
Error: ENOENT: no such file or directory, open 'c:\c:\Users\bludator\codes\-Temp\taplo-test\taplo.toml') self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
DEBUG update_configuration:initialize: using config file at specified path path="taplo.toml" self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
 INFO update_configuration:initialize: using config file path="/c:/Users/bludator/codes/-Temp/taplo-test/taplo.toml" self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
 WARN update_configuration:initialize: failed to load workspace configuration error=JsValue(Error: ENOENT: no such file or directory, open 'c:\c:\Users\bludator\codes\-Temp\taplo-test\taplo.toml'
Error: ENOENT: no such file or directory, open 'c:\c:\Users\bludator\codes\-Temp\taplo-test\taplo.toml') self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test

The c:\c:\Users\bludator\codes\-Temp\taplo-test\taplo.toml is clearly wrong, and the slash in the /c:/Users/bludator/codes/-Temp/taplo-test/taplo.toml is rather suspicious.

When I use relative paths to assign schema (as I have to) it gives similar/same errors.

Also the extension does not find taplo.toml or .taplo.toml in the workspace root which is likely the same issue.


My setup:

Windows 11
VSCode 1.70.1
taplo v0.17.1

Bludator avatar Aug 12 '22 17:08 Bludator

I haven't tested this on windows, but had the same issue in https://github.com/rhaiscript/lsp/issues/86#issuecomment-1228715262, applied the same "fix" here, so it should work now.

tamasfe avatar Sep 01 '22 21:09 tamasfe

It works, the config is now found (both in the default location and with explicitly specified path in settings.json). Similarly I can set schema with path.

But I found other windows-only issues:

  • schema directive does not work
  • the selection of schema in bottom status bar in VSCode does nothing too

Bludator avatar Sep 21 '22 13:09 Bludator

I have discovered that the same issues occur on linux when I have a non-existing file/path in evenBetterToml.taplo.configFile.path in settings.json.

...I thought it could help with debugging.

Bludator avatar Sep 23 '22 10:09 Bludator

The fix from #321 does not work on Windows machines.

My files:

I have test.toml with its schema.json and test2.toml with its schema2.json.

In taplo.toml I set:

#:schema taplo://taplo.toml

[schema]
path = "schema.json"

[[rule]]
include = ["test2.toml"]
schema.path = "schema2.json"
Here are some (hopefully) relevant logs:
DEBUG update_configuration:initialize: discovering config file in workspace self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
 INFO update_configuration:initialize: using config file path="c:\\Users\\bludator\\codes\\-Temp\\taplo-test\\taplo.toml" self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
DEBUG update_configuration:initialize: using config: Config {
    include: None,
    exclude: None,
    rule: [
        Rule {
            name: None,
            include: Some(
                [
                    "c:/Users/bludator/codes/-Temp/taplo-test/test2.toml",
                ],
            ),
            exclude: None,
            keys: None,
            options: Options {
                schema: Some(
                    SchemaOptions {
                        enabled: None,
                        path: None,
                        url: Some(
                            Url {
                                scheme: "file",
                                cannot_be_a_base: false,
                                username: "",
                                password: None,
                                host: None,
                                port: None,
                                path: "/c:/Users/bludator/codes/-Temp/taplo-test/schema2.json",
                                query: None,
                                fragment: None,
                            },
                        ),
                    },
                ),
                formatting: None,
            },
        },
    ],
    global_options: Options {
        schema: Some(
            SchemaOptions {
                enabled: None,
                path: None,
                url: Some(
                    Url {
                        scheme: "file",
                        cannot_be_a_base: false,
                        username: "",
                        password: None,
                        host: None,
                        port: None,
                        path: "/c:/Users/bludator/codes/-Temp/taplo-test/schema.json",
                        query: None,
                        fragment: None,
                    },
                ),
            },
        ),
        formatting: None,
    },
} self.root=file:///c%3A/Users/bludator/codes/-Temp/taplo-test
DEBUG document_open: found schema association schema.url=taplo://taplo.toml schema.name="" schema.source="directive"
DEBUG document_open:publish_diagnostics:collect_schema_errors: found schema association schema.url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json schema.name="" schema.source="config" document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/test2.toml
DEBUG document_open:publish_diagnostics:collect_schema_errors: using schema schema.url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json schema.name="" schema.source="config" document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/test2.toml
DEBUG document_open:publish_diagnostics:collect_schema_errors: found schema association schema.url=taplo://taplo.toml schema.name="" schema.source="directive" document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/taplo.toml
DEBUG document_open:publish_diagnostics:collect_schema_errors: using schema schema.url=taplo://taplo.toml schema.name="" schema.source="directive" document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/taplo.toml
DEBUG document_open:publish_diagnostics:collect_schema_errors:validate_root:validate:load_schema: schema was found in cache schema_url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/test2.toml schema_url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json schema_url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json schema_url=file:///c:/Users/bludator/codes/-Temp/taplo-test/schema.json
DEBUG document_open:publish_diagnostics:collect_schema_errors:validate_root:validate:load_schema: schema was found in cache schema_url=taplo://taplo.toml document_url=file:///c%3A/Users/bludator/codes/-Temp/taplo-test/taplo.toml schema_url=taplo://taplo.toml schema_url=taplo://taplo.toml schema_url=taplo://taplo.toml

Bludator avatar Oct 26 '22 17:10 Bludator

Thanks, the logs are indeed helpful. Looks like the paths are still not uniform that needs to be looked into.

tamasfe avatar Oct 26 '22 21:10 tamasfe

This cfg!(windows) would be false for wasm, so when running a wasm build on Windows, the paths don't get normalized. https://github.com/tamasfe/taplo/blob/f7c5279af987c2ef02c8931618295bb3e5b52dec/crates/taplo-common/src/util.rs#L117-L121 I guess it needs to be a runtime variable that is passed from the node side (os.platform() !== 'win32') instead of a compile time constant.

sapphi-red avatar Apr 14 '24 13:04 sapphi-red