test-generator icon indicating copy to clipboard operation
test-generator copied to clipboard

Filenames with plus signs

Open koivunej opened this issue 5 years ago • 2 comments

Expected: example filenames with + to work.

Actual: test_generator::test_resources panicked with message: "some_fn_name" is not a valid identifier

Looks like the issue is at https://github.com/frehberg/test-generator/blob/ce45b1a712ab7454bec7e0c9d04dbf2417f6d237/test-generator/src/lib.rs#L131-L138 but I am not really sure what would be the exhaustive list of characters needing to be translated to _. Might be that there are so many caveats that #5 is the way to go when defaults fail to work? Though in this case I'd just re-create the name mangling with this additional character, so not sure how good overall solution that'd be.

koivunej avatar Oct 26 '20 08:10 koivunej

Another idea: use raw identifiers for the test names.

EDIT: Sadly that wouldn't work. I am thinking a inverted regex matching the opposite of https://doc.rust-lang.org/reference/identifiers.html character classes might be the exhaustive solution.

koivunej avatar Oct 26 '20 08:10 koivunej

This is simple to implement:

 // Form canonical name without any punctuation/delimiter or special character
 fn canonical_fn_name(s: &str) -> String {
     // remove delimiters and special characters
-    s.replace(
-        &['"', ' ', '.', ':', '-', '*', '/', '\\', '\n', '\t', '\r'][..],
-        "_",
-    )
+    // essentially `s/[^A-Za-z0-9]/_//`, see https://doc.rust-lang.org/reference/identifiers.html
+    // we probably cannot dedup multiple subsequent '_' as there is a chance of collision here as
+    // well.
+
+    s.chars()
+        .map(|ch| if ch.is_ascii_alphanumeric() { ch } else { '_' })
+        .collect()
 }

Trying it out will however manifest another issue: many files are mapped to same fn names.

The current code needs to perform PathBuf -> String conversion in order to do the path to fn name mangling, and I am not sure if this is cross platform: https://github.com/frehberg/test-generator/blob/ce45b1a712ab7454bec7e0c9d04dbf2417f6d237/test-generator/src/lib.rs#L251-L255

If it was or could be made as such, creating a mapping from canonicalized_path_as_identifier -> Vec<PathBuf> would be sufficient to make unique function name per filename even if they end up having very much similar fn names with a numerical suffix for example.

I did not yet look but from the subjects I'd assume that this work would conflict with the open PRs by @svenfoo, so perhaps I'll revisit this later.

koivunej avatar Oct 26 '20 08:10 koivunej