typos
Hashes/encodings below the heuristic limit are treated as typos
error: `Ba` should be `By`, `Be`
--> ./content/blog/2017/08/2017-08-08-introducing-jenkins-minute.adoc:39:14
|
39 | video::FhDomw6BaHU[youtube, width=852, height=480]
I can add it to `[default.extend-identifiers]`, so it's not a blocker, but I figured you'd like another test case.
The challenge is being able to identify that as a hash. How do we tell a hash from an identifier?
Right now, we support:
- SHA detection: must be 32+ characters long and use a consistent case
- Base64 detection: must be 90 characters long or contain `+` or `/`, and must have the padding bytes (though it is uncertain whether the padding requirement will stay, see #413)
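To make the two rules concrete, here is a rough Python sketch of them. It is not the typos implementation; the function names are hypothetical, "90 characters long" is read as a minimum, and "has the padding bytes" is read as "length is a multiple of 4 with at most two trailing `=`".

```python
import string

# Hypothetical restatement of the detection rules listed above;
# the real typos code may differ in its details.
HEX_DIGITS = set(string.hexdigits)
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/")


def looks_like_sha(token: str) -> bool:
    """32+ hex characters whose letters are all lowercase or all uppercase."""
    if len(token) < 32 or not set(token) <= HEX_DIGITS:
        return False
    letters = [c for c in token if c.isalpha()]
    return all(c.islower() for c in letters) or all(c.isupper() for c in letters)


def looks_like_base64(token: str) -> bool:
    """Padded to a multiple of 4, and either 90+ chars or containing '+' or '/'."""
    body = token.rstrip("=")
    padding = len(token) - len(body)
    if padding > 2 or len(token) % 4 != 0 or not set(body) <= BASE64_CHARS:
        return False
    return len(token) >= 90 or "+" in body or "/" in body


print(looks_like_sha("e84ba3e8e2f72a8dcad43f8ac3c768527ca199bd"))
print(looks_like_sha("e84ba3e"))
print(looks_like_base64("FhDomw6BaHU"))
```

Under these rules the YouTube ID `FhDomw6BaHU` from the report above is neither a SHA nor Base64, which is why it falls through to spell-checking.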
Yeah, that makes a lot of sense. I'm just starting to use the app and loving it so far; I've only had to whitelist two hashes that have `ba` in them, so it's not a big deal for me.
Maybe some sort of regex so I could whitelist `video::[a-zA-Z0-9_-]+\[`
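A quick check that a pattern of that shape picks out the AsciiDoc macro from the first report. This uses Python's `re` for illustration; as far as I know typos' `extend-ignore-re` takes Rust `regex` syntax, which agrees with Python for a simple pattern like this one. The `_` and `-` in the character class are an assumption that YouTube IDs may contain those characters.

```python
import re

# Hypothetical ignore pattern for AsciiDoc YouTube macros.
# The + makes it match the whole ID, not just a single character.
VIDEO_MACRO = re.compile(r"video::[A-Za-z0-9_-]+\[")

line = "video::FhDomw6BaHU[youtube, width=852, height=480]"
match = VIDEO_MACRO.search(line)
print(match.group(0))  # video::FhDomw6BaHU[
```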
I think it also causes some problems with Jupyter notebooks:
error: `ba` should be `by`, `be`
--> jupyter.ipynb:661:11
|
661 | "id": "6ba7c279",
| ^^
|
error: `ba` should be `by`, `be`
--> jupyter.ipynb:784:15
|
784 | "id": "33088ba8",
| ^^
|
error: `ba` should be `by`, `be`
--> jupyter.ipynb:1029:10
|
1029 | "id": "ba6788ca",
| ^^
|
error: `ba` should be `by`, `be`
--> jupyter.ipynb:2029:10
|
2029 | "id": "ba638183",
| ^^
|
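In this notebook the flagged values are all 8-character hex cell IDs, so a narrowly scoped ignore pattern is one possible workaround. The sketch below assumes the usual `"id": "..."` spacing in the `.ipynb` JSON and only covers 8-hex-character IDs like the ones shown; other notebooks may use differently shaped IDs.

```python
import re

# Hypothetical ignore pattern for 8-hex-character notebook cell IDs.
CELL_ID = re.compile(r'"id": "[0-9a-f]{8}"')

lines = [
    '"id": "6ba7c279",',
    '"id": "33088ba8",',
    '"id": "ba6788ca",',
    '"id": "ba638183",',
]
print(all(CELL_ID.search(line) for line in lines))  # True
```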
Hello,
Git commit hashes tend to match `[0-9a-fA-F]{7,}`, so that would be a useful addition.
@tspearconquest for shorter git commit hashes, we'll need to rely on a heuristic like the one discussed in #484, because shorter commit hashes could just as easily be words.
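To illustrate why the plain `[0-9a-fA-F]{7,}` pattern can't be used on its own: ordinary English words spelled only with the letters a-f match it just as well as a short hash does. The word list here is just an example.

```python
import re

# The pattern suggested above for git short hashes.
SHORT_HASH = re.compile(r"[0-9a-fA-F]{7,}")

# Three ordinary words made only of a-f, plus a real short hash.
words = ["defaced", "effaced", "acceded", "e84ba3e"]
for word in words:
    print(word, bool(SHORT_HASH.fullmatch(word)))  # True for all four
```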
How about adding a heuristic "word contains characters preceded by numbers" (where "word" is a whitespace-separated segment, not a case-separated segment)? I don't think I've ever seen an identifier named `foo1bar` or `3foo`, though `foo3` or `foo3_bar` seem realistic (e.g. `zip3`, `zip4`).
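A minimal sketch of that heuristic, reading "characters preceded by numbers" as "a letter directly after a digit" (the function name is hypothetical). Running it on the notebook IDs above shows both sides of the tradeoff: it catches three of the four, misses `ba638183` (its letters only come first), and flags `sha1hash`.

```python
import re

# A letter immediately preceded by a digit, anywhere in the word.
DIGIT_THEN_LETTER = re.compile(r"[0-9][A-Za-z]")


def has_digit_before_letter(word: str) -> bool:
    return DIGIT_THEN_LETTER.search(word) is not None


for word in ["6ba7c279", "33088ba8", "ba6788ca", "ba638183",
             "foo1bar", "3foo", "foo3", "foo3_bar", "zip4", "sha1hash"]:
    print(word, has_digit_before_letter(word))
```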
sha1hash?
Right, there are a few exceptions (I also remembered `2to3`), but maybe it's still a good heuristic? Personally, I consider false positives a bigger issue than false negatives, and I think that matches typos' overall approach.
There are also all the `this2that` and `thing4stuff` names.
@jplatte I'd probably refine your suggestion to "any identifier that word-splits exclusively due to numbers and not due to any other separator (be it case or `_`)".
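One possible reading of that refinement, sketched in Python: treat a token as hash-like when it has two letter runs separated only by digits, and has no `_` and no lowercase-to-uppercase (camelCase) transition. This interpretation and the function name are assumptions, not how typos actually splits words. Note that it still flags `this2that` and, unlike the previous rule, lets `33088ba8` through.

```python
import re

# Two letter runs separated only by digits, e.g. "foo1bar" or "6ba7c279".
DIGIT_SPLIT = re.compile(r"[A-Za-z][0-9]+[A-Za-z]")
# A lowercase-to-uppercase transition, i.e. a camelCase word split.
CAMEL_SPLIT = re.compile(r"[a-z][A-Z]")


def splits_only_on_digits(token: str) -> bool:
    """True if the token's word splits come only from digits."""
    if "_" in token or CAMEL_SPLIT.search(token):
        return False
    return DIGIT_SPLIT.search(token) is not None


for token in ["6ba7c279", "e84ba3e", "33088ba8", "foo3",
              "foo3_bar", "fooBar2baz", "this2that"]:
    print(token, splits_only_on_digits(token))
```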
The next question is the likelihood of a shortened SHA having no digits. I probably didn't bring this up in the other thread about heuristics, but I suspect it's better to have something always complain than to have it complain in ways people don't expect.
In the case of a hex string, that would be (6/16)^n, where n is the string length (since 6 of the 16 possible characters are letters). Git short hashes are the shortest hashes I see in practice, and they seem to start at 7 characters (longer in large repos), which puts the probability of such a hash having no digits at pretty much exactly 0.1%. Strings that contain no digits before any non-digit characters would be closer to 0.5% though (rough estimation, could also be >0.5%, but not <0.28%).
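The 0.1% figure can be checked directly. This sketch assumes hashes are uniformly random lowercase hex; it computes (6/16)^n exactly and cross-checks it by brute-force enumeration for a small length.

```python
from fractions import Fraction
from itertools import product

HEX = "0123456789abcdef"


def p_no_digits(n: int) -> Fraction:
    # Each of the n characters is one of the 6 letters a-f with probability 6/16.
    return Fraction(6, 16) ** n


def p_no_digits_bruteforce(n: int) -> Fraction:
    # Enumerate all 16**n strings; only practical for small n.
    total = 0
    letters_only = 0
    for chars in product(HEX, repeat=n):
        total += 1
        if not any(c.isdigit() for c in chars):
            letters_only += 1
    return Fraction(letters_only, total)


print(p_no_digits(3) == p_no_digits_bruteforce(3))  # True
print(f"{float(p_no_digits(7)):.4%}")  # 0.1043%
```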
FYI #695 provides a new workaround for false positives
error: `ba` should be `by`, `be`
--> ./content/n/rust-docker.md:52:44
|
52 | hello 0.1.0 ac4e1a72ba05 2 minutes ago 1.38GB
| ^^
|
error: `ba` should be `by`, `be`
--> ./content/n/rust-docker.md:53:46
|
53 | rust 1.52.1-slim-buster 61cb3c65a6ba 3 weeks ago 621MB
| ^^
The `extend-ignore-re` setting solved this issue:
[default]
extend-ignore-re = ["[0-9a-fA-F]{12}"]
I think we can safely close this issue.
@azzamsa that regex is much too generic; it disables spell-checking for every other 12-character run of hex characters as well, not just Docker image IDs.
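To see the difference, compare the pattern from the config above with a word-bounded variant (the tighter pattern is a hypothetical suggestion, shown with Python's `re`; `\b` also exists in Rust's `regex` crate). The unbounded pattern also matches a 12-character slice inside any longer hex-like token, while the bounded one only matches standalone 12-character tokens such as the image IDs. The bounded version would still skip any standalone 12-letter word spelled only with a-f.

```python
import re

LOOSE = re.compile(r"[0-9a-fA-F]{12}")        # the pattern from the config above
BOUNDED = re.compile(r"\b[0-9a-fA-F]{12}\b")  # hypothetical tighter variant

docker_line = "hello 0.1.0 ac4e1a72ba05 2 minutes ago 1.38GB"
long_token = "0123456789abcdef0123"

print(LOOSE.findall(docker_line), BOUNDED.findall(docker_line))
print(LOOSE.findall(long_token), BOUNDED.findall(long_token))
```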
Description
The `typos` pre-commit hook fails on truncated commit hashes in `CHANGELOG.md`.
Environment
- repo: https://github.com/crate-ci/typos
  rev: v1.20.4
  hooks:
    - id: typos
Actual Behavior
$ pre-commit run --files CHANGELOG.md
typos....................................................................Failed
- hook id: typos
- exit code: 2
error: `ba` should be `by`, `be`
--> CHANGELOG.md:100:28
|
100 | - _(README)_ update - ([e84ba3e](https://github.com/DeadNews/firebirdsql-run/commit/e84ba3e8e2f72a8dcad43f8ac3c768527ca199bd))
| ^^
|
$ pre-commit run --files CHANGELOG.md
typos....................................................................Failed
- hook id: typos
- exit code: 2
error: `ba` should be `by`, `be`
--> CHANGELOG.md:22:99
|
22 | - update `mkdocs` config ([#127](https://github.com/DeadNews/encode-utils-cli/issues/127)) - ([c92ba20](https://github.com/DeadNews/encode-utils-cli/commit/c92ba2032ac0b492b390d45c50f7c57c2660df5c))
| ^^
|
error: `ba` should be `by`, `be`
--> CHANGELOG.md:62:43
|
62 | - _(renovate)_ use shared config - ([693c3ba](https://github.com/DeadNews/encode-utils-cli/commit/693c3ba58822db45dd06a032ba1ce554db6deaf6))
| ^^
|
original: https://github.com/crate-ci/typos/issues/982
`ba` should be `by`, `be`
↑ This `ba` is in all examples. Maybe add it to the exceptions?
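An alternative to accepting `ba` everywhere is an ignore pattern scoped to the changelog's commit links, so `ba` in ordinary prose is still checked. The pattern below is a hypothetical sketch written against the link format in these examples (`[short sha](https://github.com/<owner>/<repo>/commit/<40-char sha>)`); it uses no lookaround, so it should also be valid in Rust `regex` syntax, but that is an assumption worth testing against your own `CHANGELOG.md`.

```python
import re

# Hypothetical pattern for changelog commit links like
# [e84ba3e](https://github.com/<owner>/<repo>/commit/<40-char sha>)
COMMIT_LINK = re.compile(
    r"\[[0-9a-f]{7,40}\]\(https://github\.com/[^/\s]+/[^/\s]+/commit/[0-9a-f]{40}\)"
)

entry = (
    "- _(README)_ update - ([e84ba3e](https://github.com/DeadNews/"
    "firebirdsql-run/commit/e84ba3e8e2f72a8dcad43f8ac3c768527ca199bd))"
)
print(COMMIT_LINK.findall(entry))
```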