DELET should be DELETE in "WHEN a user DELETEs /[...]"
Hi, I have comments on tests that look like:
GIVEN the sky is blue
WHEN a user DELETEs /something
THEN something happens
typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):
error: `DELET` should be `DELETE`
--> ./[...]
|
405 | WHEN a user DELETEs /[...]
| ^^^^^
|
error: `OPTIO` should be `OPTION`
--> ./[...]
|
406 | WHEN a user OPTIONs /[...]
| ^^^^^
|
When I run typos -w, I end up with: DELETEEs and OPTIONNs.
Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?
typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):
...
Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?
Yes it is thinking you are using camelCase.
If you run typo --words, you'll see that these are POS Ts, HEA Ds, GE Ts, DELET Es, and OPTIO Ns. The only reason they are "accepted" is POS, HEA, and GE` aren't in our list of typos to correct.
Thanks ! Then the question becomes:
- Do we want to add a rule to accept any uppercase words with a lowercase 's' ?
- Or do we add special rules for some words (GET, POST, ...) ?
- Or do we prefer to keep these as exceptions that users can configure via
extend-ignore-identifiers-re?
Or do we prefer to keep these as exceptions that users can configure via extend-ignore-identifiers-re ?
You can also ignore them by adding them to extend-identifiers and having them correct to themselves.
I'll leave this open for a bit to see how much general interest there is for this but I would lean towards extend-identifiers for now
I meet with this on a daily basis.
DELETE-s is my suggestion.
I you have the same issue please add a thumbs up to let the maintainer know how many people are affected.
In the meantime you should create a .typos.toml file with:
[default]
extend-ignore-identifiers-re = [
"DELETEs" # https://github.com/crate-ci/typos/issues/745
]
Reposting from duplicate https://github.com/crate-ci/typos/issues/916:
1 D COULDa
2 D COULDd
3 D COULDs
4 D SHOULDs
5 I LOCIs
6 I OCTOPId
7 L AT_WILLs
8 L WILLs
9 T BROUGHTd
10 T BROUGHTs
11 T MUSTs
12 T WITs
13 T SHOULDNTs
The README.md above produces the output below:
README.md:1:6: `COUL` -> `COULD`
README.md:2:6: `COUL` -> `COULD`
README.md:3:6: `COUL` -> `COULD`
README.md:4:6: `SHOUL` -> `SHOULD`, `SHAWL`, `SHOAL`
README.md:7:9: `WIL` -> `WILL`, `WELL`
README.md:8:6: `WIL` -> `WILL`, `WELL`
Cases 5,6 and 9-12 seem to be handled correctly.
It seems that a lowercase letter that follows an all uppercase word not ending in T or I is split incorrectly.
SHOULDNTs behaves as expected and SHOULDs does not, I wonder if that's a clue 🤔 .
EDIT: So far the regex below seems to swallow/mute the discrepancy:
extend-ignore-re = ["[A-Z]+[^TFI][a-z]"]
It seems that a lowercase letter that follows an all uppercase word not ending in
TorIis split incorrectly.
SHOULDNTsbehaves as expected andSHOULDsdoes not, I wonder if that's a clue 🤔 .
Typos word splitter is strict. If it sees a change in case, it will treat it as a word change. You can see this in action by running with --words. Using the word list you gave, I get
D
COUL
Da
D
COUL
Dd
D
COUL
Ds
D
SHOUL
Ds
I
LOC
Is
I
OCTOP
Id
L
AT
WIL
Ls
L
WIL
Ls
T
BROUGH
Td
T
BROUGH
Ts
T
MUS
Ts
T
WI
Ts
T
SHOULDN
Ts
Whether it reports a typo because of that is dependent on what is in our list of typos. We don't have an approved list of words that have to be matched but a deny list of misspellings and how they can be corrected.
That's handy insight, I think the issue with the output above is that a boundary between words is being treated as a word itself.
Any match on this regex [A-Z]+[a-z] is going to produce a "false" PascalCase word using the last upper and the following lower, does that feel overly optimistic?
As a human, its easy to look at these cases and recognize where the word split is. When encoding this, its much more difficult and either direction you go, you'll get things wrong.
Fully agree, by no means am I suggesting a solution, just engaging in ascription. Describing the problem is helpful (for myself and possibly others) in understanding it.
Two character words seem to be a particularly tricky thing since it touches on other issues (like hexadecminal representation).
I’m relieved I’m not the first person to have this issue but I’m surprised nobody mentioned SQL so far. Many people write SQL keywords all uppercase and I use them as verbs, too. typos doesn’t understand this:
$ echo "A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table." | ./typos -
error: `INSER` should be `INSERT`
--> -:1:46
|
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
| ^^^^^
|
error: `UPDAT` should be `UPDATE`
--> -:1:60
|
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
| ^^^^^
|
error: `DELET` should be `DELETE`
--> -:1:69
|
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
| ^^^^^
|
$ ./typos --version
typos-cli 1.18.1
The tool also wanted to correct UNIQUEs but you may argue that this is a weird slang for UNIQUE constraints. To allow all major SQL verbs which are not allowed by default:
[default]
extend-ignore-identifiers-re = [
"\\bDELETEs\\b",
"\\bINSERTs\\b",
"\\bUPDATEs\\b",
]
Update: added word boundary markers. Thanks @szepeviktor!
Adding word boundaries may help.
"\\bDELETEs\\b",
Ah I finally found this before trying to make a new issue. I have this as an example:
error: `CANDIDAT` should be `CANDIDATE`
--> src/somefile.ts:278:71
|
278 | * Refuse to discuss anything other than the POSITION and the CANDIDATEs skills and experience.
| ^^^^^^^^
|
Is there a way to generically handle the case 'ALLCAPSs' and treat and check it with the valid-word-list? In other words; mark out (with a regex) what will be checked? I was looking at default.extend-ignore-words-re etc, but it seems what I'd want is something that transformed the word, then did the typos check on that one?
I could of course just start having a big manual list, but I know the pattern of these ALL_CAPS_IDENTIFIER + 's'.
There is not a way to normalize custom casing and then check that.
If this is something you do frequently and there isn't other punctuation, like backticks in markdown, that you want to use, it would likely be best to use default.extend-ignore-words-re and ignore them and risk a typo than have it complain when there is nothing.