typos DELET should be DELETE in "WHEN a user DELETEs /[...]"

Hi, I have comments on tests that look like:

GIVEN the sky is blue
WHEN a user DELETEs /something
THEN something happens

typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):

error: `DELET` should be `DELETE`
  --> ./[...]
    |
405 |     WHEN a user DELETEs /[...]
    |                 ^^^^^
    |
error: `OPTIO` should be `OPTION`
  --> ./[...]
    |
406 |     WHEN a user OPTIONs /[...]
    |                 ^^^^^
    |

When I run typos -w, I end up with: DELETEEs and OPTIONNs.

Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?

Jun 13 '23 09:06 awoimbee

typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):

...

Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?

Yes it is thinking you are using camelCase.

If you run typo --words, you'll see that these are POS Ts, HEA Ds, GE Ts, DELET Es, and OPTIO Ns. The only reason they are "accepted" is POS, HEA, and GE` aren't in our list of typos to correct.

Jun 13 '23 13:06 epage

Thanks ! Then the question becomes:

Do we want to add a rule to accept any uppercase words with a lowercase 's' ?
Or do we add special rules for some words (GET, POST, ...) ?
Or do we prefer to keep these as exceptions that users can configure via extend-ignore-identifiers-re ?

Jun 13 '23 14:06 awoimbee

Or do we prefer to keep these as exceptions that users can configure via extend-ignore-identifiers-re ?

You can also ignore them by adding them to extend-identifiers and having them correct to themselves.

I'll leave this open for a bit to see how much general interest there is for this but I would lean towards extend-identifiers for now

Jun 14 '23 13:06 epage

I meet with this on a daily basis. DELETE-s is my suggestion.

Jan 09 '24 21:01 szepeviktor

I you have the same issue please add a thumbs up to let the maintainer know how many people are affected. In the meantime you should create a .typos.toml file with:

[default]
extend-ignore-identifiers-re = [
    "DELETEs" # https://github.com/crate-ci/typos/issues/745
]

Jan 10 '24 10:01 awoimbee

Reposting from duplicate https://github.com/crate-ci/typos/issues/916:

1  D COULDa
2  D COULDd
3  D COULDs
4  D SHOULDs
5  I LOCIs
6  I OCTOPId
7  L AT_WILLs
8  L WILLs
9  T BROUGHTd
10 T BROUGHTs
11 T MUSTs
12 T WITs
13 T SHOULDNTs

The README.md above produces the output below:

README.md:1:6: `COUL` -> `COULD`
README.md:2:6: `COUL` -> `COULD`
README.md:3:6: `COUL` -> `COULD`
README.md:4:6: `SHOUL` -> `SHOULD`, `SHAWL`, `SHOAL`
README.md:7:9: `WIL` -> `WILL`, `WELL`
README.md:8:6: `WIL` -> `WILL`, `WELL`

Cases 5,6 and 9-12 seem to be handled correctly.

It seems that a lowercase letter that follows an all uppercase word not ending in T or I is split incorrectly.

SHOULDNTs behaves as expected and SHOULDs does not, I wonder if that's a clue 🤔 .

EDIT: So far the regex below seems to swallow/mute the discrepancy:

extend-ignore-re = ["[A-Z]+[^TFI][a-z]"]

Jan 31 '24 21:01 mkatychev

It seems that a lowercase letter that follows an all uppercase word not ending in T or I is split incorrectly.

SHOULDNTs behaves as expected and SHOULDs does not, I wonder if that's a clue 🤔 .

Typos word splitter is strict. If it sees a change in case, it will treat it as a word change. You can see this in action by running with --words. Using the word list you gave, I get

D
COUL
Da
D
COUL
Dd
D
COUL
Ds
D
SHOUL
Ds
I
LOC
Is
I
OCTOP
Id
L
AT
WIL
Ls
L
WIL
Ls
T
BROUGH
Td
T
BROUGH
Ts
T
MUS
Ts
T
WI
Ts
T
SHOULDN
Ts

Whether it reports a typo because of that is dependent on what is in our list of typos. We don't have an approved list of words that have to be matched but a deny list of misspellings and how they can be corrected.

Jan 31 '24 21:01 epage

That's handy insight, I think the issue with the output above is that a boundary between words is being treated as a word itself. Any match on this regex [A-Z]+[a-z] is going to produce a "false" PascalCase word using the last upper and the following lower, does that feel overly optimistic?

Jan 31 '24 21:01 mkatychev

As a human, its easy to look at these cases and recognize where the word split is. When encoding this, its much more difficult and either direction you go, you'll get things wrong.

Jan 31 '24 21:01 epage

Fully agree, by no means am I suggesting a solution, just engaging in ascription. Describing the problem is helpful (for myself and possibly others) in understanding it.

Two character words seem to be a particularly tricky thing since it touches on other issues (like hexadecminal representation).

Jan 31 '24 22:01 mkatychev

I’m relieved I’m not the first person to have this issue but I’m surprised nobody mentioned SQL so far. Many people write SQL keywords all uppercase and I use them as verbs, too. typos doesn’t understand this:

$ echo "A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table." | ./typos -
error: `INSER` should be `INSERT`
  --> -:1:46
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                              ^^^^^
  |
error: `UPDAT` should be `UPDATE`
  --> -:1:60
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                                            ^^^^^
  |
error: `DELET` should be `DELETE`
  --> -:1:69
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                                                     ^^^^^
  |
$ ./typos --version
typos-cli 1.18.1

The tool also wanted to correct UNIQUEs but you may argue that this is a weird slang for UNIQUE constraints. To allow all major SQL verbs which are not allowed by default:

[default]
extend-ignore-identifiers-re = [
    "\\bDELETEs\\b",
    "\\bINSERTs\\b",
    "\\bUPDATEs\\b",
]

Update: added word boundary markers. Thanks @szepeviktor!

Feb 07 '24 15:02 dboehmer

Adding word boundaries may help.

    "\\bDELETEs\\b",

Feb 07 '24 16:02 szepeviktor

Ah I finally found this before trying to make a new issue. I have this as an example:

error: `CANDIDAT` should be `CANDIDATE`
  --> src/somefile.ts:278:71
    |
278 | * Refuse to discuss anything other than the POSITION and the CANDIDATEs skills and experience.
    |                                                              ^^^^^^^^
    |

Is there a way to generically handle the case 'ALLCAPSs' and treat and check it with the valid-word-list? In other words; mark out (with a regex) what will be checked? I was looking at default.extend-ignore-words-re etc, but it seems what I'd want is something that transformed the word, then did the typos check on that one?

I could of course just start having a big manual list, but I know the pattern of these ALL_CAPS_IDENTIFIER + 's'.

Jun 07 '24 09:06 odinho

There is not a way to normalize custom casing and then check that.

If this is something you do frequently and there isn't other punctuation, like backticks in markdown, that you want to use, it would likely be best to use default.extend-ignore-words-re and ignore them and risk a typo than have it complain when there is nothing.

Jun 07 '24 15:06 epage