Demo considers 'MECHANICALLY SEPARATED CHICKEN' as vegan

Open fluxsauce opened this issue 7 years ago • 8 comments

https://github.com/hmontazeri/is-vegan/blame/25e87fef5b88f92319f001c6af96ff658fddcbf2/README.md#L152

isVegan.containsNonVeganIngredients([
...
  'MECHANICALLY SEPARATED CHICKEN',
...
]); // returns ['PASTEURIZED MILK', 'PORK', 'BEEF', 'WHEY']

Consider using fuzzy matching with a degree of confidence instead of string matching.

fluxsauce avatar Feb 06 '18 16:02 fluxsauce
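One way to read "fuzzy matching with a degree of confidence" is to score each ingredient against the known list by normalized edit distance. The sketch below is only an illustration under that assumption: `levenshtein`, `similarity`, and `bestMatch` are hypothetical names, not part of is-vegan, and the known list is passed in by the caller.

```javascript
// Hypothetical sketch: none of these functions exist in is-vegan.

// Levenshtein edit distance, computed with a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row, column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row, column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Confidence in [0, 1]: 1 means identical after case/whitespace normalization.
function similarity(a, b) {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const maxLen = Math.max(x.length, y.length);
  if (maxLen === 0) return 1;
  return 1 - levenshtein(x, y) / maxLen;
}

// Returns the closest known term and how confident the match is.
function bestMatch(ingredient, knownNonVegan) {
  let best = { term: null, confidence: 0 };
  for (const term of knownNonVegan) {
    const confidence = similarity(ingredient, term);
    if (confidence > best.confidence) best = { term, confidence };
  }
  return best;
}
```

A caller would still have to pick a threshold (for example, treat anything above some cutoff as non-vegan), and that cutoff is a judgment call: a misspelling of a long ingredient scores high, while a shortened variant of the same ingredient can score much lower.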

Good point! Will be added. Thx!

hmontazeri avatar Feb 06 '18 16:02 hmontazeri

Fixed!

hmontazeri avatar Feb 06 '18 16:02 hmontazeri

I disagree; the fundamental problem is still there. Yes, "mechanically separated chicken" is technically on the list now, but what about "separated chicken"? "chicken parts"?

At the very least, search within each ingredient string for known terms instead of requiring the whole string to match exactly.

fluxsauce avatar Feb 06 '18 16:02 fluxsauce

I understand what you mean, but a wildcard search isn't the answer either... It could match fragments of unrelated ingredients, which could make it worse than matching the exact string...

hmontazeri avatar Feb 06 '18 16:02 hmontazeri

@fluxsauce how about adding a regex search for obvious meat / fish species?

hmontazeri avatar Feb 06 '18 16:02 hmontazeri

That would work; I'd call it a component match and fill it with terms that shouldn't produce false positives, such as:

pig, pork, lard, beef, ribs, fillet, poultry, chicken, turkey, eggs, sheep, mutton, lamb, goat, rabbit, caviar, roe, honey, venison, steak

It'd be a shorter list than one of whole ingredients.

fluxsauce avatar Feb 06 '18 17:02 fluxsauce
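A minimal sketch of the component match described above, assuming a case-insensitive regex with word boundaries built from the suggested terms. `COMPONENT_TERMS`, `COMPONENT_PATTERN`, and `findComponentMatches` are hypothetical names for illustration, not is-vegan's API.

```javascript
// Hypothetical sketch: not part of is-vegan.
// Terms are taken verbatim from the comment above.
const COMPONENT_TERMS = [
  "pig", "pork", "lard", "beef", "ribs", "fillet", "poultry", "chicken",
  "turkey", "eggs", "sheep", "mutton", "lamb", "goat", "rabbit", "caviar",
  "roe", "honey", "venison", "steak",
];

// \b keeps a term from matching inside a longer word
// (e.g. "ribs" inside "cribs"); "i" makes the match case-insensitive.
const COMPONENT_PATTERN = new RegExp(
  "\\b(" + COMPONENT_TERMS.join("|") + ")\\b",
  "i"
);

// Returns every ingredient that contains at least one component term.
function findComponentMatches(ingredients) {
  return ingredients.filter((ingredient) => COMPONENT_PATTERN.test(ingredient));
}

console.log(
  findComponentMatches([
    "MECHANICALLY SEPARATED CHICKEN",
    "chicken parts",
    "TOFU",
    "eggplant",
  ])
);
```

With word boundaries, "eggplant" is not flagged by the term "eggs", but any ingredient that contains a listed word as a whole word is flagged regardless of what surrounds it.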

I'd be wary of searching for particular substrings like "chicken", because you can easily also end up needing to add prefixes and such to avoid false negatives on things like "vegan chicken" or "chicken alternative", or "chicken tofu", etc.

Seems like there's a different (better) solution out there, but I'm not sure what it is. The above solution(s) work somewhat if you prefer false negatives over false positives, though.

drusepth avatar Feb 06 '18 17:02 drusepth

> The above solution(s) work somewhat if you prefer false negatives over false positives, though.

It could be mitigated with a blacklist of known false positives like "chicken alternative".

There's a reason why I personally avoid signature-based scanners: it's a constant "two steps forward and one step back" of exceptions.

Try valid US street address parsing as an example; it seems simple until it isn't :-)

fluxsauce avatar Feb 06 '18 18:02 fluxsauce
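The blacklist idea could be sketched as follows, assuming the known false-positive phrases are stripped from the ingredient before the component terms are checked. Everything here is hypothetical (`EXCEPTION_PATTERN`, `MEAT_PATTERN`, `containsAnimalComponent`); the phrases are the examples raised in this thread, and the term list is shortened for brevity.

```javascript
// Hypothetical sketch: not part of is-vegan.
// Known false-positive phrases, taken from the examples in this thread.
const EXCEPTION_PATTERN =
  /\b(chicken alternative|vegan chicken|chicken tofu)\b/gi;

// Shortened component list, for illustration only.
const MEAT_PATTERN = /\b(pork|beef|chicken|turkey|lamb)\b/i;

// Strip exception phrases first, then look for component terms in what is
// left, so "vegan chicken with chicken broth" is still flagged.
function containsAnimalComponent(ingredient) {
  const remainder = ingredient.replace(EXCEPTION_PATTERN, " ");
  return MEAT_PATTERN.test(remainder);
}

console.log(containsAnimalComponent("chicken alternative"));
console.log(containsAnimalComponent("MECHANICALLY SEPARATED CHICKEN"));
```

Stripping the exceptions rather than skipping the whole ingredient is a design choice: an ingredient that mentions an exception phrase can still be flagged by a separate animal-derived term elsewhere in the same string. The exceptions list still has to be maintained by hand, which is the trade-off described above.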