fuzzywuzzy-rs
fuzzywuzzy-rs copied to clipboard
Add an explicit type definition for matches returned by process methods
Add an explicit type definition Match
for matches returned by process
methods (containing matched text and a score). This is helpful when adding behavior involving sorting or comparison, and provides more semantic meaning by using m.text()
or m.score()
instead of m.0
or m.1
respectively.
Includes a few convenience impl
s for creating a Match
from and comparing a Match
to (&str, u8)
.
Depends on: #14
Love it, I was gunna add something just like this. :)
I will probably review this more thoroughly Tuesday. This weekend is busy for me.
Two thoughts: 1) Match
is probably not the best name. You can have a Match with a score of 0! You can have a Match that is nonzero but very low. You get the idea. Can we come up with a name better suited to the purpose?
It makes me wonder if we ought to have a scoring function that returns only scores and our functions matching the python API built off of that. (I.e., a function returning an iterable of scores which the current functions wrap to produce matches - in case someone is concerned about the allocations or something)
- I anticipate we'll be adding more wrapper types. Can you move the definition into a
types.rs
file or specific amatch.rs
file (or whatever finalized name) in atypes
module so the type definitions and impls are separate?
Thanks for the quick reply! This is my first open-source project that I've collaborated on, and I'm enjoying my experience so far 🙂 👍
Two thoughts: 1)
Match
is probably not the best name. You can have a Match with a score of 0! You can have a Match that is nonzero but very low. You get the idea. Can we come up with a name better suited to the purpose?
I don't see what is unsound with the idea of a match with a score of 0, care to elaborate?
Feel free to suggest an alternative name, however I would prefer to avoid introducing additional terminology or overlaying additional concepts onto the original package's model for string matching (without justification).
It makes me wonder if we ought to have a scoring function that returns only scores and our functions matching the python API built off of that. (I.e., a function returning an iterable of scores which the current functions wrap to produce matches - in case someone is concerned about the allocations or something)
Can you clarify what the problem is and provide an intended outcome? I find your description too abstract.
- I anticipate we'll be adding more wrapper types. Can you move the definition into a
types.rs
file or specific amatch.rs
file (or whatever finalized name) in atypes
module so the type definitions and impls are separate?
I philosophically find it preferable to keep code that changes together located together, but perhaps I'm mis-using the definition of the single responsibility principle?
"Gather together the things that change for the same reasons. Separate those things that change for different reasons."
Match
is only used by these convenience methods in the process
module. I see potential benefit in storing definitions for types wrapping default values (those likely to be used in both fuzz
and process
), but what is the benefit of separating this type which is specifically tied to the module where it is used?
The conflict I have in mind is that 'Match', or something like it, is a more appropriate name for a type used by util::find_longest_match
.
The problem with programming principles is they can be used to argue anything. :) I could also argue that one such responsibility is 'defining the Match
type', for example.
My reason - which is similar but distinct - is the definition of Match
and it's impls are noise, clutter that makes reading through this module more difficult. The process module becomes more readable by eliminating this clutter. Conversely, the full definition of Match
is more clear once separated.
This is admittedly getting into aesthetics, but I do think your principle supports this. They are linked but logically independent.
Anyway, I'll stop ranting.
I can understand your concern with readability of the process module, so I'm willing to separate out the type definitions from the implementation. However, I still don't see what's particularly wrong, in this case, with re-using Match
in both process
methods and potential utils
methods?
It's not ranting -- I asked for clarification. 🙂
Admittedly, in utils
, MatchingSequence
is probably more appropriate. Still, for process
, QueryScore
or just Score
is more accurate as we're not really doing any matching.
Hi @logannc, just wanted to know if you had updates or any thoughts on the changes discussed above with returning MatchingSequence
or QueryScore
. This would change the public API, but since we're 0.y.z, we can bump y if we make any public crate API changes.
I don't know how this interacts with the other PRs but it seems fine for now. We'll deal with merging later. :D
@logannc Right, we'll have to see how to rebase the mega-change PR now. Thanks for the review.
Do you have an opinion on the name Score
versus QueryScore
? Or an alternative?
They both have merit. I've never liked the process methods haha