fuzzywuzzy-rs icon indicating copy to clipboard operation
fuzzywuzzy-rs copied to clipboard

Add an explicit type definition for matches returned by process methods

Open seanpianka opened this issue 3 years ago • 9 comments

Add an explicit type definition Match for matches returned by process methods (containing matched text and a score). This is helpful when adding behavior involving sorting or comparison, and provides more semantic meaning by using m.text() or m.score() instead of m.0 or m.1 respectively.

Includes a few convenience impls for creating a Match from and comparing a Match to (&str, u8).

Depends on: #14

seanpianka avatar Sep 26 '20 06:09 seanpianka

Love it, I was gunna add something just like this. :)

I will probably review this more thoroughly Tuesday. This weekend is busy for me.

Two thoughts: 1) Match is probably not the best name. You can have a Match with a score of 0! You can have a Match that is nonzero but very low. You get the idea. Can we come up with a name better suited to the purpose?

It makes me wonder if we ought to have a scoring function that returns only scores and our functions matching the python API built off of that. (I.e., a function returning an iterable of scores which the current functions wrap to produce matches - in case someone is concerned about the allocations or something)

  1. I anticipate we'll be adding more wrapper types. Can you move the definition into a types.rs file or specific a match.rs file (or whatever finalized name) in atypes module so the type definitions and impls are separate?

logannc avatar Sep 26 '20 07:09 logannc

Thanks for the quick reply! This is my first open-source project that I've collaborated on, and I'm enjoying my experience so far 🙂 👍

Two thoughts: 1) Match is probably not the best name. You can have a Match with a score of 0! You can have a Match that is nonzero but very low. You get the idea. Can we come up with a name better suited to the purpose?

I don't see what is unsound with the idea of a match with a score of 0, care to elaborate?

Feel free to suggest an alternative name, however I would prefer to avoid introducing additional terminology or overlaying additional concepts onto the original package's model for string matching (without justification).

It makes me wonder if we ought to have a scoring function that returns only scores and our functions matching the python API built off of that. (I.e., a function returning an iterable of scores which the current functions wrap to produce matches - in case someone is concerned about the allocations or something)

Can you clarify what the problem is and provide an intended outcome? I find your description too abstract.

  1. I anticipate we'll be adding more wrapper types. Can you move the definition into a types.rs file or specific a match.rs file (or whatever finalized name) in atypes module so the type definitions and impls are separate?

I philosophically find it preferable to keep code that changes together located together, but perhaps I'm mis-using the definition of the single responsibility principle?

"Gather together the things that change for the same reasons. Separate those things that change for different reasons."

Match is only used by these convenience methods in the process module. I see potential benefit in storing definitions for types wrapping default values (those likely to be used in both fuzz and process), but what is the benefit of separating this type which is specifically tied to the module where it is used?

seanpianka avatar Sep 26 '20 07:09 seanpianka

The conflict I have in mind is that 'Match', or something like it, is a more appropriate name for a type used by util::find_longest_match.

The problem with programming principles is they can be used to argue anything. :) I could also argue that one such responsibility is 'defining the Match type', for example.

My reason - which is similar but distinct - is the definition of Match and it's impls are noise, clutter that makes reading through this module more difficult. The process module becomes more readable by eliminating this clutter. Conversely, the full definition of Match is more clear once separated.

This is admittedly getting into aesthetics, but I do think your principle supports this. They are linked but logically independent.

Anyway, I'll stop ranting.

logannc avatar Sep 26 '20 16:09 logannc

I can understand your concern with readability of the process module, so I'm willing to separate out the type definitions from the implementation. However, I still don't see what's particularly wrong, in this case, with re-using Match in both process methods and potential utils methods?

It's not ranting -- I asked for clarification. 🙂

seanpianka avatar Sep 26 '20 23:09 seanpianka

Admittedly, in utils, MatchingSequence is probably more appropriate. Still, for process, QueryScore or just Score is more accurate as we're not really doing any matching.

logannc avatar Sep 27 '20 00:09 logannc

Hi @logannc, just wanted to know if you had updates or any thoughts on the changes discussed above with returning MatchingSequence or QueryScore. This would change the public API, but since we're 0.y.z, we can bump y if we make any public crate API changes.

seanpianka avatar Nov 02 '20 05:11 seanpianka

I don't know how this interacts with the other PRs but it seems fine for now. We'll deal with merging later. :D

logannc avatar Jun 18 '21 19:06 logannc

@logannc Right, we'll have to see how to rebase the mega-change PR now. Thanks for the review.

Do you have an opinion on the name Score versus QueryScore? Or an alternative?

seanpianka avatar Jun 18 '21 20:06 seanpianka

They both have merit. I've never liked the process methods haha

logannc avatar Jun 18 '21 20:06 logannc