heck icon indicating copy to clipboard operation
heck copied to clipboard

Make "transform" public and more reusable.

Open progval opened this issue 6 years ago • 4 comments

Hi,

I was looking for a way to split words in a similar way to heck, in a crate that already uses heck.

The obvious way would be to use snake_case and split on underscores, but I thought it would be more useful to make transform reusable to directly produce a vector (and other collections), so here it is.

I understand it does not fit the intended use case of heck, so I will understand if you refuse this PR.

Thanks!

progval avatar Jul 05 '18 09:07 progval

Hey, thanks for this PR! I've actually been interested recently in revisiting this library in a number of ways and prepping a 1.0 release, and this was one of the things I definitely considered (#6 is a prior request for the same feature that I missed and never responded to).

Its also interesting that you made transform generic. I also considered this, and the problem is that you still take the input data as &str. I'm not sure how useful being generic over the output actually is if you aren't also generic over the input. I also prefer to keep the ToOwned relationship between input and output, making the signature more like:

fn transform<T, F, G>(...) -> T::Owned where
    T: UnicodeSegmentation + ToOwned,
    T::Owned: Default,
    ...

Unfortunately, it seems really unlikely that any type other than str and String could ever meet these bounds.

So for now, I don't think I want to do make this function more generic, unless there's a really clear concrete use case for, as you say, producing a vector instead of a String.

But I'm happy with the idea of exposing this function!

withoutboats avatar Jul 05 '18 21:07 withoutboats

I also prefer to keep the ToOwned relationship between input and output

Why?

unless there's a really clear concrete use case for, as you say, producing a vector instead of a String

I'm writing a code generator that takes an XML Schema as input. The problem is I have to output a lot more structs and enums than there are names in the schema, so I'm trying various heuristics to give them nice developer-friendly names. The one I was thinking of here is, given an enum whose variants have types FooBarSimpleType, BazSimpleType, and QuxQuuxSimpleType, a longest-suffix/longest-substring search would guess the name SimpleType for the enum. And word splitting would avoid accidents like rSimpleTime if all prefixes end with an r.

(But I'm not sure it would actually use it in my code, I have yet to make a dataset of existing XML schemas to find patterns in the names they use for their elements.)

progval avatar Jul 05 '18 21:07 progval

Interesting. So it seems like you actually want to visit all of the "words" as we've defined them for this purpose. Probably a more correct API for your use case would be some sort of iterator over substrings, rather than boxing all users into the transform API which is specifically designed for reformatting those substrings.

However, I'd also warn you that since you're doing string comparisons, you may need to worry about unicode normalization once your data is not ASCII only - for example, if the common suffix were Naïve, there are multiple ways that could be encoded in unicode which should be considered equivalent. This might get very hairy.

withoutboats avatar Jul 06 '18 05:07 withoutboats

Why?

This is a good question, and I'm not sure - its really just a hunch to be honest.

I guess because this library has been focused on transformations for various user-facing display purposes. For display purposes, you'd never want to go from a str to a vector, instead you might possibly want to go from your custom str type to your custom String type.

I'm worried about taking on responsibilities to suit purposes outside of this display-oriented scope, because then it might grow more and more features and be more complicated.

I want to expose the helpers in lib.rs so that users can write their own ToShoutyKebabCase or whatever they need, but I worry about the implications of directly and explicitly supporting use cases like yours (like that we may need unicode normalization).

I feel more comfortable with the iterator API I mentioned in my previous comment, though implementing it may be unpleasant, because that leaves the collecting aspect of it to you, and not in this library.

withoutboats avatar Jul 06 '18 05:07 withoutboats

This PR is very old and has conflicts, so I'll close it. If you are still interested in this, please let's discuss how to expose it (which seems like it was the contentious bit of this PR) in #6 first.

jplatte avatar Aug 15 '23 08:08 jplatte