text Better conversion ergonomics

Better conversion ergonomics

Open Kleidukos opened this issue 3 years ago • 37 comments

I'm opening this issue due to @Bodigrim's advice:

The people at Kowainik (cc @vrom911, @chshersh) have produced very good ergonomics when it comes to string types conversion. Functions like toText, toLText, toString, etc. Even risky conversion is taken care of, with errors (https://hackage.haskell.org/package/relude-0.7.0.0/docs/Relude-String-Conversion.html#t:ToText).

I think that this mechanism has its place in text, since it depends on base (for String), and bytestring.

Mar 14 '21 17:03 Kleidukos

💯 from me. Historically, these types of functions were not considered for the codebase due to the author disagreeing with them, resulting in everyone implementing them on an ad-hoc basis. But, I happen to know that now, the author has absolved himself of responsibility, and we can push ahead with doing it. I haven't put too much thought into the string conversion ergonomics beyond the most convenient function we all implement:

 tshow :: Show a => a -> Text
 tshow = pack . show

I'm all ears on making this a feature.

Mar 14 '21 17:03 emilypi

This is very true, we have seen an abundance of ad hoc implementations of ToText.

Mar 14 '21 17:03 Kleidukos

While I agree that something of this flavour is useful, I'm not convinced that it should produce a Text. Afterall, almost all uses of tshow in my experience are as part of a concatenation of other textual values. Moreover, an efficiently-implemented tshow would almost certainly be constructing the its result as a concatenation of individual parts.

Mar 14 '21 18:03 bgamari

Is this issue to discuss a potential design, witnessed by relude, or to bikeshed a (mostly-)new one?

@bgamari, relude provides show :: (Show a, IsString b) => a -> b, which happily produces a Builder if you wish one.

Mar 14 '21 18:03 Bodigrim

@bgamari interesting point, and that's true. Grepping through my codebases, the most-used utility is sshow :: (Show a, IsString b) => a -> b, and always in the context of concatenation.

Mar 14 '21 18:03 emilypi

@Bodigrim Having used the relude conversion functions in production systems, I am very satisfied of its ergonomics and features. My ticket was to implement it as-such in the Text library.

Mar 14 '21 18:03 Kleidukos

@Bodigrim I'll admit that I'm not a fan of such doubly-polymorphic conversion functions as they suffer from very poor inference. Of course, this is just a matter of style; I am free not to use the function if text provided it. However, I do worry about the ambiguous-type errors that beginners will encounter. In my experience, such errors have been a common sticking-point for beginners working in code written with OverloadedStrings.

Mar 14 '21 18:03 bgamari

@bgamari this is no worse than fromIntegral. I would argue that show :: (Show a, IsString b) => a -> b is not that ambiguous, even in presence of OverloadedStrings, unless its client (logging, console, etc.) is polymorphic, which usually is not the case.

Mar 14 '21 18:03 Bodigrim

@Kleidukos consider the case of Data.Aeson.object [ "key" := "value" ] with OverloadedStrings enabled.

Mar 14 '21 18:03 bgamari

@bgamari Okay thanks :)

Mar 14 '21 18:03 Kleidukos

I like the design of string conversions in relude. I'm not quite sure about LazyStrict, which looks a bit ad-hoc, but ConvertUtf8, ToText and ToLText seem to be natural additions to text.

I would like to hear from @chshersh and @vrom911 how do they feel about integrating their design into text.

Mar 14 '21 19:03 Bodigrim

@Bodigrim LazyStrict looks actually quite cool, with the fundeps. I like this design a lot.

Mar 14 '21 19:03 Kleidukos

As someone who (unwillingly) has to use an alternative prelude with exactly such a magic conversion function, I am strongly not in favour of it. It often turns type resolution to custard, and hides disgusting amounts of allocation and copying. Driving this via some kind of lawless type class is also not my idea of a good design.

Once again, 100% agree that the ergonomics of conversion are poor. However, I don't think this is a good solution.

Mar 15 '21 00:03 kozross

re LazyStrict there is https://hackage.haskell.org/package/strict-0.4.0.1/docs/Data-Strict-Classes.html which e.g. lens build upon since lens-5. (I.e. it already exists, text doesn't need to invent it)

Mar 15 '21 03:03 phadej

There is also https://hackage.haskell.org/package/string-conversions-0.4.0.1/docs/Data-String-Conversions.html if you want two way conversion. Thing like show :: (Show a, IsString b) => a -> b, why would it be in text, what make text the place for it, and not, I don't know, prettyprinter with its Pretty class!

Mar 15 '21 03:03 phadej

I agree with @phadej on both counts. If we do have nice conversions, I think I would prefer them to be text-valued, as opposed to generally convertible.

Mar 15 '21 05:03 emilypi

As someone who (unwillingly) has to use an alternative prelude with exactly such a magic conversion function

@kozross Could you tell us a bit more about the prelude you're using and the conversion mechanism you're talking about?

Mar 15 '21 07:03 Kleidukos

@phadej Yes I agree with you, and this was not part of my original proposal. :)

Mar 15 '21 07:03 Kleidukos

@Kleidukos My gripe was with the extremely polymorphic show (which as you clarified, isn't part of your original proposal). The one I have to deal with is from universum, which is even worse.

I am strongly against basing conversions (or indeed, anything) on lawless type classes. That is what relude appears to do based on what you linked. This is definitely not an idea I support.

Mar 15 '21 08:03 kozross

Okay, right now the main blocker to adoption seems to be that the typeclasses have no laws. Can't we make them?

Mar 15 '21 08:03 Kleidukos

Not really - the whole point I'm making is that type classes are a bad vehicle for this kind of idea. The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text... somehow'.

I think a much better example of what we should go for istshow. Everyone keeps writing it in their codebases, and providing it would be useful, minimal, and involve no lawless anything.

Mar 15 '21 08:03 kozross

I'm not entirely sure what input is required from us. We use the ToText typeclasses with other conversion functions from relude in various projects, and we find them highly convenient. We don't experience problems with interaction between OverloadedStrings and relude conversion functions. Moreover, this extension is enabled in our all projects by default in the .cabal file.

I also don't see any problems with typeclasses without laws. We already have Show, Read (which even fails in runtime with error), FromJSON/ToJSON, FromField/ToField from postgresql-simple, Arbitrary from QuickCheck, and many more. Even if they don't have laws, typeclasses are convenient, and they help a lot of people be productive and solve their problems efficiently, even if they don't have laws.

I don't think that having or not having laws should be the blocker to introduce typeclasses. There're different trade-offs for using typeclasses vs using explicit functions. But since it's possible to have only a single instance per type, global instance coherence plays a more crucial role. When introducing a typeclass, you can ask yourself a question, "How many instances of this typeclass could be for a type?". And for me, there's only one way to convert String to Text using the ToText typeclass, so typeclasses look suitable enough.

The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text... somehow'.

I wouldn't be so definitive in wording by saying that it's impossible. I see at least one law for the ToText typeclass:

Idempotence: toText . toText ≡ toText

And this law holds for all current instances. But even without this law, I think that the ToText typeclass if reasonable.

On the technical side, I have only one concern. If the ConvertUtf8 typeclass won't make it to text, but something like ToText would, then the wording of the custom error message for the ToText instances can prevent us from reexporting ToText from relude instead of using our own typeclass. Our custom error messages mention decodeUtf8 from ConvertUtf8 but if text would refer to decodeUtf8 from text, then we can't just reexport typeclasses, at least because the implementation of decodeUtf8 is different in text and relude.

Mar 15 '21 11:03 chshersh

According to Bodigrim, ConvertUtf8 should make it into text, so I'm not excessively worried about that.

Mar 15 '21 12:03 Kleidukos

And yes, we have many lawless typeclasses in Haskell. Now it may be because typeclasses are one polymorphic abstraction that is open to be extended, but it doesn't mean that it is untestable. One law that is not algebraic but that we can promote is that the ByteString/LByteString/ShortByteString conversions are a no-no, and that, quite fortunately, is taken care of by the EncodingError system.

Mar 15 '21 12:03 Kleidukos

The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text... somehow'.

I wouldn't be so definitive in wording by saying that it's impossible. I see at least one law for the ToText typeclass:

Idempotence: toText . toText ≡ toText

And this law holds for all current instances. But even without this law, I think that the ToText typeclass if reasonable.

While I also don't think that typeclasses have to have laws, and I really like toText from relude, this particular law will always be satisfied, as toText . toText ≡ toText @Text . toText ≡ id . toText ≡ toText due to toText @Text ≡ id.

Mar 15 '21 12:03 amesgen

I also find those three classes reasonable enough (ToText, ToLText, ConvertUtf8).

I think there can be other ways than laws to easily grasp what is going on with a type class. In this case, there are many morally equivalent representations of "text" (String, Text, LText, also Builder and ShortText), and the functions ToText/ToLText reify that equivalence for situations where you just want to normalize everything down to the same type. What matters is that these types are all "Unicode strings", so toText does essentially nothing on some semantic level, and that's why it's reasonable. (One possible blind spot is IF any such conversion also requires some kind of encoding normalization, I hope that's not the case.)

Similarly, ConvertUtf8 is intended to be the way to convert in UTF-8 between any type of "bytestring" and any type of "text". Again, the idea is that all instances implement one well-defined semantics (that of UTF-8) in some intuitive sense.

I think what would be problematic is if UTF-8 conversion may or may not be performed depending on the inferred types, but that's not what @Kleidukos proposed here.

Another low-tech approach is to expose all those conversions as monomorphic functions, but decodeUtf8FromByteStringToText doesn't necessarily make things easier to read.

Mar 15 '21 14:03 Lysxia

The argument that 'we have lawless type classes already, what's a few more' is pretty bad, especially when the existing examples given are some of the most problematic type classes in the ecosystem (not only for their lawlessness). If you see litter, the solution is at least not to add more litter, not drive by with a dump truck. This is particularly pertinent here, given that text is a boot library.

I am in favour of simple, monomorphic conversion functions. Dragging in type classes to solve this problem is the wrong approach, especially considering that UTF-8 converter type class will absolutely shred type inference without littering your code with explicit types or @-nnotations. Yes, it means people have to write slightly longer names - so what? If that means clarity and type inference, I'm in favour. Making things short and automagic by way of lawless type classes (especially ones with multiple parameters and no fundeps) is a false economy - you're trading (arguably justified) inconvenience now for massively worse inconvenience later.

I'm not speaking theoretically either - I'm knee-deep in several codebases at my Real Job For Real Money which decided on this false economy. The number of hours of my life I've lost on utterly unproductive tasks as a result of this is so high that I cannot agree with such a design even in theory.

Mar 15 '21 21:03 kozross

will absolutely shred type inference without littering your code with explicit types or @-nnotations. Yes, it means people have to write slightly longer names - so what?

Don't these expressions convey exactly the same information?

(decodeUtf8 :: ByteString -> Text)
decodeUtf8FromByteStringToText

You're claiming that the latter variant is better; what's the problem with the former? Is a "slightly longer name" more than an ad hoc form of type annotation?

My own argument above was explicitly not "look at those other lawless type classes", but "these classes we are discussing here may make sense even without equational laws", so I'd like to know where that specific argument goes wrong.

To be clear, I am still ambivalent about whether those type classes are a good idea. I am willing to respect your informed opinion on this, but your argument as written is "I've been bitten by lawless classes before therefore lawless classes are bad" which is a logical fallacy. Elaborating that point would make your case more convincing.

Mar 15 '21 23:03 Lysxia

@Lysxia I think I was a bit unclear in my argument - sorry! Let me clarify.

The issue I have with the UTF-8 conversion type class proposed here isn't (entirely) to do with its lawlessness. Likewise, the thing I was bitten by wasn't (entirely) lawlessness. The far bigger problem for me has to do with the fact it's a multi-parameter type class without functional dependencies. This is a known bad idea in the community - even the author of MPTCs said as much! - and it's not nearly as simple to resolve as you suggest in your response. I've had cases where even completely known monomorphic uses of a method from such a type class still required me to @-nnotate to say what instance I needed!

This is not to say I support lawless type classes (I don't). It's clarifying that my statements above were primarily with regard to MPTCs without fundeps. My work-related anecdotes regarding lawless type classes are still bad; the ones regarding MPTCs without fundeps are orders of magnitude worse. They're hard to diagnose, the output of the compiler is confusing and misleading, and they are generally terrible on every level. Even if you accept lawless type classes as a valid solution to anything, you should reject MPTCs without fundeps - they're not worth the cost later, no matter how convenient they appear.

Hopefully this clarifies my position.

Mar 16 '21 00:03 kozross

Thanks for clarifying. So it's more about MPTC than lawlessness, but I'm still not sure how to understand this issue you mention

I've had cases where even completely known monomorphic uses of a method from such a type class still required me to @-nnotate to say what instance I needed!

decodeUtf8 :: ByteString -> Text  -- (1)
decodeUtf8 @ByteString @Text      -- (2)
decodeUtf8FromByteStringToText    -- (3)

AFAICT (1) and (2) are completely equivalent modulo getting the order of arguments right in (2), and there is no situation where you'd need (2) where you couldn't also have used (1). Do the problems you mention involve PolyKinds or ambiguous kinds? They are the only relevant circumstances I can think of, and they do not apply here.

And why are type annotations ((1) or (2) or both) bad? You've said that but not explained it. To me, none of the options above look significantly better or worse than the others.

The only difference I see is that in (1) and (2) you can sometimes omit the type annotation, which can lead to problems when a typo leads to wrong code compiling with the wrong instance. However, I've argued above that this should not be a problem since all instances "morally" do the same thing. The kind of mistakes that class may allow is on the same level as mistyping ...FromLazyByteString... instead of ...FromByteString..., which can happen just as well.

Abusing MPTC or lawless classes may lead to disaster. I am arguing that this proposal has good properties that make it not misuse and it will not lead to disaster.

Mar 16 '21 01:03 Lysxia

text text copied to clipboard

Better conversion ergonomics

text
text copied to clipboard