universum
Make readFile overloaded
To read Text (lazy and strict), ByteString (lazy and strict), or even String.
Would be nice to overload all functions from
https://hackage.haskell.org/package/universum-0.7.1/docs/Lifted-File.html
because at present some of them accept strict Text and others accept lazy Text, and it seems confusing
@int-index @Martoon-00 Overloading, especially with IsString, has a bigger impact on ambiguity resolution. After that change it's no longer possible to write some simple code without -XTypeApplications. Backpack seems like a better solution for this problem.
Another solution is to use different suffixes like getContentsT or getContentsLT or getContentsB.
> Backpack seems like a better solution for this problem.
How would it help, exactly?
@int-index Okay, after thinking a little bit more I understood that it can't help. You want to be able to use both versions of readFile, to Text and to ByteString, while with Backpack you can use only one inside a target of your project.
> while with Backpack you can use only one inside a target of your project.
With Backpack you could use all versions, but imported from different instantiations. Basically, you'd get the same situation as today, with importing a separate version for each type.
Well, we can overload every function like it's done in Print module: type class per one function.
Or we can overload all functions from Lifted.File module with one type class. Dunno about name for this type class though... And I don't know whether it's a good idea or not, will it have impact on performance or not, will it introduce more problems in type checking (because after this change you might end up using a lot of @Text or @ByteString in your code, and you probably don't want to do this because Text should be enough in most cases; only some parsing libraries (like aeson) need ByteString as input).
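To make the single-type-class option concrete, here is a minimal sketch. The class name `FileContents` and the helper `copyAsBytes` are hypothetical, not part of universum, and only three of the five types are covered. It also shows the ambiguity concern raised earlier: when the result of `readFile` is passed straight to `writeFile`, nothing fixes the content type, so a type application (or an annotation) is required.

```haskell
{-# LANGUAGE TypeApplications #-}

import Control.Monad.IO.Class (MonadIO, liftIO)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Prelude hiding (readFile, writeFile)

-- | Hypothetical class covering the file functions; the name is an assumption.
class FileContents a where
  readFile  :: MonadIO m => FilePath -> m a
  writeFile :: MonadIO m => FilePath -> a -> m ()

-- Strict Text: delegates to Data.Text.IO (locale-dependent encoding).
instance FileContents T.Text where
  readFile = liftIO . TIO.readFile
  writeFile path = liftIO . TIO.writeFile path

-- Strict ByteString.
instance FileContents BS.ByteString where
  readFile = liftIO . BS.readFile
  writeFile path = liftIO . BS.writeFile path

-- Lazy ByteString.
instance FileContents BSL.ByteString where
  readFile = liftIO . BSL.readFile
  writeFile path = liftIO . BSL.writeFile path

-- | Without the @BS.ByteString application the content type is ambiguous
-- here, because it appears neither in the arguments nor in the result.
copyAsBytes :: FilePath -> FilePath -> IO ()
copyAsBytes src dst = readFile @BS.ByteString src >>= writeFile dst
```

Whether this is worth it probably depends on how often call sites like `copyAsBytes` show up compared to call sites where the type is already fixed by the consumer (e.g. a decoder that expects one particular ByteString flavour).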
Polymorphic writeFile and readFile is a nice thing to have, though it's not critical at all. Sometimes it's actually more convenient to know the particular argument type -- e.g. BSL.writeFile instead of generalized writeFile.
The more I write code the more I want polymorphic readFile and writeFile functions... Usually I need to read ByteString when I further process it with some encode function: aeson works with lazy bytestring, yaml works with strict bytestring. You can't remember this. And sometimes this results in minor inconvenience and distraction from idea.
On the other hand, you better think again and use BSL everywhere to avoid extra memory overhead, right?
@volhovm Well, with yaml package I have to use strict ByteString. Though, I like the suggestion to not use yaml!
https://www.snoyman.com/blog/2016/12/beware-of-readfile
So I vote for making it return m ByteString. Maybe overload it to return strict or lazy ByteString, but not Text or String.
> https://www.snoyman.com/blog/2016/12/beware-of-readfile
IIUC, the encoding issues described here can be solved by employing the same strategy Kirill Elagin used in the with-utf8 package (essentially, use hSetEncoding <handle> utf8 under the hood).
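For illustration, the handle-based idea could look roughly like the sketch below. This is not the actual with-utf8 implementation; `readFileUtf8` and `writeFileUtf8` are hypothetical names, and only strict Text is handled. The point is just that the handle's encoding is set explicitly instead of being taken from the locale.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (IOMode (..), hSetEncoding, utf8, withFile)

-- | Hypothetical helper: read a file as strict Text, always decoding as UTF-8.
readFileUtf8 :: FilePath -> IO T.Text
readFileUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8   -- override the locale encoding for this handle
    TIO.hGetContents h    -- strict read, finished before the handle is closed

-- | Hypothetical counterpart for writing, always encoding as UTF-8.
writeFileUtf8 :: FilePath -> T.Text -> IO ()
writeFileUtf8 path txt =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    TIO.hPutStr h txt
```

Note that with the plain `utf8` encoding, input that is not valid UTF-8 raises an exception while reading, rather than being replaced.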
@gromakovsky Would you be for supporting strict/lazy Text and String using this approach?
I think there are a few open questions:
1. For what types do we want to have a `readFile` function returning a value of this type? There are 5 types to consider.
2. Do we want to have a single `readFile` (polymorphic if we select more than one type in (1))? Or multiple functions? If multiple functions, do we want to have one per type or some other grouping?
3. If we want to have `readFile` for `Text`/`String`, what should its semantics be?
4. If the desired semantics is to assume utf8 encoding, how should it be implemented under the hood?
Similar questions apply to writeFile, I guess we should cover them in this issue as well (even though there is only readFile in the title).
Here are my thoughts:
1. For all 5 types, or only for strict `Text` and `ByteString`, because I think they are the most commonly used.
2. I would have `readFile` for `ByteString` and `readFileUtf8` for `Text` and `String`. I think the utf8 assumption should be explicit. `readFile` can be made polymorphic to support lazy and strict `ByteString`s, or we can have `readFile` and `readFileLazy`, for example (but I somewhat dislike `readFileLazy` because it may not be obvious that `Lazy` means lazy bytestring here). Or we can have `readFile` only for strict bytestring. Similarly for `readFileUtf8`: one option is to have it only for strict text, and if we want to support lazy text and string, we should make it polymorphic or have some suffix in the name.
3. I think it's a good idea to assume utf8 encoding as long as it's made explicit, i.e. present in the name of the function or the module where it comes from. In our case we can only do the former; in `with-utf8` the latter is done.
4. One option is described in the blog post: `fmap (decodeUtf8With lenientDecode) . BS.readFile`. A different option is implemented in `with-utf8`. According to benchmarks from that blog post, the former option actually works better (faster). So I guess it's better to use the implementation with explicit `decodeUtf8With`?
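Spelled out with imports, the blog-post option from the last point would be something like the sketch below. The name `readFileUtf8` comes from the proposal above and is not an existing universum function; only strict Text is shown.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Read raw bytes, then decode them as UTF-8 explicitly.
-- 'lenientDecode' substitutes U+FFFD for invalid input instead of throwing.
readFileUtf8 :: FilePath -> IO T.Text
readFileUtf8 = fmap (decodeUtf8With lenientDecode) . BS.readFile
```

One design consequence worth deciding on: with `lenientDecode`, a file that is not valid UTF-8 is read successfully with replacement characters in it, so the caller never sees a decoding error.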