universum icon indicating copy to clipboard operation
universum copied to clipboard

Make readFile overloaded

Open int-index opened this issue 7 years ago • 14 comments

To read Text (lazy and strict), ByteString (lazy and strict), or even String.

int-index avatar Nov 03 '17 02:11 int-index

Would be nice to overload all functions from https://hackage.haskell.org/package/universum-0.7.1/docs/Lifted-File.html because at present some of them accept strict Text and others accept lazy Text, and it seems confusing

Martoon-00 avatar Nov 03 '17 08:11 Martoon-00

@int-index @Martoon-00 Overloading, especially with IsString has bigger impact to resolving ambiguity. After that change it's no longer possible to write some simple code without -XTypeApplications. Backpack seems like better solution for this problem. Another solution is to use different suffixes like getContentsT or getContentsLT or getContentsB.

chshersh avatar Nov 03 '17 08:11 chshersh

Backpack seems like better solution for this problem.

How would it help, exactly?

int-index avatar Nov 03 '17 13:11 int-index

@int-index Okay, after thinking a little bit more I understood that it can't help. You want to be able to use both versions of readFile: to Text and to ByteString while with Backpack you can use only one inside target of your project.

chshersh avatar Nov 03 '17 14:11 chshersh

while with Backpack you can use only one inside target of your project.

With Backpack you could use all versions, but imported from different instantiations. Basically, you'd get the same situation as today, with importing a separate version for each type.

int-index avatar Nov 03 '17 15:11 int-index

Well, we can overload every function like it's done in Print module: type class per one function. Or we can overload all functions from Lifted.File module with one type class. Dunno about name for this type class though... And I don't know whether it's a good idea or not, will it have impact on performance or not, will it introduce more problems in type checking (because after this change you might end up using a lot of @Text or @ByteString in your code, and you probably don't want to do this because Text should be enough in most cases; only some parsing libraries (like aeson) need ByteString as input).

chshersh avatar Nov 25 '17 12:11 chshersh

Polymorphic writeFille and readFile is a nice thing to have, though it's not critical at all. Sometimes it's actually more convenient to know the particular argument type -- e.g. BSL.writeFile instead of generalized writeFile.

volhovm avatar Feb 07 '18 16:02 volhovm

The more I write code the more I want polymorphic readFile and writeFile functions... Usually I need to read ByteString when I further process it with some encode function: aeson works with lazy bytestring, yaml works with strict bytestring. You can't remember this. And sometimes this results in minor inconvenience and distraction from idea.

chshersh avatar Apr 09 '18 15:04 chshersh

On the other hand, you better think again and use BSL everywhere to avoid extra memory overhead, right?

volhovm avatar Apr 09 '18 15:04 volhovm

@volhovm Well, with yaml package I have to use strict ByteString. Though, I like the suggestion to not use yaml!

chshersh avatar Apr 09 '18 17:04 chshersh

https://www.snoyman.com/blog/2016/12/beware-of-readfile

gromakovsky avatar Apr 16 '18 18:04 gromakovsky

So I vote for making it return m ByteString. Maybe overload it to return strict or lazy ByteString, but not Text or String.

gromakovsky avatar Aug 07 '18 22:08 gromakovsky

https://www.snoyman.com/blog/2016/12/beware-of-readfile

IIUC, the encoding issues described here can be solved by employing the same strategy Kirill Elagin used in the with-utf8 package (essentially, use hSetEncoding <handle> utf8 under the hood).

@gromakovsky Would you be for supporting strict/lazy Text and String using this approach?

dcastro avatar Sep 10 '22 14:09 dcastro

I think there are a few open questions:

  1. For what types do we want to have readFile function returning a value of this type? There are 5 types to consider.
  2. Do we want to have a single readFile (polymorphic if we select more than one type in (1))? Or multiple functions? If multiple functions, do we want to have one per type or some other grouping?
  3. If we want to have readFile for Text/String, what should its semantic be?
  4. If desired semantic is to assume utf8 encoding, how should it be implemented under the hood?

Similar questions apply to writeFile, I guess we should cover them in this issue as well (even though there is only readFile in the title).

Here are my thoughts:

  1. For all 5 types or only for strict Text and ByteString because I think they are most commonly used.
  2. I would have readFile for ByteString and readFileUtf8 for Text and String. I think utf8 assumption should be explicit. readFile can be made polymorphic to support lazy and strict ByteStrings or we can have readFile and readFileLazy for example (but I somewhat dislike readFileLazy because it may be not obvious that Lazy means lazy bytestring here). Or we can have readFile only for strict bytestring. Similar for readFileUtf8: one option is to have it only for strict text, and we want to support lazy text and string, we should make it polymorphic or have some suffix in the name.
  3. I think it's a good idea to assume utf8 encoding as long as it's made explicit, i. e. present in the name of the function or the module where it comes from. In our case we can only do the former, in with-ut8 the latter is done.
  4. One option is described in the blog post: fmap (decodeUtf8With lenientDecode) . BS.readFile. A different option is implemented in with-utf8. According to benchmarks from that blog post, the former option actually works better (faster). So I guess it's better to use implemenetation with explicit decodeUtf8With?

gromakovsky avatar Sep 15 '22 06:09 gromakovsky