geotrellis
geotrellis copied to clipboard
ValueReaders Should Return Option[K]
Right now, whenever a ValueReader
fails to read a given Tile it throws an error. However, this is different from what LayerReader
s do which is just to return an empty layer if the given query returned nothing. We should change the return type of ValueReader.read
to Option[K]
so that it doesn't throw when it can't find the Tile.
FWIW, here are my thoughts about error handling for multi-stage parsing or processing to derive a desired value.
TL;DR: Consider Validation
.
My current viewpoint:
- Checked exceptions suck
- Undeclared unchecked exceptions are just malicious
-
Option[T]
is lossy -
Either[T, Throwable]
can be constraining (no error accumulation), and assumes non-parallel validation 4a)Either[T, Seq[Throwable]]
orEither[T, Nel[Throwable]]
provides accumulation, but you have to work out the compositional semantics yourself. -
Either[Throwable, T]
points out the problem withEither
... I can't remember the convention of which side is the "good" side. -
Try[T]
is a bit better (can be transformed into anOption
orEither
), but frankly I find the default combinators hard to work with (e.g.recover
vsrecoverWith
vsfold
...). I still use it quite a lot though. -
Validation
orValidationNel
from scalaz/cats are probably solving the problem "correctly", but every time I start to use them I feel like they are over engineering the problem in the name of some abstract "purity". And they need to be used consistently to have real value.
What I do, vs what I should do:
- When I'm not writing API or application code (throwaway stuff), I do 2.
- When I'm really lazy I do 3. It almost never sticks, and then I consider the next items.
- I consider 4/5, start thinking about what the semantics should be, get bored, and move on to 6.
- 6 doesn't solve the error accumulation problem, but I like it better than
Either
for some reason. I use this the most despite always having to look up how I should do error recovery - I've only fully committed to 7 a couple of times in some small applications that never saw the light of day. The first thing I thrash over is the commitment to scalaz or cats, and then which versions, and then the continual leakage of other types that result. It feels like a big commitment and potentially a divisive one. Then I get frustrated that the Scala standard library doesn't have it built in, saving me from having to make such a value decision, then I consider writing my own that has no dependencies, but know that's just stupid, and then I either go back to 6; or I go with cats, because, well, the cats folk are just nicer, and don't require me to declare cultural alliance by choosing "zee" over "zed".... Then, after all that drama I start to think I should go back through all of my API and replace any usage of
Option
/Either
/Try
withValidation
and overwhelm myself with the work involved, often resulting in going back to 6 because, well it's the devil I know.
In summary, if I were writing something new, or willing to commit to reworking all the code to be consistent, some form of Validation
is probably the right answer. But consistency is key if you want to get the benefits of composition, otherwise you're increasing cognitive surface area with no benefit. Otherwise, I'd go with Try
.
Thanks @metasim for your quite expanded response. I'm thinking more towards MonadError
or smth like that, in order not to be locked on default combinators.
Thank you, @metasim you raised some good points. @pomadchin Would we want to implement something like MonadBlunder
from this article? I'm not familiar enough with cats
to know if they have something like this now, though. What are your thoughts @moradology @jpolchlo @jamesmcclain ?
I think there's another option so far not considered (save obliquely in the context of MonadError): cats.effect.IO
. IO
would allow us to completely defer the question of returning Either[Throwable, Result]
vs throwing. For example:
val maybeExceptiony: IO[Int] = IO { myIntReader.read("abc123") }
val noExceptionsForMe: IO[Either[Throwable, Int]] = maybeExceptiony.attempt
In general, I think this solution is the right approach for code which depends heavily upon stuff happening outside of it (exception states will abound regardless of the stuff we write) and it strikes a nice compromise position that isn't too heavy-handed about how your call site is going to look.
The main issue I see with the use of Validation
is that we don't have a lot of independently checkable steps so much as a chain of (possibly failing) behaviors that are strictly dependent upon prior results - we can't really take full advantage of the Applicative avoidance of the Monad's short-circuiting behavior
My vote would be for IO
, looks like a great compromise, also would give an async API for free (if we'd like to make it async).
Reference to a good PR by @moradology that demonstrates how we can abstract over IO
: https://github.com/locationtech/geotrellis/pull/2818
IMO, Either
is too unopinionated for this use case. I think Validation
is better because it's in-your-face about reporting errors. Something like Rust's Return
is also a consideration, which is like Either
except that there's no Left
/Right
ambiguity. For those writing the API it may not seem like a big deal to just get used to Either
being left or right biased, but to library users it can cause a lot of cognitive friction and eye squinting.
PS: I'm assuming that the general GeoTrellis target user is less adept at type gymnastics than an API developer (i.e. users are in levels A1-3). If that's not the goal with GeoTrellis, I guess you should just:
@metasim we hope that the abstraction would allow users to choose what they want to use to handle effects, that can be IO
, this can be Future
/ twitter Future
/ Either
/ Validated
/ whatever
. Also, talking in particular exceptions handling in IO
: https://github.com/typelevel/cats-effect/issues/67#issuecomment-410913527
Also don't forget that IO
gives us abilities to parallelize for free.
But i agree that nothing extra complicated should be exposed into the API. It looks like the choose of exception handling mechanism should be as much as it's possible on the user side.
Mb you just need to try some more cats
;)
P.S. Validated
is not a monad; so you can't flatMap
it; wondering how to combine methods with such a return type.
Mb you just need to try some more cats ;)
For me I'm fine with cats (I do need to use it more). I just know that not all users (or at least everyone on my team) is comfortable with it, and I would prefer to be able to promote the use of GeoTrellis without a bunch of "but first you must learn this" caveats. So I have two threads of thoughts here: 1) how we manage errors internally in the API implementation (where I have less opinion as long as it's consistent), and 2) externally, at the user-facing surface area of the API, where I think we should strive for minimizing friction without losing flexibility. Cats may be the right answer in both cases, but IMO we just need to be thoughtful, deliberate and consistent about it. Even the term "monad" is still scary to a lot of devs, and seeing it as a part of an API can hamper adoption (I'm thinking of the large number of Java refugees coming to Spark/Scala).
@metasim I think you hit the nail on the head there, the result is going to be different tiers of API intended for different goals. The distinction somewhat exists right now but looks like we'll have to much more formal and deliberate about it in the future.
https://github.com/locationtech/geotrellis/pull/2818 is a pretty example of driving cause: There is a realization and a need that performing operations makes sense in at least three contexts:
- local
- local async
- streaming async
- batch (which may have async parts)
Rewriting code for each case is tremendously expensive and hard to maintain so we find ourselves reaching for higher order abstractions. At the same time leaking complexity (even through types) is bad practices and when rubber hits the road you're trying to do concrete thing A
to thing B
.
I think as we push we're going to end up with three levels of API:
- Java-like API around performance and IO edges
- IO code by default is performance centric and least type-full anyway.
- Basic for most extensions through SPI or just use provided types
- Async API around modularity and co-ordination
- Capture general patterns of operation that can be re-used in multiple contexts
- Notebook/integration API
- Something that cobbles previous level for specific system (Spark vs FS2)
- Probably something pretty specific about the types, or at least much constrained types.
- Simple enough that you would be able to write it in a notebook.
IO[Either[Throwable, Int]]
might make total sense if you're integrating with http4s
, doobie
or fs2
but would be really out of place in Spark API.
I hope we can start using cats
in the core where it has low impact on users at first and check ourselves when it starts leaking out.
@echeipesh Great followup.
Java-like API around performance and IO edges
^---- Would also make Python bindings a heck of a lot easier.
PS: I hope I've not come across as anti-cats
, fs2
, IO
or whatever.... I think all that performance heavy stuff should use whatever abstraction lends to the best 1) maintainability, and 2) performance. It's more, as Eugene said, me being fussy about also thinking about the same problem from the outside in, and what kind of type leakage to the end user is acceptable.