v4.0 input validation philosophy

Open ajfriend opened this issue 3 years ago • 0 comments

As we're reworking the library for 4.0, should h3-py validate all user inputs by default? Or should we assume that the user knows what they're doing and skip the validation? (The current v3 h3-py library is fairly aggressive with validation.)

The former option could help new users by catching bugs early, but at a potential performance cost, which would be especially annoying for users who have already validated their inputs.

The latter option simplifies the h3-py library code and (potentially) provides performance benefits, at the risk of returning junk outputs when given invalid inputs. Note that even in this case we would still need the minimal validation required to prevent things like segfaults.

There are probably lots of interesting ways we could do this, especially if we're exposing multiple APIs (str/int/numpy/memview and Python/Cython/C). For example, we could provide "safe" and "unsafe" versions of the APIs that differ in how much validation they do.
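To make the "safe"/"unsafe" idea concrete, here is a rough sketch of what such a pair could look like. All names (`looks_like_cell`, `resolution_safe`, `resolution_unsafe`) are made up for illustration and are not existing h3-py functions, and the validity check is deliberately simplified: it only looks at the range and the mode bits of a 64-bit index, not at everything a full validity check would cover.

```python
# Hypothetical sketch: none of these names are part of h3-py.

def looks_like_cell(h) -> bool:
    """Simplified check: an int that fits in 63 bits with mode bits == 1."""
    return isinstance(h, int) and 0 <= h < 2**63 and (h >> 59) & 0xF == 1


def resolution_unsafe(h: int) -> int:
    """No validation: reads the resolution bits from whatever it is given."""
    return (h >> 52) & 0xF


def resolution_safe(h: int) -> int:
    """Same result as resolution_unsafe, but rejects implausible input first."""
    if not looks_like_cell(h):
        raise ValueError(f"not a plausible cell index: {h!r}")
    return resolution_unsafe(h)


print(resolution_safe(0x8928308280fffff))  # 9
print(resolution_unsafe(0))                # 0, a junk answer for a junk input
```

The unsafe version quietly returns a number for an input that is not a cell at all, which is exactly the "junk outputs" risk; the safe version pays for one extra check on every call.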

One concern: We probably want to avoid a combinatorial explosion of APIs. Put differently: Is there a way to provide flexibility around validation aggressiveness without making things too complex / getting into configuration hell?
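One possible shape that avoids doubling the number of function names, sketched under the assumption that we write the unchecked function once and derive the checked one from it. The `validated` decorator, the `.unchecked` attribute, and `looks_like_cell` are all hypothetical, and the check is again a simplified stand-in for real validation.

```python
# Hypothetical sketch: one public name per function, raw version kept as an attribute.
import functools


def looks_like_cell(h) -> bool:
    """Simplified stand-in for a real validity check."""
    return isinstance(h, int) and 0 <= h < 2**63 and (h >> 59) & 0xF == 1


def validated(check):
    """Wrap an unchecked function and expose the original as `.unchecked`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(h, *args, **kwargs):
            if not check(h):
                raise ValueError(f"invalid input: {h!r}")
            return fn(h, *args, **kwargs)
        wrapper.unchecked = fn  # escape hatch for callers who validated already
        return wrapper
    return decorate


@validated(looks_like_cell)
def resolution(h: int) -> int:
    return (h >> 52) & 0xF


print(resolution(0x8928308280fffff))  # 9, validated call
print(resolution.unchecked(0))        # 0, validation skipped
```

This keeps the choice local to each call site and needs no global configuration, though it only covers the Python layer and says nothing about how the Cython/C APIs would look.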

One idea: Have the default Python APIs do lots of validation, which could be friendly to new users. For users who really care about performance, we expose the Cython and C APIs, so they're free to do whatever they'd like. (Although I'm not sure how user-friendly this would be for the performance-minded users.)

Overall, I think this is a really interesting user experience design problem, and I'd be super curious to hear more thoughts/concerns/suggestions/proposals/warnings/ideas/solutions from as many folks as possible.

Jul 02 '22 08:07 ajfriend