loopy icon indicating copy to clipboard operation
loopy copied to clipboard

Add more opencl math builtins

Open zachjweiner opened this issue 4 years ago • 21 comments

I added all the basic functions I could find via regex matching that shouldn't require new logic (so this isn't by any means the complete set of functions listed here). I added a rudimentary test to just ensure that all listed functions resolve and execute.

I could not discern any way that the special-cased logic for fmin, fmax, atan2, and copysign differentiated from that applied to _CL_SIMPLE_MULTI_ARG_FUNCTIONS (when I was trying to determine where to place added multi-arg functions). 7eba8de removes this branch - let me know if I should revert, etc.

I am confused by the branch for pow - it seems to generate powf32 or powf64 based on the type, each of which are defined in the preamble to alias to the built-in pow. Could this possibly be removed? I ask because I have powr just directly resolving to powr itself and couldn't think of a reason to do otherwise.

Closes #437.

zachjweiner avatar Jun 22 '21 20:06 zachjweiner

Thanks for working on this!

I am confused by the branch for pow - it seems to generate powf32 or powf64 based on the type, each of which are defined in the preamble to alias to the built-in pow. Could this possibly be removed? I ask because I have powr just directly resolving to powr itself and couldn't think of a reason to do otherwise.

I think the reason was that calling pow(float, double) results in an error, while calling powf64 would automatically upcast. But with the common-dtype finding there, I don't think it's an issue any more, and I think this can be removed. One thing I'd like to have is testing for mixed dtypes, to make sure this works as intended.

inducer avatar Jun 28 '21 04:06 inducer

One thing I'd like to have is testing for mixed dtypes, to make sure this works as intended.

The intention being to raise when mixed dtypes are passed, or to cast the arguments to the proper dtype in CL code? (I'm guessing the latter - so should we have the CL math functions cast their arguments in all cases?)

zachjweiner avatar Jun 28 '21 13:06 zachjweiner

The pytato failure was just a spurious CondaHTTPError, but we can just wait for the next pushed commit.

zachjweiner avatar Jun 28 '21 16:06 zachjweiner

@inducer, I implemented type casting in ExpressionToOpenCLCExpressionMapper and added a test (and fixed two small bugs I encountered along the way).

zachjweiner avatar Jun 28 '21 17:06 zachjweiner

The failures are

  1. due to ExpressionToCExpressionMapper.map_power not propagating the required types for the arguments to pow. I've fixed this (not yet pushed).
  2. string-match failures (from newly-introduced casting, e.g. in a doctest)
  3. and, most importantly: I think code generation misspecifies LHS types for Lookups. Specifically, generate_assignment_instruction_code asks for the aggregate's dtype, which, as in test_struct_assignment, tries to cast the RHS to the struct dtype. ~I'm yet unsure how to appropriately modify get_var_descriptor---naively passing it the Lookup itself doesn't work.~ nvm, there's no need, generate_assignment_instruction_code can just ask for it after

Also, (2) (and (3), I guess) pose a question: should this PR apply type casting so broadly? Before, type casting was almost entirely left to be implicit (so far as I can tell). But the ways I can think of to tell the ExpressionToOpenCLCExpressionMapper to only apply casting for function arguments feel super clunky.

zachjweiner avatar Jun 28 '21 21:06 zachjweiner

I'm not in principle against making casts explicit in the code, with the caveat that I'd like the SNR of the generated code to remain reasonably high. If the code gets overwhelmed by casts, it's no longer useful to look at.

The other caveat is cost: I would hope that implicit casts and explicit casts have the same cost.

Does your code insert casts in places where you feel they're "unreasonable" to insert?

inducer avatar Jun 28 '21 21:06 inducer

Just pushed one more fix for pow that should resolve everything but the string matching ones.

Does your code insert casts in places where you feel they're "unreasonable" to insert?

I'll report back!

zachjweiner avatar Jun 28 '21 22:06 zachjweiner

Does your code insert casts in places where you feel they're "unreasonable" to insert?

I looked at all instances of casting within test_loopy and test_target. While none of the output code looks obnoxious or unreadable (there are only 52 and 34 instances amongst the tests in each respective file), the common patterns are:

  • casting literals---even things like (float)(1.0f)---which is silly
  • assignments like int i = 0; float x = (float)(i), which is less offensive but still unnecessary
  • casting conditions inside ternaries to char (shouldn't type inference ask for a bool?)

(Also, there's one instance of int j = (int)(i / 256).)

If we want to tell wrap_in_typecast to skip these, the obstacle is that wrap_in_typecast doesn't know whether its input is, e.g., an argument to a binary function (even literals may need to be cast here). One would want the mapper methods to make the decision about how to handle child nodes.

zachjweiner avatar Jun 29 '21 14:06 zachjweiner

Thanks for taking the time to describe the instances of casting. While some aren't ideal, I can live with all of them.

casting conditions inside ternaries to char (shouldn't type inference ask for a bool?)

bool is an odd duck in CL, in that sizeof(bool) is unspecified. That makes it a bit of a hot potato for the type system to deal with, and it just sidesteps the problem in the way you saw.

inducer avatar Jul 01 '21 06:07 inducer

Cool. 6c896ef should fix the remaining failures. Sorry for yet again making the tutorial more verbose... if such prominent and slightly embarrassing casting is a bother, I am happy to refine things now. Otherwise, feel free to ping me if you ever want to revisit this.

bool is an odd duck in CL

Thanks for explaining!

zachjweiner avatar Jul 01 '21 13:07 zachjweiner

It seems that, by promoting host-side device scalars to array args, #453 now prompts the executor to set _lpy_encountered_numpy to True for host scalars. The test added here ran into that, fixed in 93a3399.

zachjweiner avatar Jul 01 '21 14:07 zachjweiner

For some reason I can't reply directly to the comment about map_power: yeah, I had tried clbl.name_in_target first (it's also just "pow").

I observe that seen_functions are only used in preamble generation - they don't affect the callables table. I tried patching CallablesResolver to record having seen pow without transforming it into an actual ResolvedFunction call (otherwise powers with integer-typed arguments will map to calls to pow and fail). But then it won't land in the callables table 😣 So I give up. I did remove the bits adding pow to seen_functions for the non-integers case where it's unneeded in f5116aff.

zachjweiner avatar Jul 02 '21 16:07 zachjweiner

Prompted by flake8 complaining about an unused variable, in b4e76d2 I (thought I) corrected the argument types specified to the generated integer power function. This led to the pytato failure. But I don't imagine either the current implementation nor my change are consistent with numpy's type promotion, right? Should the input types be chosen in the same way as, e.g., other OpenCLCallables, via numpy.find_common_type and the like? (Not to append yet another change to this PR...)

zachjweiner avatar Jul 02 '21 19:07 zachjweiner

Should the input types be chosen in the same way as, e.g., other OpenCLCallables, via numpy.find_common_type and the like? (Not to append yet another change to this PR...)

I'd say so... but isn't that what your code is currently doing?

Also, any idea what's causing the (seeming) integer overflows in pytato?

inducer avatar Jul 05 '21 21:07 inducer

I'd say so... but isn't that what your code is currently doing?

In OpenCLCallable.emit_types, yes, but int_pow doesn't go that route. (I wasn't sure if the current casting strategy was some intentional integer arithmetic hackery.) I pushed 2a961eb which implements the change I proposed and resolves the pytato failure (which I realized was caused by the same casting issue after my comment last week). I think the old implementation would have failed on integer powers whose result is larger than the max value of the base's type. (numpy promotes the args of things like <uint64> * <int8> to float64 to avoid overflow.)

zachjweiner avatar Jul 05 '21 21:07 zachjweiner

Just a conda connection reset error... strangely not the first time today on the "without arg check" job. Mind rerunning it?

zachjweiner avatar Jul 06 '21 17:07 zachjweiner

Invited you to the repo, so that you can do that yourself. :)

inducer avatar Jul 06 '21 18:07 inducer

I seem to have very bad luck with it, so thanks =D

zachjweiner avatar Jul 06 '21 18:07 zachjweiner

Hi @inducer, just wanted to ping this - not to rush or anything - but just to clarify that I believe everything we've discussed has been addressed and this is ready for review.

(Also, just FYI, the CI job that runs twice to test caching seems to fairly consistently hit conda HTTP errors on the first attempt triggered by pushes.)

zachjweiner avatar Jul 14 '21 13:07 zachjweiner

Thanks for working on this. I see you've taken the approach of pushing the bool -> int8 mapping "up" into type inference. That really worries me, because now our type system is "blind" to Booleans (depending on which target is in use). I'd really like to avoid that, it seems like a liability. Could you explain what drove this decision? Do you see alternatives?

inducer avatar Jul 16 '21 17:07 inducer

Point taken re: type inference being blind to true Booleans. I guess I'm unsure how else we would use the target to inform type inference such that the casts to char aren't emitted (if I'm understanding your comment here - maybe you had something else in mind?).

To me this feels like another argument that type casting should be "context-aware" (i.e., informed by the parent expression), on top of wanting to cast literals only inside function calls. The alternatives I can imagine require similarly invasive modifications to the expression-to-C/CL mappers.

zachjweiner avatar Jul 19 '21 14:07 zachjweiner