Emit a warning for floating-point size changes after implicit conversions
Use of double-precision arithmetic in device code results in reduced FP64 performance on GPUs. https://github.com/intel/llvm/issues/5783 asks that the Clang front end emit a helpful warning to alert users to double-precision arithmetic usage in device code.
This PR adds a new warning, -Wimplicit-float-size-conversion, which fires when there is an implicit conversion between floating-point types, even when no precision is lost. The warning is enabled for both device and host code.
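For reference, a minimal sketch of the kind of code this warning targets (the flag name is taken from this PR; the exact diagnostic text is still being discussed):

```cpp
// Compile with: clang++ -Wimplicit-float-size-conversion demo.cpp
float scale(float x) {
  // 2.0 has type double, so x is implicitly widened to double for the
  // multiply, and the result is implicitly narrowed back to float.
  return x * 2.0;
}
```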
@intel/dpcpp-cfe-reviewers Hi, this WIP PR addresses the ask of https://github.com/intel/llvm/issues/5783
I am also looking for input on the following cases. Do we need to address these as well (see the sketch after this list)?
- Spelling the type out (double, but also long double?)
- Calling a builtin which returns a double (these don't have function signatures that would have the type spelled out), like __builtin_fabs
- Default argument promotion in a variadic function call, e.g., void func(int i, ...); func(1, 1.0f); the 1.0f will be promoted to double when passed to func()
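For concreteness, a minimal sketch of the three cases (logv is a hypothetical variadic function used only for illustration):

```cpp
void logv(int n, ...); // hypothetical variadic function

void cases(float f) {
  double d = f;                 // case 1: the type double is spelled out explicitly
  double r = __builtin_fabs(f); // case 2: builtin returning double, no signature spelling the type
  logv(1, 1.0f);                // case 3: 1.0f undergoes default argument promotion to double
}
```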
I did learn that in OpenCL, double precision is supported by the device for OpenCL C 3.0, or if the __opencl_c_fp64 feature macro is present. But it is not clear from the SYCL spec or the Clang code base whether device code supports variadic calls or builtins.
Can either of you or the SYCL language design folks please clarify whether cases 2 and 3 are valid for device code, and if so, should we diagnose them as well? Is it sufficient to address only double-precision literals in device code, which is what the GitHub issue asks for?
Thank you!
> double precision arithmetic
@srividya-sundaram I think you will need to support cases 2 and 3 as well. From your PR description it sounds like we want to diagnose double precision arithmetic in device code due to performance issues. I would assume the same performance issues exist for builtins returning double, etc.
@premanandrao could you please take a look as well?
@bader @elizabethandrews
Thanks for the confirmation. I have now included cases for default argument promotion and builtins returning double in device code.
I see this new addition of the warning is causing a lot of test failures: https://github.com/intel/llvm/runs/7055091095?check_suite_focus=true
Is it ok to fix all those tests? I want to make sure the wording is okay before I fix them.
Do we want these warnings generated by default? I think it should be opt-in
> Do we want these warnings generated by default? I think it should be opt-in
Can you clarify what you mean by opt-in? From the issue description, my understanding is that we diagnose double arithmetic usage in device code and emit a warning even if double operations are supported on the target device.
Also, my question was whether we are okay with the exact wording of the diagnostic text:

```
double precision arithmetic used in device code. reduced FP64 performance expected on GPU's.
```
> Do we want these warnings generated by default? I think it should be opt-in
>
> Can you clarify what you mean by opt-in? From the issue description, my understanding is that we diagnose double arithmetic usage in device code and emit a warning even if double operations are supported on the target device.
The issue asks for an opt-in warning. From the issue description:

> "but it might be helpful to have an opt-in, compile-time diagnostic to warn about double-precision ops in the device kernels. No warning is to be emitted unless the user explicitly requested it."
Opt-in means you use a flag to enable/disable the warning. By default, the warning should be disabled in this case; it should only be generated if the corresponding flag is passed.
> Also, my question was whether we are okay with the exact wording of the diagnostic text:
>
> ```
> double precision arithmetic used in device code. reduced FP64 performance expected on GPU's.
> ```
I commented another possibility in an earlier comment.
> Do we want these warnings generated by default? I think it should be opt-in
Why do you think they should be opt-in as opposed to opt-out?
> Do we want these warnings generated by default? I think it should be opt-in
>
> Why do you think they should be opt-in as opposed to opt-out?
Ah, I saw your later comment quoting the issue description, thanks! I don't understand why they would like that to be opt-in rather than opt-out, but whatever. (My intuition is that users would want to know when their floating-point calculations are going to magically change their precision to be different than what the language would usually provide, but it turns out a diagnostic isn't strictly required for conformance because [basic.types]p12 only requires that double have at least the precision of float.)
> Do we want these warnings generated by default? I think it should be opt-in
>
> Why do you think they should be opt-in as opposed to opt-out?
Because this doesn't go against the standard as far as I know. It's a performance issue which users may or may not care about. Considering we haven't generated a warning for this until now, and this PR is the result of someone requesting an opt-in warning, I figured it should be opt-in.
@AaronBallman
> My intuition is that users would want to know when their floating-point calculations are going to magically change their precision to be different than what the language would usually provide, but it turns out a diagnostic isn't strictly required for conformance because [basic.types]p12 only requires that double have at least the precision of float
Isn't this the case with the OpenCL flag?
For the CFE, we are just emitting a warning about the performance issue, without implicitly casting double-precision floating-point literals to single-precision ones.
Even then, wouldn't users want to know about this potential performance loss on GPUs? Does it indeed have to be opt-in only? Just a thought.
> @AaronBallman
>
> > My intuition is that users would want to know when their floating-point calculations are going to magically change their precision to be different than what the language would usually provide, but it turns out a diagnostic isn't strictly required for conformance because [basic.types]p12 only requires that double have at least the precision of float
>
> Isn't this the case with the OpenCL flag?
Ah, I thought we were doing the same thing as OpenCL there, sorry that I was wrong.
> For the CFE, we are just emitting a warning about the performance issue, without implicitly casting double-precision floating-point literals to single-precision ones.
>
> Even then, wouldn't users want to know about this potential performance loss on GPUs? Does it indeed have to be opt-in only? Just a thought.
I was thinking this was not a performance issue but a correctness issue. But as @elizabethandrews points out, it's only about performance and so off-by-default is somewhat more reasonable.
FWIW, I'm skeptical that off-by-default warnings are worth the maintenance burdens. Community experience was that only a very small percentage of people ever opt in to an off-by-default warning (I recall hearing 1-5%, vaguely). So my personal feelings are: if this isn't important enough to warn people about by default, it probably belongs in a separate tool like a profiler, linter, etc.
> FWIW, I'm skeptical that off-by-default warnings are worth the maintenance burdens. Community experience was that only a very small percentage of people ever opt in to an off-by-default warning (I recall hearing 1-5%, vaguely). So my personal feelings are: if this isn't important enough to warn people about by default, it probably belongs in a separate tool like a profiler, linter, etc.
I believe the submitter mentioned this can be detected using a profiler, but that a warning diagnostic may also be useful. I do not know how to decide whether implementing the warning is worth it, since I do not know how community statistics translate to DPCPP users who care about performance.
The issue comments mention a related CUDA flag, --warn-on-double-precision-use. I guess one way to answer the question is to see how widely the CUDA flag is used, or to talk to someone who is more familiar with these kinds of performance flags. Maybe @bader?
> FWIW, I'm skeptical that off-by-default warnings are worth the maintenance burdens. Community experience was that only a very small percentage of people ever opt in to an off-by-default warning (I recall hearing 1-5%, vaguely). So my personal feelings are: if this isn't important enough to warn people about by default, it probably belongs in a separate tool like a profiler, linter, etc.
I am not familiar with the usage metrics of opt-in flags in Clang, and I understand that the GitHub user only asked for a compile-time, opt-in diagnostic, i.e., that we don't emit a warning unless the user explicitly requests it.
However, I had a few thoughts:
In most cases, kernels are intended to be run on GPUs.
Case 1: Say a user did not intend to use double arithmetic in device code, but they mistakenly left out the "f" suffix, as in this example: float x2 = x * 2.0;
Chances are that they may never notice this mistake, and the device code, when run on a GPU, might have reduced performance. A warning here might give a clue.
Case 2: The user might intentionally use double arithmetic in device code but be unaware of this option/flag. This will also result in reduced performance on the GPU. Of course they can debug and eventually pass the flag, but I am just wondering whether we must care about these use cases.
> > FWIW, I'm skeptical that off-by-default warnings are worth the maintenance burdens. Community experience was that only a very small percentage of people ever opt in to an off-by-default warning (I recall hearing 1-5%, vaguely). So my personal feelings are: if this isn't important enough to warn people about by default, it probably belongs in a separate tool like a profiler, linter, etc.
>
> I am not familiar with the usage metrics of opt-in flags in Clang, and I understand that the GitHub user only asked for a compile-time, opt-in diagnostic, i.e., that we don't emit a warning unless the user explicitly requests it.
>
> However, I had a few thoughts:
>
> In most cases, kernels are intended to be run on GPUs.
>
> Case 1: Say a user did not intend to use double arithmetic in device code, but they mistakenly left out the "f" suffix, as in this example: float x2 = x * 2.0;
>
> Chances are that they may never notice this mistake, and the device code, when run on a GPU, might have reduced performance. A warning here might give a clue.
>
> Case 2: The user might intentionally use double arithmetic in device code but be unaware of this option/flag. This will also result in reduced performance on the GPU. Of course they can debug and eventually pass the flag, but I am just wondering whether we must care about these use cases.
To add to these great questions and lines of thinking: to me, it boils down to how often the diagnostic will fire in a situation where the user goes "oops, this was a mistake". If the answer is "almost always", it should be an on-by-default frontend warning; if the answer is "rarely" or "uncommonly", it should not be in the frontend at all.
My intuition is that people writing code for GPUs will typically expect that the larger the datatype, the worse the performance on the GPU. Thus, I'd expect most folks to find such a warning to be rather low-value. However, I don't know if my intuition matches reality.
That said, I think that literals are an interesting case because it is easy to forget one (as @srividya-sundaram pointed out). It might be reasonable to have an on-by-default diagnostic for double literals and maybe a related off-by-default diagnostic for long double literals and direct use of the types. But I'd still wonder how much existing code would trigger the warning and what the false positive rate is for it.
> It might be reasonable to have an on-by-default diagnostic for double literals and maybe a related off-by-default diagnostic for long double literals and direct use of the types. But I'd still wonder how much existing code would trigger the warning and what the false positive rate is for it.
This seems like a very reasonable suggestion.
ping @intel/dpcpp-cfe-reviewers
> > It might be reasonable to have an on-by-default diagnostic for double literals and maybe a related off-by-default diagnostic for long double literals and direct use of the types. But I'd still wonder how much existing code would trigger the warning and what the false positive rate is for it.
>
> This seems like a very reasonable suggestion.
Have you followed up to see if this suggestion works or not? (I'd slightly prefer the changes in one patch rather than a series of patches because they're all the same diagnostic, just in different circumstances.)
> Have you followed up to see if this suggestion works or not? (I'd slightly prefer the changes in one patch rather than a series of patches because they're all the same diagnostic, just in different circumstances.)
Agreed with @AaronBallman. I would prefer to try this suggestion on current PR.
@AaronBallman
> Have you followed up to see if this suggestion works or not? (I'd slightly prefer the changes in one patch rather than a series of patches because they're all the same diagnostic, just in different circumstances.)
There seems to be a diagnostic which checks whether the long double type is allowed for the requested target. (Ref: PR, long-double.cpp)
long double is not supported on the spir64 target, and the SPIR-V output will not have operations with the double LLVM type.
I believe a warning is warranted only if the emitted target device code has operations with the double LLVM type. But it is not clear to me which targets support the long double type. Even for targets that do support long double, I am not sure whether the generated output will contain the double LLVM type (which seemed to cause the extra performance overhead).
So emitting a warning for every occurrence of the long double type in device code, whether the target supports it or not, seems questionable to me. Please let me know your thoughts.
> maybe a related off-by-default diagnostic for long double literals and direct use of the types.

Could you give an example of a 'direct use of the types' test case?
> @AaronBallman
>
> > Have you followed up to see if this suggestion works or not? (I'd slightly prefer the changes in one patch rather than a series of patches because they're all the same diagnostic, just in different circumstances.)
>
> There seems to be a diagnostic which checks whether the long double type is allowed for the requested target. (Ref: PR, long-double.cpp)
>
> long double is not supported on the spir64 target, and the SPIR-V output will not have operations with the double LLVM type.
Ah, good! Though, our test is only testing for spelling the type long double explicitly and doesn't have any coverage for long double literals like 1.0L (might be worth adding some test coverage for that to be sure).
> I believe a warning is warranted only if the emitted target device code has operations with the double LLVM type. But it is not clear to me which targets support the long double type. Even for targets that do support long double, I am not sure whether the generated output will contain the double LLVM type (which seemed to cause the extra performance overhead).
The targets have to tell you that information. e.g. getTargetInfo().hasLongDoubleType() tells you if the target has the long double type. I would expect double behaves slightly differently in that the target has a double type, but the TargetInfo object for it should be saying that the width and alignment of that type are the same as for float, and we're just adding some diagnostics explaining that to users rather than silently doing it.
> So emitting a warning for every occurrence of the long double type in device code, whether the target supports it or not, seems questionable to me. Please let me know your thoughts.
I suspect querying the target should be sufficient for long double.
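For illustration only, a rough sketch of how such a target-gated check might look, using the TargetInfo query mentioned above (this is not the PR's actual code, and the diagnostic ID is a hypothetical placeholder):

```cpp
#include "clang/Sema/Sema.h"
using namespace clang;

// Hypothetical helper: warn about a long double use only when the
// target does not actually support the long double type.
static void checkLongDoubleUse(Sema &S, SourceLocation Loc, QualType Ty,
                               unsigned DiagID /* hypothetical diagnostic */) {
  if (Ty->isSpecificBuiltinType(BuiltinType::LongDouble) &&
      !S.Context.getTargetInfo().hasLongDoubleType())
    S.Diag(Loc, DiagID) << Ty;
}
```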
> > maybe a related off-by-default diagnostic for long double literals and direct use of the types.
>
> Could you give an example of a 'direct use of the types' test case?
Sure!
```cpp
double d1;       // off-by-default warning, the type is explicitly spelled out by the user
auto d2 = 1.0;   // on-by-default warning, the literal has type double
auto ld1 = 1.0L; // turns out we want this to be an on-by-default error if the target doesn't support long double
long double ld2; // same here as with ld1
```
So "direct use of type" means "the user wrote 'double' themselves" instead of the type information being slightly more hidden (as with literals).
> > I believe a warning is warranted only if the emitted target device code has operations with the double LLVM type. But it is not clear to me which targets support the long double type. Even for targets that do support long double, I am not sure whether the generated output will contain the double LLVM type (which seemed to cause the extra performance overhead).
>
> The targets have to tell you that information, e.g., getTargetInfo().hasLongDoubleType() tells you if the target has the long double type. I would expect double behaves slightly differently in that the target has a double type, but the TargetInfo object for it should be saying that the width and alignment of that type are the same as for float, and we're just adding some diagnostics explaining that to users rather than silently doing it.
Okay, I was confused again and am wrong about TargetInfo needing to change; it's fine as-is for the double behavior. Sorry for that!
I think I figured out what bothers me about this approach and why I keep getting confused. SYCL isn't doing anything odd here, there's nothing SYCL-specific about it at all. The issue isn't that users need to know when they've used a double type -- those are valid to use on the device. The issue is that the user needs to know when they're getting conversions from float to double that they didn't expect to get.
Clang already has -Wconversion as an umbrella for conversion warnings, and it has -Wfloat-conversion as a subgroup, but they do not diagnose everything we want because they are silent when the conversion doesn't lose precision. I think we should add -Wfloat-width-conversion, which diagnoses any conversion from long double to double, double to float, or float to 16-bit float types, that is, any case where the width of the destination type is less than the width of the source type (because sometimes long double is the same width as double, etc.). I think this is a more general diagnostic that would be plausible to upstream to the community, and it would solve the concerns in the original report about telling the user about unexpected conversions that will impact performance.
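As a sketch of what that would cover (function and variable names here are mine, and _Float16 stands in for a 16-bit float type on targets that support one):

```cpp
void sinks(float f, double d, long double ld) {
  double d1 = ld; // long double -> double: warns where long double is wider
  float f1 = d;   // double -> float: warns
  _Float16 h = f; // float -> 16-bit float: warns, where _Float16 is available
  double d2 = f;  // float -> double: destination is wider, so no warning
}
```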
> The issue is that the user needs to know when they're getting conversions from float to double that they didn't expect to get.

Sorry, I am a bit confused. Isn't it the other way around? That is, the user needs to know when they're getting conversions from double to float that they didn't expect to get.
From the example quoted in the issue:
```cpp
float x2 = x * 2.0; // Forgot F suffix
```

SPIR-V output:

```
%mul_i_i = OpFMul %double %conv2_i_i %double_2
%conv3_i_i = OpFConvert %float %mul_i_i
%conv5_i_i = OpFConvert %double %conv3_i_i
```
In the generated SPIR-V output, there is a conversion from double (i.e., 2.0) to float, while the user intended everything to be just float. This unnecessary conversion from double to float is causing the performance reduction, and we should diagnose such instances.
> I think we should add -Wfloat-width-conversion, which diagnoses any conversion from long double to double, double to float, or float to 16-bit float types, that is, any case where the width of the destination type is less than the width of the source type (because sometimes long double is the same width as double, etc.)

This matches my understanding quoted above, but I want to make sure that is what you implied.
> > The issue is that the user needs to know when they're getting conversions from float to double that they didn't expect to get.
>
> Sorry, I am a bit confused. Isn't it the other way around? That is, the user needs to know when they're getting conversions from double to float that they didn't expect to get.
Yeah, I was pretty brain dead when I wrote that comment, I was backwards. Sorry for that!
> From the example quoted in the issue:
>
> ```cpp
> float x2 = x * 2.0; // Forgot F suffix
> ```
>
> SPIR-V output:
>
> ```
> %mul_i_i = OpFMul %double %conv2_i_i %double_2
> %conv3_i_i = OpFConvert %float %mul_i_i
> %conv5_i_i = OpFConvert %double %conv3_i_i
> ```
>
> In the generated SPIR-V output, there is a conversion from double (i.e., 2.0) to float, while the user intended everything to be just float. This unnecessary conversion from double to float is causing the performance reduction, and we should diagnose such instances.
Agreed!
> > I think we should add -Wfloat-width-conversion, which diagnoses any conversion from long double to double, double to float, or float to 16-bit float types, that is, any case where the width of the destination type is less than the width of the source type (because sometimes long double is the same width as double, etc.)
>
> This matches my understanding quoted above, but I want to make sure that is what you implied.
Yes, this is what I was aiming for. The problem with the existing float conversion warning is that it only triggers where there's potential for a loss of precision, which is slightly different from what we're after here. We don't care about the loss of precision; we care about the fact that there are implicit conversions required and the user may not realize those conversions are happening. (One thing you should think about for this warning is that the presence of an explicit cast should probably silence the warning, on the assumption the user knows what they're doing.)
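A quick sketch of that last point, assuming the cast-silencing behavior is adopted:

```cpp
float g(double d) {
  float a = d;                     // would warn: implicit double -> float
  float b = static_cast<float>(d); // silent: the user asked for the conversion
  return a + b;
}
```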
ping @intel/dpcpp-cfe-reviewers
@intel/dpcpp-cfe-reviewers I am in the process of upstreaming this change. Can this PR be merged if approved, or does it have to wait until it's upstreamed? Also, the [Linux / HIP AMDGPU LLVM Test Suite] failure is unrelated to the changes in this PR.
> @intel/dpcpp-cfe-reviewers I am in the process of upstreaming this change. Can this PR be merged if approved, or does it have to wait until it's upstreamed? Also, the [Linux / HIP AMDGPU LLVM Test Suite] failure is unrelated to the changes in this PR.
Personally, I am fine with this being merged into sycl while we are in the process of upstreaming it. This has been fairly well reviewed, and there is immediate benefit to SYCL users in having this diagnostic. But if the other reviewers feel that it should be upstreamed first, I am okay with that too. Are you (@srividya-sundaram) okay with needing to redo some of this if the community suggests changes different from what you have implemented here?
> @intel/dpcpp-cfe-reviewers I am in the process of upstreaming this change. Can this PR be merged if approved, or does it have to wait until it's upstreamed? Also, the [Linux / HIP AMDGPU LLVM Test Suite] failure is unrelated to the changes in this PR.
I'm okay with moving forward with this patch without waiting on the upstreaming efforts. I'd actually like to see how our users react to this diagnostic before pushing it upstream, given how much diagnostic coverage already exists. Since upstream is quite reluctant to add new off-by-default warnings, I'd like to see evidence that this gets enabled and provides enough value before trying to add the somewhat complicated logic for the diagnostic upstream.
> Personally, I am fine with this being merged into sycl while we are in the process of upstreaming it. This has been fairly well reviewed, and there is immediate benefit to SYCL users in having this diagnostic.
Agreed. I am OK with it.
> @intel/dpcpp-cfe-reviewers I am in the process of upstreaming this change. Can this PR be merged if approved, or does it have to wait until it's upstreamed? Also, the [Linux / HIP AMDGPU LLVM Test Suite] failure is unrelated to the changes in this PR.
I'm ok with it. You may need to modify this patch in the future, based on community review comments though.
> I'm okay with moving forward with this patch without waiting on the upstreaming efforts. I'd actually like to see how our users react to this diagnostic before pushing it upstream, given how much diagnostic coverage already exists. Since upstream is quite reluctant to add new off-by-default warnings, I'd like to see evidence that this gets enabled and provides enough value before trying to add the somewhat complicated logic for the diagnostic upstream.
@AaronBallman please correct me if I am mistaken, but I'm reading this as: don't upstream without evidence of usage. Should @srividya-sundaram not attempt to upstream it now, then? Also, what would the evidence be? I'm not sure how easy or hard it is to get usage metrics from our users.