llvm-project
llvm-project copied to clipboard
inconsistent argument attributes in calls to similar library functions
This issue was prompted by a discussion in D129224 of applying attributes to arguments of calls to known library functions. It shows that the middle end doesn't apply such attributes entirely consistently which, at a minimum, can be confusing. Separately, it also shows that the middle end doesn't optimize all three functions in the test case equivalently. Compiling the following test case with opt -O2
declare i64 @atol(i8*)
declare i64 @strlen(i8*)
declare i64 @strtol(i8*, i8**, i32)
define i64 @f1(i8* %p, i8* %pd) {
%t0 = load i8, i8* %p
%n = call i64 @atol(i8* %p)
%t1 = load i8, i8* %p
%d = sub i8 %t1, %t0
store i8 %d, i8* %pd
ret i64 %n
}
define i64 @f2(i8* %p, i8* %pd) {
%t0 = load i8, i8* %p
%n = call i64 @strlen(i8* %p)
%t1 = load i8, i8* %p
%d = sub i8 %t1, %t0
store i8 %d, i8* %pd
ret i64 %n
}
define i64 @f3(i8* %p, i8* %pd) {
%t0 = load i8, i8* %p
%n = call i64 @strtol(i8* %p, i8** null, i32 0)
%t1 = load i8, i8* %p
%d = sub i8 %t1, %t0
store i8 %d, i8* %pd
ret i64 %n
}
produces the output below. In it the attributes applied to the first argument at each call site are different even though all three functions have the same access semantics (they all read a string pointed by the first argument, up to the terminating nul), as are the attributes applied to the first argument in the declaration of strtol
and those in atol
and strlen
.
; ModuleID = 'a.ll'
source_filename = "a.ll"
; Function Attrs: mustprogress nofree nounwind readonly willreturn
declare i64 @atol(i8* nocapture) local_unnamed_addr #0
; Function Attrs: argmemonly mustprogress nofree nounwind readonly willreturn
declare i64 @strlen(i8* nocapture) local_unnamed_addr #1
; Function Attrs: mustprogress nofree nounwind willreturn
declare i64 @strtol(i8* readonly, i8** nocapture, i32) local_unnamed_addr #2
; Function Attrs: mustprogress nofree nounwind willreturn
define i64 @f1(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #2 {
%n = tail call i64 @atol(i8* nocapture %p)
store i8 0, i8* %pd, align 1
ret i64 %n
}
; Function Attrs: argmemonly mustprogress nofree nounwind willreturn
define i64 @f2(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #3 {
%n = tail call i64 @strlen(i8* noundef nonnull dereferenceable(1) %p)
store i8 0, i8* %pd, align 1
ret i64 %n
}
; Function Attrs: mustprogress nofree nounwind willreturn
define i64 @f3(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #2 {
%t0 = load i8, i8* %p, align 1
%n = tail call i64 @strtol(i8* nocapture nonnull %p, i8** null, i32 0)
%t1 = load i8, i8* %p, align 1
%d = sub i8 %t1, %t0
store i8 %d, i8* %pd, align 1
ret i64 %n
}
attributes #0 = { mustprogress nofree nounwind readonly willreturn }
attributes #1 = { argmemonly mustprogress nofree nounwind readonly willreturn }
attributes #2 = { mustprogress nofree nounwind willreturn }
attributes #3 = { argmemonly mustprogress nofree nounwind willreturn }
I would expect the first argument in each of the three calls to be annotated with the same attributes: nocapture nonnull noundef dereferenceable(1)
. In the strlen
case nocapture
might be implied by argmemonly
but that's both subtle and (as far as I can see) not documented in the manual.
I would also expect all three functions to be optimized equivalently, and in particular, f1
to the same IL as f3
(even though f3
might set errno
and f1
need not).
I would expect the first argument in each of the three calls to be annotated with the same attributes: nocapture nonnull noundef dereferenceable(1).
The nonnull/noundef/dereferenceable annotations are sometimes missing due to missing annotateNonNullNoUndefBasedOnAccess() calls, e.g. on atol. For many of these (where an unconditional access occurs) we could move the noundef/dereferenceable inferral into inferNonMandatoryLibFuncAttrs(). However, nonnull depends on the null_pointer_is_valid attribute in the caller. I assume that's the reason why we currently handle even the "unconditional" cases in SLC, because they are actually conditional on the attributes of the caller.
In the strlen case nocapture might be implied by argmemonly but that's both subtle and (as far as I can see) not documented in the manual.
nocapture is not implied by argmemonly. The strlen in your IR has an explicit nocapture attribute here:
declare i64 @strlen(i8* nocapture) local_unnamed_addr #1
I would also expect all three functions to be optimized equivalently, and in particular, f1 to the same IL as f3 (even though f3 might set errno and f1 need not).
I don't think that would be legal (from LLVM's perspective) if the strtol
argument just happens to point to errno
. There's probably something in the C standard that precludes that?
Okay, so to start let me look at adding the missing annotateNonNullNoUndefBasedOnAccess
calls and take it from there.
I understand that some of the attributes (like nocpature
on the strlen
call) are taken from the declaration of the function (which is modified by the middle end) but that's easily missed, especially in tests where the declarations often have no attributes. I must have even overlooked it in the snippet above. There are just too many attributes all over the place. It would be nice to make this more obvious somehow. One way would be to mention all the applicable attributes in each call. With lots of attributes I suppose that could make the output cluttered enough to defeat the purpose. I guess one way to find out is to try it and see (unless it's already been done and rule out).
The pointer arguments to strtol
are both restrict
-qualified which implies that the function can access the object they point to only through them (or others derived from them) but not by other means.