llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

inconsistent argument attributes in calls to similar library functions

Open msebor opened this issue 2 years ago • 2 comments

This issue was prompted by a discussion in D129224 of applying attributes to arguments of calls to known library functions. It shows that the middle end doesn't apply such attributes entirely consistently which, at a minimum, can be confusing. Separately, it also shows that the middle end doesn't optimize all three functions in the test case equivalently. Compiling the following test case with opt -O2

declare i64 @atol(i8*)
declare i64 @strlen(i8*)
declare i64 @strtol(i8*, i8**, i32)

define i64 @f1(i8* %p, i8* %pd) {
  %t0 = load i8, i8* %p
  %n = call i64 @atol(i8* %p)
  %t1 = load i8, i8* %p
  %d = sub i8 %t1, %t0
  store i8 %d, i8* %pd
  ret i64 %n
}

define i64 @f2(i8* %p, i8* %pd) {
  %t0 = load i8, i8* %p
  %n = call i64 @strlen(i8* %p)
  %t1 = load i8, i8* %p
  %d = sub i8 %t1, %t0
  store i8 %d, i8* %pd
  ret i64 %n
}

define i64 @f3(i8* %p, i8* %pd) {
  %t0 = load i8, i8* %p
  %n = call i64 @strtol(i8* %p, i8** null, i32 0)
  %t1 = load i8, i8* %p
  %d = sub i8 %t1, %t0
  store i8 %d, i8* %pd
  ret i64 %n
}

produces the output below. In it the attributes applied to the first argument at each call site are different even though all three functions have the same access semantics (they all read a string pointed by the first argument, up to the terminating nul), as are the attributes applied to the first argument in the declaration of strtol and those in atol and strlen.

; ModuleID = 'a.ll'
source_filename = "a.ll"

; Function Attrs: mustprogress nofree nounwind readonly willreturn
declare i64 @atol(i8* nocapture) local_unnamed_addr #0

; Function Attrs: argmemonly mustprogress nofree nounwind readonly willreturn
declare i64 @strlen(i8* nocapture) local_unnamed_addr #1

; Function Attrs: mustprogress nofree nounwind willreturn
declare i64 @strtol(i8* readonly, i8** nocapture, i32) local_unnamed_addr #2

; Function Attrs: mustprogress nofree nounwind willreturn
define i64 @f1(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #2 {
  %n = tail call i64 @atol(i8* nocapture %p)
  store i8 0, i8* %pd, align 1
  ret i64 %n
}

; Function Attrs: argmemonly mustprogress nofree nounwind willreturn
define i64 @f2(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #3 {
  %n = tail call i64 @strlen(i8* noundef nonnull dereferenceable(1) %p)
  store i8 0, i8* %pd, align 1
  ret i64 %n
}

; Function Attrs: mustprogress nofree nounwind willreturn
define i64 @f3(i8* nocapture readonly %p, i8* nocapture writeonly %pd) local_unnamed_addr #2 {
  %t0 = load i8, i8* %p, align 1
  %n = tail call i64 @strtol(i8* nocapture nonnull %p, i8** null, i32 0)
  %t1 = load i8, i8* %p, align 1
  %d = sub i8 %t1, %t0
  store i8 %d, i8* %pd, align 1
  ret i64 %n
}

attributes #0 = { mustprogress nofree nounwind readonly willreturn }
attributes #1 = { argmemonly mustprogress nofree nounwind readonly willreturn }
attributes #2 = { mustprogress nofree nounwind willreturn }
attributes #3 = { argmemonly mustprogress nofree nounwind willreturn }

I would expect the first argument in each of the three calls to be annotated with the same attributes: nocapture nonnull noundef dereferenceable(1). In the strlen case nocapture might be implied by argmemonly but that's both subtle and (as far as I can see) not documented in the manual.

I would also expect all three functions to be optimized equivalently, and in particular, f1 to the same IL as f3 (even though f3 might set errno and f1 need not).

msebor avatar Jul 26 '22 22:07 msebor

I would expect the first argument in each of the three calls to be annotated with the same attributes: nocapture nonnull noundef dereferenceable(1).

The nonnull/noundef/dereferenceable annotations are sometimes missing due to missing annotateNonNullNoUndefBasedOnAccess() calls, e.g. on atol. For many of these (where an unconditional access occurs) we could move the noundef/dereferenceable inferral into inferNonMandatoryLibFuncAttrs(). However, nonnull depends on the null_pointer_is_valid attribute in the caller. I assume that's the reason why we currently handle even the "unconditional" cases in SLC, because they are actually conditional on the attributes of the caller.

In the strlen case nocapture might be implied by argmemonly but that's both subtle and (as far as I can see) not documented in the manual.

nocapture is not implied by argmemonly. The strlen in your IR has an explicit nocapture attribute here:

declare i64 @strlen(i8* nocapture) local_unnamed_addr #1

I would also expect all three functions to be optimized equivalently, and in particular, f1 to the same IL as f3 (even though f3 might set errno and f1 need not).

I don't think that would be legal (from LLVM's perspective) if the strtol argument just happens to point to errno. There's probably something in the C standard that precludes that?

nikic avatar Jul 27 '22 08:07 nikic

Okay, so to start let me look at adding the missing annotateNonNullNoUndefBasedOnAccess calls and take it from there.

I understand that some of the attributes (like nocpature on the strlen call) are taken from the declaration of the function (which is modified by the middle end) but that's easily missed, especially in tests where the declarations often have no attributes. I must have even overlooked it in the snippet above. There are just too many attributes all over the place. It would be nice to make this more obvious somehow. One way would be to mention all the applicable attributes in each call. With lots of attributes I suppose that could make the output cluttered enough to defeat the purpose. I guess one way to find out is to try it and see (unless it's already been done and rule out).

The pointer arguments to strtol are both restrict-qualified which implies that the function can access the object they point to only through them (or others derived from them) but not by other means.

msebor avatar Jul 27 '22 15:07 msebor