[LifetimeSafety] Add missing origins stats for lifetime analysis
This PR adds the implementation for printing missing origin stats for lifetime analysis.
Purpose:
This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: StmtClassName<Type> : count
Approach:
- The signature of the runLifetimeAnalysis function is changed to return the LifetimeAnalysis object which will be used to get the origin manager which can be used for finding the count of missing origins.
- The count of missing origins is kept in origin manager while the CFG is visited as part of the analysis.
Example output:
For the file llvm-project/llvm/lib/Demangle/Demangle.cpp:
*** LifetimeSafety Missing Origin Stats (expression_type : count) :
DeclRefExpr<std::string_view> : 52
StringLiteral<const char[3]> : 2
DeclRefExpr<char *> : 15
CallExpr<char *> : 4
StringLiteral<const char[2]> : 1
Total missing origins: 74
****************************************
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this page.
If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.
If you have further questions, they may be answered by the LLVM GitHub User Guide.
You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-temporal-safety
Author: None (DEBADRIBASAK)
Changes
This PR adds the implementation for printing missing origin stats for lifetime analysis.
Purpose:
This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: StmtClassName<Type> : count
Approach:
- The signature of the runLifetimeAnalysis function is changed to return the LifetimeAnalysis object which will be used to get the origin manager which can be used for finding the count of missing origins.
- The count of missing origins is kept in origin manager while the CFG is visited as part of the analysis.
Example output:
For the file llvm-project/llvm/lib/Demangle/Demangle.cpp:
*** LifetimeSafety Missing Origin Stats (expression_type : count) :
DeclRefExpr<std::string_view> : 52
StringLiteral<const char[3]> : 2
DeclRefExpr<char *> : 15
CallExpr<char *> : 4
StringLiteral<const char[2]> : 1
Total missing origins: 74
****************************************
Full diff: https://github.com/llvm/llvm-project/pull/166568.diff
6 Files Affected:
- (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h (+11-4)
- (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h (+6)
- (modified) clang/include/clang/Sema/AnalysisBasedWarnings.h (+9)
- (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+9-4)
- (modified) clang/lib/Analysis/LifetimeSafety/Origins.cpp (+18)
- (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+28-2)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index 91ffbb169f947..eb532bc8be3a7 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -23,7 +23,11 @@
#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LiveOrigins.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LoanPropagation.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
+#include <string>
namespace clang::lifetimes {
@@ -44,10 +48,6 @@ class LifetimeSafetyReporter {
Confidence Confidence) {}
};
-/// The main entry point for the analysis.
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter);
-
namespace internal {
/// An object to hold the factories for immutable collections, ensuring
/// that all created states share the same underlying memory management.
@@ -60,6 +60,7 @@ struct LifetimeFactory {
/// Running the lifetime safety analysis and querying its results. It
/// encapsulates the various dataflow analyses.
class LifetimeSafetyAnalysis {
+
public:
LifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter);
@@ -82,6 +83,12 @@ class LifetimeSafetyAnalysis {
std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
};
} // namespace internal
+
+/// The main entry point for the analysis.
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter);
+
} // namespace clang::lifetimes
#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIMESAFETY_H
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index ba138b078b379..26686a63e9204 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,7 +16,10 @@
#include "clang/AST/Decl.h"
#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
#include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
namespace clang::lifetimes::internal {
@@ -76,6 +79,8 @@ class OriginManager {
void dump(OriginID OID, llvm::raw_ostream &OS) const;
+ const llvm::StringMap<unsigned> getMissingOrigins() const;
+
private:
OriginID getNextOriginID() { return NextOriginID++; }
@@ -85,6 +90,7 @@ class OriginManager {
llvm::SmallVector<Origin> AllOrigins;
llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+ llvm::StringMap<unsigned> ExprTypeToMissingOriginCount;
};
} // namespace clang::lifetimes::internal
diff --git a/clang/include/clang/Sema/AnalysisBasedWarnings.h b/clang/include/clang/Sema/AnalysisBasedWarnings.h
index 4103c3f006a8f..604039ef61cb7 100644
--- a/clang/include/clang/Sema/AnalysisBasedWarnings.h
+++ b/clang/include/clang/Sema/AnalysisBasedWarnings.h
@@ -14,7 +14,10 @@
#define LLVM_CLANG_SEMA_ANALYSISBASEDWARNINGS_H
#include "clang/AST/Decl.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringMap.h"
#include <memory>
namespace clang {
@@ -95,6 +98,9 @@ class AnalysisBasedWarnings {
/// a single function.
unsigned MaxUninitAnalysisBlockVisitsPerFunction;
+ /// Map from expressions missing origin in OriginManager to their counts.
+ llvm::StringMap<unsigned> MissingOriginCount;
+
/// @}
public:
@@ -116,6 +122,9 @@ class AnalysisBasedWarnings {
Policy &getPolicyOverrides() { return PolicyOverrides; }
void PrintStats() const;
+
+ void FindMissingOrigins(AnalysisDeclContext &AC,
+ clang::lifetimes::internal::FactManager &FactMgr);
};
} // namespace sema
diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
index 00c7ed90503e7..d183ce976f946 100644
--- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
@@ -23,9 +23,11 @@
#include "clang/Analysis/AnalysisDeclContext.h"
#include "clang/Analysis/CFG.h"
#include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/StringMap.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/raw_ostream.h"
#include <memory>
namespace clang::lifetimes {
@@ -69,9 +71,12 @@ void LifetimeSafetyAnalysis::run() {
}
} // namespace internal
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter) {
- internal::LifetimeSafetyAnalysis Analysis(AC, Reporter);
- Analysis.run();
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter) {
+ std::unique_ptr<internal::LifetimeSafetyAnalysis> Analysis =
+ std::make_unique<internal::LifetimeSafetyAnalysis>(AC, Reporter);
+ Analysis->run();
+ return Analysis;
}
} // namespace clang::lifetimes
diff --git a/clang/lib/Analysis/LifetimeSafety/Origins.cpp b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
index ea51a75324e06..453abf48261c2 100644
--- a/clang/lib/Analysis/LifetimeSafety/Origins.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
@@ -7,6 +7,9 @@
//===----------------------------------------------------------------------===//
#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
+#include "llvm/ADT/StringMap.h"
namespace clang::lifetimes::internal {
@@ -22,6 +25,10 @@ void OriginManager::dump(OriginID OID, llvm::raw_ostream &OS) const {
OS << ")";
}
+const llvm::StringMap<unsigned> OriginManager::getMissingOrigins() const {
+ return ExprTypeToMissingOriginCount;
+}
+
Origin &OriginManager::addOrigin(OriginID ID, const clang::ValueDecl &D) {
AllOrigins.emplace_back(ID, &D);
return AllOrigins.back();
@@ -37,6 +44,17 @@ OriginID OriginManager::get(const Expr &E) {
auto It = ExprToOriginID.find(&E);
if (It != ExprToOriginID.end())
return It->second;
+
+ // if the expression has no specific origin, increment the missing origin
+ // counter.
+ std::string ExprStr(E.getStmtClassName());
+ ExprStr = ExprStr + "<" + E.getType().getAsString() + ">";
+ auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr);
+ if (CountIt == ExprTypeToMissingOriginCount.end()) {
+ ExprTypeToMissingOriginCount[ExprStr] = 1;
+ } else {
+ CountIt->second++;
+ }
// If the expression itself has no specific origin, and it's a reference
// to a declaration, its origin is that of the declaration it refers to.
// For pointer types, where we don't pre-emptively create an origin for the
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 140b709dbb651..77d2013ff3a93 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -29,7 +29,9 @@
#include "clang/Analysis/Analyses/CFGReachabilityAnalysis.h"
#include "clang/Analysis/Analyses/CalledOnceCheck.h"
#include "clang/Analysis/Analyses/Consumed.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/Analyses/ReachableCode.h"
#include "clang/Analysis/Analyses/ThreadSafety.h"
#include "clang/Analysis/Analyses/UninitializedValues.h"
@@ -52,7 +54,9 @@
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <deque>
#include <iterator>
@@ -3065,7 +3069,11 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
if (EnableLifetimeSafetyAnalysis && S.getLangOpts().CPlusPlus) {
if (AC.getCFG()) {
lifetimes::LifetimeSafetyReporterImpl LifetimeSafetyReporter(S);
- lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ std::unique_ptr<clang::lifetimes::internal::LifetimeSafetyAnalysis>
+ Analysis =
+ lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ if (S.CollectStats)
+ FindMissingOrigins(AC, Analysis->getFactManager());
}
}
// Check for violations of "called once" parameter properties.
@@ -3131,9 +3139,27 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
}
}
+void clang::sema::AnalysisBasedWarnings::FindMissingOrigins(
+ AnalysisDeclContext &AC, lifetimes::internal::FactManager &FactMgr) {
+ if (AC.getCFG()) {
+ for (const auto &[expr, count] :
+ FactMgr.getOriginMgr().getMissingOrigins()) {
+ MissingOriginCount[expr] += count;
+ }
+ }
+}
+
void clang::sema::AnalysisBasedWarnings::PrintStats() const {
+ llvm::errs() << "\n*** LifetimeSafety Missing Origin Stats "
+ "(expression_type : count) :\n\n";
+ unsigned totalMissingOrigins = 0;
+ for (const auto &[expr, count] : MissingOriginCount) {
+ llvm::errs() << expr << " : " << count << '\n';
+ totalMissingOrigins += count;
+ }
+ llvm::errs() << "Total missing origins: " << totalMissingOrigins << "\n";
+ llvm::errs() << "\n****************************************\n";
llvm::errs() << "\n*** Analysis Based Warnings Stats:\n";
-
unsigned NumCFGsBuilt = NumFunctionsAnalyzed - NumFunctionsWithBadCFGs;
unsigned AvgCFGBlocksPerFunction =
!NumCFGsBuilt ? 0 : NumCFGBlocks/NumCFGsBuilt;
So print one statistics that maps from statement class (ids, no need for strings there) to counts and print another one that maps from types (again no need for string maps) to counts.
Xazax-hun: We had added string maps because it is easier to interpret the types/class names as strings. Will not changing it to ids make it difficult to interpret the report?
Will not changing it to ids make it difficult to interpret the report?
Absolutely. We would still print strings, but those strings can be reconstructed from the types or the statement classes when we do the printing. No need to do it eagerly.
On the other hand, if we expect these to be more independent, i.e., some types are not covered and some kinds of ast nodes are not covered and there is no correlation between them, I think it might be better to collect across these two independent axises.
I agree that these are, in principle, two different parts of the problem. I am supportive of having 2 separate metrics one indexed by StmtClass and other by QualType.
@Xazax-hun @usx95 I do not think it is feasible to add the StmtClassName and QualType as keys without converting them to strings.
- I tried using a DenseMap with
QualTypeas the keys, but that is not allowed since QualType does not have tombstone. - I was planning to use the
StmtClassas an index in an integer array of sizeStmt::lastStmtConstant+1, and useStmtClassInfoarray ofStmtClassNameTableobjects for getting their names while printing. ButStmtClassInfois a static array insideStmt.cppfile and we cannot use it in other files.
I used two string maps for now:
*** LifetimeSafety Missing Origin per QualType: (QualType : count) :
basic_string<char> : 3
basic_string_view<char> : 1
class std::basic_string<char> : 1
char * : 12
const_iterator : 1
const value_type : 3
std::string_view : 27
*** LifetimeSafety Missing Origin per StmtClassName: (StmtClassName : count) :
CXXOperatorCallExpr : 6
BinaryOperator : 3
CXXMemberCallExpr : 2
UnaryOperator : 1
DeclRefExpr : 36
Total missing origins: 48
I tried using a DenseMap with QualType as the keys, but that is not allowed since QualType does not have tombstone.
I am not sure qualifiers are actually important in the stats. We could just use the const Type* as the key.
I was planning to use the StmtClass as an index in an integer array of size Stmt::lastStmtConstant+1, and use StmtClassInfo array of StmtClassNameTable objects for getting their names while printing. But StmtClassInfo is a static array inside Stmt.cpp file and we cannot use it in other files.
I think it would make sense to have a static function on Stmt that converts and StmtClass to its name. But this change is out of scope of this PR and could be done as a follow up later on. So I am fine not doing it here.
I am not sure qualifiers are actually important in the stats. We could just use the const Type* as the key.
Xazax-hun: Even if we do that, it will store a dangling pointer in the missing origin stats map.
Even if we do that, it will store a dangling pointer in the missing origin stats map.
Are you sure? I expect all the types to still be available when we print these stats. These are stored in the ASTContext and that one is destroyed very late.
I expect all the types to still be available when we print these stats. These are stored in the ASTContext and that one is destroyed very late.
If we use Type* as the key, constructing the string is not feasible as it is done by TypePrinter object which is part of anonymous namespace in TypePrinter file. Also, if we want to just store the pointer to the QualType returned by getType function of expression pointer it will store the dangling reference to a temporary object.
You could create QualType from this Type* and then print it Type* T; ..; QualType(T, 0).getAsString();
Overall looks good to me but there are some conflicts, please resolve those.
Resolved conflicts.
After the multi-origin PR (https://github.com/llvm/llvm-project/pull/168344) the origins are tracked as lists and for tracking missing origin, only the ExprToList map is checked.
The new output for Demangle.cpp is:
*** LifetimeSafety Missing Origin per QualType: (QualType : count) :
value_type : 1
char * : 3
*** LifetimeSafety Missing Origin per StmtClassName: (StmtClassName : count) :
BinaryOperator : 3
UnaryOperator : 1
Total missing origins: 4
LGTM. Thanks.
Please update the PR description with the new output
Updated the description.
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
@DEBADRIBASAK Congratulations on having your first Pull Request (PR) merged into the LLVM Project!
Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.
Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.
How to do this, and the rest of the post-merge process, is covered in detail here.
If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.
If you don't get any reports, no action is required from you. Your changes are working as expected, well done!