llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

WIP: Extend SourceLocation to 64 bits.

Open hokein opened this issue 6 months ago • 2 comments

See discussion: https://discourse.llvm.org/t/revisiting-64-bit-source-locations/86556

This patch extends the SourceLocation structure from 4 bytes to 8 bytes. For now, only the lower 40 bits are used, leaving 24 bits reserved for future use. In theory, we could use up to 48 bits limited by the current encoding scheme, where a SourceLocation is represented as a 64-bit pair <ModuleFileIndex, SourceLocationOffset>, with ModuleFileIndex taking 16 bits. Starting with 40 bits should be sufficient, and the limit can be easily adjusted later if needed.

AST-related changes

  • moved SourceLocation out of some AST nodes’ StmtBitFields, as the bitfields no longer have enough space.
  • moved the source range of CXXOperatorNameExpr to the heap to avoid increasing the size of DeclarationNameLoc. This minimizes the growth of frequently used nodes like DeclRefExpr, which now increases from 32 to 40 bytes.
  • moved NumArgs into CallExpr’s bitfields (20 bits, which should be sufficient).

Performance

  • with binary search optimization for getFileIDLocal , compile-time overhead is offset https://llvm-compile-time-tracker.com/compare.php?from=0b6ddb02efdcbdac9426e8d857499ea0580303cd&to=7f125a0c5f0fbcfbb920d223628dc1c48607d400&stat=instructions%3Au
  • peak memory usage (max RSS) increased by up to 3.3%: https://llvm-compile-time-tracker.com/compare.php?from=0b6ddb02efdcbdac9426e8d857499ea0580303cd&to=7f125a0c5f0fbcfbb920d223628dc1c48607d400&stat=instructions%3Au
  • the template instantiation depth increase should be offset by optimizations in #142840, #142983

libclang

To avoid ABI breakage in libclang, we continue using 32-bit source locations. These are converted from the 64-bit locations with bounds checking. If the value doesn’t fit, an invalid source location is returned.

TODO

  • update documentation and inline comments related to SourceLocation.
  • split out the binary search optimization in getFileIDLocal into a separate patch.
  • report detailed size impact of AST node changes.
  • revisit the getExprLoc optimization for CallExpr (#141058). Currently, allocation size is increased from 32 to 40 bytes without further tuning.

hokein avatar Jun 30 '25 07:06 hokein

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp clang-tools-extra/clangd/unittests/SourceCodeTests.cpp clang-tools-extra/include-cleaner/lib/HTMLReport.cpp clang/include/clang/AST/ASTContext.h clang/include/clang/AST/DeclBase.h clang/include/clang/AST/DeclObjC.h clang/include/clang/AST/DeclarationName.h clang/include/clang/AST/Expr.h clang/include/clang/AST/ExprCXX.h clang/include/clang/AST/ExprConcepts.h clang/include/clang/AST/ExternalASTSource.h clang/include/clang/AST/Stmt.h clang/include/clang/Basic/Diagnostic.h clang/include/clang/Basic/SourceLocation.h clang/include/clang/Basic/SourceManager.h clang/include/clang/Rewrite/Core/Rewriter.h clang/include/clang/Sema/MultiplexExternalSemaSource.h clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/SourceLocationEncoding.h clang/lib/AST/ASTContext.cpp clang/lib/AST/ASTImporter.cpp clang/lib/AST/DeclarationName.cpp clang/lib/AST/Expr.cpp clang/lib/AST/ExprCXX.cpp clang/lib/AST/ExprConcepts.cpp clang/lib/AST/ExternalASTSource.cpp clang/lib/AST/Stmt.cpp clang/lib/Basic/Diagnostic.cpp clang/lib/Basic/SourceLocation.cpp clang/lib/Basic/SourceManager.cpp clang/lib/Format/FormatTokenLexer.cpp clang/lib/Parse/ParseStmtAsm.cpp clang/lib/Rewrite/Rewriter.cpp clang/lib/Sema/MultiplexExternalSemaSource.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaLambda.cpp clang/lib/Sema/SemaOverload.cpp clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterDecl.cpp clang/lib/Serialization/ASTWriterStmt.cpp clang/tools/libclang/CIndex.cpp clang/tools/libclang/CXIndexDataConsumer.cpp clang/tools/libclang/CXSourceLocation.cpp clang/tools/libclang/CXSourceLocation.h clang/tools/libclang/Indexing.cpp clang/unittests/Lex/PPMemoryAllocationsTest.cpp clang/unittests/Serialization/SourceLocationEncodingTest.cpp
View the diff from clang-format here.
diff --git a/clang/include/clang/AST/DeclObjC.h b/clang/include/clang/AST/DeclObjC.h
index 56fc4ba41..804215cb8 100644
--- a/clang/include/clang/AST/DeclObjC.h
+++ b/clang/include/clang/AST/DeclObjC.h
@@ -1093,9 +1093,7 @@ public:
 
   SourceLocation getAtStartLoc() const { return AtStart; }
 
-  void setAtStartLoc(SourceLocation Loc) {
-    AtStart = Loc;
-  }
+  void setAtStartLoc(SourceLocation Loc) { AtStart = Loc; }
 
   // Marks the end of the container.
   SourceRange getAtEndRange() const { return AtEnd; }
diff --git a/clang/include/clang/AST/DeclarationName.h b/clang/include/clang/AST/DeclarationName.h
index db7973bcc..bbb91fc14 100644
--- a/clang/include/clang/AST/DeclarationName.h
+++ b/clang/include/clang/AST/DeclarationName.h
@@ -703,7 +703,7 @@ class DeclarationNameLoc {
 
   // The location (if any) of the operator keyword is stored elsewhere.
   struct CXXOpName {
-    CXXOperatorSourceInfo* OInfo;
+    CXXOperatorSourceInfo *OInfo;
   };
 
   // The location (if any) of the operator keyword is stored elsewhere.
@@ -740,8 +740,7 @@ public:
   SourceLocation getCXXOperatorNameBeginLoc() const {
     if (!CXXOperatorName.OInfo)
       return {};
-    return 
-        CXXOperatorName.OInfo->BeginOpNameLoc;
+    return CXXOperatorName.OInfo->BeginOpNameLoc;
   }
 
   /// Return the end location of the getCXXOperatorNameRange() range.
@@ -773,7 +772,7 @@ public:
     DNL.setNamedTypeLoc(TInfo);
     return DNL;
   }
-  
+
   /// Construct location information for a non-literal C++ operator.
   static DeclarationNameLoc
   makeCXXOperatorNameLoc(CXXOperatorSourceInfo *OInfo) {
@@ -838,7 +837,7 @@ public:
       return nullptr;
     return LocInfo.getNamedTypeInfo();
   }
-  
+
   /// setNamedTypeInfo - Sets the source type info associated to
   /// the name. Assumes it is a constructor, destructor or conversion.
   void setNamedTypeInfo(TypeSourceInfo *TInfo) {
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 6d1c2df6b..cf0e80900 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1195,7 +1195,9 @@ public:
     : Expr(OpaqueValueExprClass, Empty) {}
 
   /// Retrieve the location of this expression.
-  SourceLocation getLocation() const { return SourceLocation::getFromRawEncoding(OpaqueValueExprBits.Loc); }
+  SourceLocation getLocation() const {
+    return SourceLocation::getFromRawEncoding(OpaqueValueExprBits.Loc);
+  }
 
   SourceLocation getBeginLoc() const LLVM_READONLY {
     return SourceExpr ? SourceExpr->getBeginLoc() : getLocation();
@@ -1269,9 +1271,9 @@ class DeclRefExpr final
   friend class ASTStmtReader;
   friend class ASTStmtWriter;
   friend TrailingObjects;
-  
-      /// The location of the declaration name itself.
-      SourceLocation Loc;
+
+  /// The location of the declaration name itself.
+  SourceLocation Loc;
 
   /// The declaration that we are referencing.
   ValueDecl *D;
@@ -2006,8 +2008,8 @@ class PredefinedExpr final
       private llvm::TrailingObjects<PredefinedExpr, Stmt *> {
   friend class ASTStmtReader;
   friend TrailingObjects;
-   /// The location of this PredefinedExpr.
-   SourceLocation Loc;
+  /// The location of this PredefinedExpr.
+  SourceLocation Loc;
   // PredefinedExpr is optionally followed by a single trailing
   // "Stmt *" for the predefined identifier. It is present if and only if
   // hasFunctionName() is true and is always a "StringLiteral *".
@@ -3121,7 +3123,9 @@ public:
   /// Bluntly set a new number of arguments without doing any checks whatsoever.
   /// Only used during construction of a CallExpr in a few places in Sema.
   /// FIXME: Find a way to remove it.
-  void setNumArgsUnsafe(unsigned NewNumArgs) { CallExprBits.NumArgs = NewNumArgs; }
+  void setNumArgsUnsafe(unsigned NewNumArgs) {
+    CallExprBits.NumArgs = NewNumArgs;
+  }
 
   typedef ExprIterator arg_iterator;
   typedef ConstExprIterator const_arg_iterator;
@@ -3306,7 +3310,7 @@ class MemberExpr final
 
   /// MemberLoc - This is the location of the member name.
   SourceLocation MemberLoc;
-  
+
   SourceLocation OperatorLoc;
 
   size_t numTrailingObjects(OverloadToken<NestedNameSpecifierLoc>) const {
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index 98f40b8a8..18fc4dcef 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -3929,7 +3929,8 @@ public:
 
   /// Retrieve the location of the '->' or '.' operator.
   SourceLocation getOperatorLoc() const {
-    return SourceLocation::getFromRawEncoding(CXXDependentScopeMemberExprBits.OperatorLoc);
+    return SourceLocation::getFromRawEncoding(
+        CXXDependentScopeMemberExprBits.OperatorLoc);
   }
 
   /// Retrieve the nested-name-specifier that qualifies the member name.
@@ -4656,7 +4657,8 @@ public:
   }
 
   SourceLocation getNameLoc() const {
-    return SourceLocation::getFromRawEncoding(SubstNonTypeTemplateParmExprBits.NameLoc);
+    return SourceLocation::getFromRawEncoding(
+        SubstNonTypeTemplateParmExprBits.NameLoc);
   }
   SourceLocation getBeginLoc() const { return getNameLoc(); }
   SourceLocation getEndLoc() const { return getNameLoc(); }
diff --git a/clang/include/clang/AST/Stmt.h b/clang/include/clang/AST/Stmt.h
index 9a363f25f..1a04e2f5f 100644
--- a/clang/include/clang/AST/Stmt.h
+++ b/clang/include/clang/AST/Stmt.h
@@ -154,7 +154,7 @@ protected:
     /// floating-point features.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasFPFeatures : 1;
-  
+
     unsigned NumStmts;
   };
 
@@ -455,8 +455,6 @@ protected:
     unsigned NonOdrUseReason : 2;
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsImmediateEscalating : 1;
-
-
   };
 
 
@@ -526,8 +524,6 @@ protected:
     /// It is 0 otherwise.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasFPFeatures : 1;
-
-
   };
 
   class UnaryExprOrTypeTraitExprBitfields {
@@ -582,8 +578,8 @@ protected:
     /// Trailing objects. See the definition of CallExpr.
     LLVM_PREFERRED_TYPE(bool)
     unsigned HasTrailingSourceLoc : 1;
-    
-    unsigned NumArgs:20;
+
+    unsigned NumArgs : 20;
   };
   enum { NumCallExprBits = 52 };
 
@@ -629,7 +625,7 @@ protected:
     /// This is the location of the -> or . in the expression.
     // SourceLocation OperatorLoc;
   };
-  
+
   // 8 bytes
   class CastExprBitfields {
     friend class CastExpr;
@@ -1047,8 +1043,6 @@ protected:
     unsigned ConstructionKind : 3;
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsImmediateEscalating : 1;
-
-
   };
 
   class ExprWithCleanupsBitfields {
@@ -1097,7 +1091,7 @@ protected:
     /// the trailing objects.
     LLVM_PREFERRED_TYPE(bool)
     uint64_t HasFirstQualifierFoundInScope : 1;
-    
+
     /// The location of the '->' or '.' operator.
     LLVM_PREFERRED_TYPE(SourceLocation)
     uint64_t OperatorLoc : SourceLocation::Bits;
@@ -1175,7 +1169,7 @@ protected:
   class SubstNonTypeTemplateParmExprBitfields {
     friend class ASTStmtReader;
     friend class SubstNonTypeTemplateParmExpr;
-    
+
     LLVM_PREFERRED_TYPE(ExprBitfields)
     uint64_t : NumExprBits;
 
@@ -1313,8 +1307,8 @@ protected:
     /// bit is set to true.
     LLVM_PREFERRED_TYPE(bool)
     uint64_t IsUnique : 1;
-    
-     /// The location of the non-type template parameter reference.
+
+    /// The location of the non-type template parameter reference.
     LLVM_PREFERRED_TYPE(SourceLocation)
     uint64_t Loc : SourceLocation::Bits;
   };
@@ -1978,8 +1972,8 @@ class CaseStmt final
   //   with a range. Present if and only if caseStmtIsGNURange() is true.
   enum { LhsOffset = 0, SubStmtOffsetFromRhs = 1 };
   enum { NumMandatoryStmtPtr = 2 };
-      /// The location of the "case" or "default" keyword.
-      SourceLocation KeywordLoc;
+  /// The location of the "case" or "default" keyword.
+  SourceLocation KeywordLoc;
   unsigned numTrailingObjects(OverloadToken<Stmt *>) const {
     return NumMandatoryStmtPtr + caseStmtIsGNURange();
   }
@@ -2241,8 +2235,8 @@ class AttributedStmt final
       private llvm::TrailingObjects<AttributedStmt, const Attr *> {
   friend class ASTStmtReader;
   friend TrailingObjects;
-    /// The location of the attribute.
-    SourceLocation AttrLoc;
+  /// The location of the attribute.
+  SourceLocation AttrLoc;
   Stmt *SubStmt;
 
   AttributedStmt(SourceLocation Loc, ArrayRef<const Attr *> Attrs,
@@ -3116,7 +3110,6 @@ public:
 /// ContinueStmt - This represents a continue.
 class ContinueStmt : public Stmt {
 public:
-
   ContinueStmt(SourceLocation CL) : Stmt(ContinueStmtClass) {
     setContinueLoc(CL);
   }
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 91352a4aa..5f0a5db8b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -165,10 +165,12 @@ public:
 
   bool getRawEncoding32(uint32_t &Result) const {
     // A mask that isolates this check to the required range higher of bits.
-    static constexpr uint64_t RangeMask = llvm::maskTrailingOnes<uint64_t>(Bits - 32) << 31;
+    static constexpr uint64_t RangeMask =
+        llvm::maskTrailingOnes<uint64_t>(Bits - 32) << 31;
 
     // Check if the ID can be safely compressed to a 32-bit integer.
-    // The truncation is only possible if all higher bits of the ID are all identical:
+    // The truncation is only possible if all higher bits of the ID are all
+    // identical:
     //   all 0s for the local offset, or all 1s for loaded offset
     if ((ID ^ (ID << 1)) & RangeMask)
       return false; // won't fit
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 930e48d6c..1f9727411 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -721,7 +721,7 @@ class SourceManager : public RefCountedBase<SourceManager> {
   SmallVector<SrcMgr::SLocEntry, 0> LocalSLocEntryTable;
 
   SmallVector<SourceLocation::UIntTy, 0> LocalLocOffsetTable;
-  
+
   /// The table of SLocEntries that are loaded from other modules.
   ///
   /// Negative FileIDs are indexes into this table. To get from ID to an index,
@@ -1290,7 +1290,7 @@ public:
     if (!E)
       return std::make_pair(FileID(), 0);
 
-    auto Offset = Loc.getOffset()-E->getOffset();
+    auto Offset = Loc.getOffset() - E->getOffset();
     if (Loc.isFileID())
       return std::make_pair(FID, Offset);
 
@@ -1307,7 +1307,7 @@ public:
     if (!E)
       return std::make_pair(FileID(), 0);
 
-    auto Offset = Loc.getOffset()-E->getOffset();
+    auto Offset = Loc.getOffset() - E->getOffset();
     if (Loc.isFileID())
       return std::make_pair(FID, Offset);
     return getDecomposedSpellingLocSlowCase(E, Offset);
@@ -1981,8 +1981,9 @@ private:
 
   FileIDAndOffset
   getDecomposedExpansionLocSlowCase(const SrcMgr::SLocEntry *E) const;
-  FileIDAndOffset getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
-                                                   SourceLocation::UIntTy Offset) const;
+  FileIDAndOffset
+  getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
+                                   SourceLocation::UIntTy Offset) const;
   void computeMacroArgsCache(MacroArgsMap &MacroArgsCache, FileID FID) const;
   void associateFileChunkWithMacroArgExp(MacroArgsMap &MacroArgsCache,
                                          FileID FID,
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 689def9c6..df2279a4f 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -7104,7 +7104,8 @@ ASTContext::getNameForTemplate(TemplateName Name,
     } else {
       DName = DeclarationNames.getCXXOperatorName(TN.getOperator());
       // DNInfo work in progress: FIXME: source locations?
-      DeclarationNameLoc DNLoc = DeclarationNameLoc::makeCXXOperatorNameLoc(nullptr);
+      DeclarationNameLoc DNLoc =
+          DeclarationNameLoc::makeCXXOperatorNameLoc(nullptr);
       return DeclarationNameInfo(DName, NameLoc, DNLoc);
     }
   }
diff --git a/clang/lib/Basic/Diagnostic.cpp b/clang/lib/Basic/Diagnostic.cpp
index 9e176c1bc..b2cd532d1 100644
--- a/clang/lib/Basic/Diagnostic.cpp
+++ b/clang/lib/Basic/Diagnostic.cpp
@@ -204,8 +204,8 @@ DiagnosticsEngine::DiagStateMap::lookup(SourceManager &SrcMgr,
   return F->lookup(Decomp.second);
 }
 
-DiagnosticsEngine::DiagState *
-DiagnosticsEngine::DiagStateMap::File::lookup(SourceLocation::UIntTy Offset) const {
+DiagnosticsEngine::DiagState *DiagnosticsEngine::DiagStateMap::File::lookup(
+    SourceLocation::UIntTy Offset) const {
   auto OnePastIt =
       llvm::partition_point(StateTransitions, [=](const DiagStatePoint &P) {
         return P.Offset <= Offset;
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index b7cbcd54b..37036302c 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -943,9 +943,8 @@ FileIDAndOffset SourceManager::getDecomposedExpansionLocSlowCase(
   return std::make_pair(FID, Offset);
 }
 
-FileIDAndOffset
-SourceManager::getDecomposedSpellingLocSlowCase(const SrcMgr::SLocEntry *E,
-                                                SourceLocation::UIntTy Offset) const {
+FileIDAndOffset SourceManager::getDecomposedSpellingLocSlowCase(
+    const SrcMgr::SLocEntry *E, SourceLocation::UIntTy Offset) const {
   // If this is an expansion record, walk through all the expansion points.
   FileID FID;
   SourceLocation Loc;
diff --git a/clang/lib/Rewrite/Rewriter.cpp b/clang/lib/Rewrite/Rewriter.cpp
index e4dca072b..28d9092e1 100644
--- a/clang/lib/Rewrite/Rewriter.cpp
+++ b/clang/lib/Rewrite/Rewriter.cpp
@@ -131,7 +131,7 @@ std::string Rewriter::getRewrittenText(CharSourceRange Range) const {
 }
 
 SourceLocation::UIntTy Rewriter::getLocationOffsetAndFileID(SourceLocation Loc,
-                                              FileID &FID) const {
+                                                            FileID &FID) const {
   assert(Loc.isValid() && "Invalid location");
   FileIDAndOffset V = SourceMgr->getDecomposedLoc(Loc);
   FID = V.first;
diff --git a/clang/lib/Sema/SemaLambda.cpp b/clang/lib/Sema/SemaLambda.cpp
index e172a099f..a47db06c3 100644
--- a/clang/lib/Sema/SemaLambda.cpp
+++ b/clang/lib/Sema/SemaLambda.cpp
@@ -1003,9 +1003,8 @@ CXXMethodDecl *Sema::CreateLambdaCallOperator(SourceRange IntroducerRange,
   // and trailing-return-type respectively.
   DeclarationName MethodName =
       Context.DeclarationNames.getCXXOperatorName(OO_Call);
-  DeclarationNameLoc MethodNameLoc =
-      DeclarationNameLoc::makeCXXOperatorNameLoc(
-        Context.getCXXOperatorSourceInfo(IntroducerRange.getBegin()));
+  DeclarationNameLoc MethodNameLoc = DeclarationNameLoc::makeCXXOperatorNameLoc(
+      Context.getCXXOperatorSourceInfo(IntroducerRange.getBegin()));
   CXXMethodDecl *Method = CXXMethodDecl::Create(
       Context, Class, SourceLocation(),
       DeclarationNameInfo(MethodName, IntroducerRange.getBegin(),
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index f75e7df1a..69e219d53 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -9793,7 +9793,7 @@ ASTRecordReader::readDeclarationNameLoc(DeclarationName Name) {
 
   case DeclarationName::CXXOperatorName:
     return DeclarationNameLoc::makeCXXOperatorNameLoc(
-      getASTContext().getCXXOperatorSourceInfo(readSourceRange()));
+        getASTContext().getCXXOperatorSourceInfo(readSourceRange()));
 
   case DeclarationName::CXXLiteralOperatorName:
     return DeclarationNameLoc::makeCXXLiteralOperatorNameLoc(
diff --git a/clang/lib/Serialization/ASTReaderStmt.cpp b/clang/lib/Serialization/ASTReaderStmt.cpp
index 51115ac37..b8a12eb7a 100644
--- a/clang/lib/Serialization/ASTReaderStmt.cpp
+++ b/clang/lib/Serialization/ASTReaderStmt.cpp
@@ -2043,7 +2043,8 @@ void ASTStmtReader::VisitCXXDependentScopeMemberExpr(
   else
     E->Base = nullptr;
 
-  E->CXXDependentScopeMemberExprBits.OperatorLoc = readSourceLocation().getRawEncoding();
+  E->CXXDependentScopeMemberExprBits.OperatorLoc =
+      readSourceLocation().getRawEncoding();
 
   if (HasFirstQualifierFoundInScope)
     *E->getTrailingObjects<NamedDecl *>() = readDeclAs<NamedDecl>();
@@ -2230,7 +2231,8 @@ void ASTStmtReader::VisitSubstNonTypeTemplateParmExpr(
   E->Index = CurrentUnpackingBits->getNextBits(/*Width=*/12);
   E->PackIndex = Record.readUnsignedOrNone().toInternalRepresentation();
   E->Final = CurrentUnpackingBits->getNextBit();
-  E->SubstNonTypeTemplateParmExprBits.NameLoc = readSourceLocation().getRawEncoding();
+  E->SubstNonTypeTemplateParmExprBits.NameLoc =
+      readSourceLocation().getRawEncoding();
   E->Replacement = Record.readSubExpr();
 }
 
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 0554e155e..e1a5d2690 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -175,10 +175,9 @@ CXSourceRange cxloc::translateSourceRange(const SourceManager &SM,
 }
 
 CharSourceRange cxloc::translateCXRangeToCharRange(CXSourceRange R) {
-   if (!R.ptr_data[0])
-     return CharSourceRange();
-   const SourceManager &SM =
-      *static_cast<const SourceManager *>(R.ptr_data[0]);
+  if (!R.ptr_data[0])
+    return CharSourceRange();
+  const SourceManager &SM = *static_cast<const SourceManager *>(R.ptr_data[0]);
   return CharSourceRange::getCharRange(
       SourceLocation::getFromRawEncoding32(SM, R.begin_int_data),
       SourceLocation::getFromRawEncoding32(SM, R.end_int_data));
@@ -7643,9 +7642,9 @@ CXString clang_getTokenSpelling(CXTranslationUnit TU, CXToken CXTok) {
   if (!CXXUnit)
     return cxstring::createEmpty();
 
-  SourceLocation Loc = SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]);
-  auto LocInfo =
-      CXXUnit->getSourceManager().getDecomposedSpellingLoc(Loc);
+  SourceLocation Loc = SourceLocation::getFromRawEncoding32(
+      CXXUnit->getSourceManager(), CXTok.int_data[1]);
+  auto LocInfo = CXXUnit->getSourceManager().getDecomposedSpellingLoc(Loc);
   bool Invalid = false;
   StringRef Buffer =
       CXXUnit->getSourceManager().getBufferData(LocInfo.first, &Invalid);
@@ -7667,7 +7666,8 @@ CXSourceLocation clang_getTokenLocation(CXTranslationUnit TU, CXToken CXTok) {
 
   return cxloc::translateSourceLocation(
       CXXUnit->getASTContext(),
-      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]));
+      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(),
+                                           CXTok.int_data[1]));
 }
 
 CXSourceRange clang_getTokenExtent(CXTranslationUnit TU, CXToken CXTok) {
@@ -7682,7 +7682,8 @@ CXSourceRange clang_getTokenExtent(CXTranslationUnit TU, CXToken CXTok) {
 
   return cxloc::translateSourceRange(
       CXXUnit->getASTContext(),
-      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(), CXTok.int_data[1]));
+      SourceLocation::getFromRawEncoding32(CXXUnit->getSourceManager(),
+                                           CXTok.int_data[1]));
 }
 
 static bool getTokens(ASTUnit *CXXUnit, SourceRange Range,
@@ -7727,7 +7728,7 @@ static bool getTokens(ASTUnit *CXXUnit, SourceRange Range,
     if (!Tok.getLocation().getRawEncoding32(TokLocRaw))
       return false; // location is too big for libclang ABI
     CXTok.int_data[1] = TokLocRaw;
-    
+
     CXTok.int_data[2] = Tok.getLength();
     CXTok.int_data[3] = 0;
 
diff --git a/clang/tools/libclang/CXSourceLocation.cpp b/clang/tools/libclang/CXSourceLocation.cpp
index 6ea42f795..835efc914 100644
--- a/clang/tools/libclang/CXSourceLocation.cpp
+++ b/clang/tools/libclang/CXSourceLocation.cpp
@@ -216,7 +216,7 @@ int clang_Location_isInSystemHeader(CXSourceLocation location) {
   const SourceManager &SM =
       *static_cast<const SourceManager *>(location.ptr_data[0]);
   const SourceLocation Loc =
-    SourceLocation::getFromRawEncoding32(SM, location.int_data);
+      SourceLocation::getFromRawEncoding32(SM, location.int_data);
   if (Loc.isInvalid())
     return 0;
 
diff --git a/clang/tools/libclang/CXSourceLocation.h b/clang/tools/libclang/CXSourceLocation.h
index bc36db357..d514b0448 100644
--- a/clang/tools/libclang/CXSourceLocation.h
+++ b/clang/tools/libclang/CXSourceLocation.h
@@ -69,11 +69,10 @@ static inline CXSourceRange translateSourceRange(ASTContext &Context,
 }
 
 static inline SourceLocation translateSourceLocation(CXSourceLocation L) {
-   if (!L.ptr_data[0]) {
+  if (!L.ptr_data[0]) {
     return SourceLocation();
   }
-  const SourceManager &SM =
-      *static_cast<const SourceManager *>(L.ptr_data[0]);
+  const SourceManager &SM = *static_cast<const SourceManager *>(L.ptr_data[0]);
   return SourceLocation::getFromRawEncoding32(SM, L.int_data);
 }
 
diff --git a/clang/tools/libclang/Indexing.cpp b/clang/tools/libclang/Indexing.cpp
index da3f3f973..b62204664 100644
--- a/clang/tools/libclang/Indexing.cpp
+++ b/clang/tools/libclang/Indexing.cpp
@@ -995,7 +995,7 @@ void clang_indexLoc_getFileLocation(CXIdxLoc location,
 
 CXSourceLocation clang_indexLoc_getCXSourceLocation(CXIdxLoc location) {
   if (!location.ptr_data[0])
-  return clang_getNullLocation();
+    return clang_getNullLocation();
 
   CXIndexDataConsumer &DataConsumer =
       *static_cast<CXIndexDataConsumer *>(location.ptr_data[0]);
diff --git a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
index 63aec9a94..0c36feef6 100644
--- a/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
+++ b/clang/unittests/Serialization/SourceLocationEncodingTest.cpp
@@ -32,10 +32,9 @@ void roundTrip(SourceLocation::UIntTy Loc,
   SourceLocation::UIntTy DecodedEncoded =
       SourceLocationEncoding::decode(ActualEncoded).first.getRawEncoding();
   ASSERT_EQ(DecodedEncoded, Loc) << "Decoding " << ActualEncoded;
-} 
+}
 
-constexpr SourceLocation::UIntTy MacroBit =
-    1ull << (SourceLocation::Bits - 1);
+constexpr SourceLocation::UIntTy MacroBit = 1ull << (SourceLocation::Bits - 1);
 constexpr SourceLocation::UIntTy Big = 1ull << (SourceLocation::Bits - 2);
 constexpr SourceLocation::UIntTy Biggest =
     llvm::maskTrailingOnes<uint64_t>(SourceLocation::Bits - 1);

github-actions[bot] avatar Jun 30 '25 07:06 github-actions[bot]

Although this is still a draft, it should be ready for an initial round of review. Any feedback or suggestions would be greatly appreciated.

@AaronBallman @cor3ntin @erichkeane @ilya-biryukov

hokein avatar Jun 30 '25 07:06 hokein

I did a light pass of the 1st 1/4 of this or so. The smuggling back and forth to raw-encoding seems strange? WHy are we doing that here?

In this patch, to keep AST node size as small as possible, we're using 40 bits of StmtBits to store the SourceLocation, we need this back-and-forth conversion.

For the DeclarationName, you're right — we can just store the SourceLocation directly and avoid the raw-encoding round-trip.

hokein avatar Jun 30 '25 15:06 hokein

I did a light pass of the 1st 1/4 of this or so. The smuggling back and forth to raw-encoding seems strange? WHy are we doing that here?

In this patch, to keep AST node size as small as possible, we're using 40 bits of StmtBits to store the SourceLocation, we need this back-and-forth conversion.

For the DeclarationName, you're right — we can just store the SourceLocation directly and avoid the raw-encoding round-trip.

Sorry for being dumb... how are we getting away with only storing 40 bits? Does that not just artificially limit our source-location size for statements? And, frankly, then our entire space? I guess 8 2x-increases in source-location size are nice, but I was sort of hoping the switch to 64 bits meant we never had to discuss this again for the rest of my life:D

erichkeane avatar Jun 30 '25 15:06 erichkeane

Sorry for being dumb... how are we getting away with only storing 40 bits? Does that not just artificially limit our source-location size for statements? And, frankly, then our entire space? I guess 8 2x-increases in source-location size are nice, but I was sort of hoping the switch to 64 bits meant we never had to discuss this again for the rest of my life:D

Yeah, kind of. We can actually use up to 48 bits for SourceLocation, based on how we encode it for modules-- see the PR description for details.

The entire space will be 2^(SourceLocation::Bits-1). In the current implementation, adjusting the number of bits used is straightforward — it just changes SourceLocation::Bits. So I figured we could start with a “small” number (40 bits in this case, so that most Stmt's Loc can stay inside the StmtBits) and see if that’s sufficient in practice. If we ever need more, increasing it is relatively easy.

hokein avatar Jun 30 '25 19:06 hokein

Closing it in favor of #147292

hokein avatar Jul 08 '25 13:07 hokein