tally icon indicating copy to clipboard operation
tally copied to clipboard

32bit systems panic on unaligned 64-bit atomic operation

Open gibmat opened this issue 2 years ago • 3 comments

The reporter struct defined in m3/reporter.go includes fields with atomic types that are not properly aligned on 32bit systems. This causes TestIntegrationProcessFlushOnExit to eventually timeout, since the test "recovers" from the panic:

panic: unaligned 64-bit atomic operation
	panic: unaligned 64-bit atomic operation [recovered]
	panic: unaligned 64-bit atomic operation

Rearranging the order so those fields come first to guarantee proper alignment fixes the issue:

diff --git a/m3/reporter.go b/m3/reporter.go
index e25e436..141f634 100644
--- a/m3/reporter.go
+++ b/m3/reporter.go
@@ -105,6 +105,12 @@ type Reporter interface {
 // remote M3 collector, metrics are batched together and emitted
 // via either thrift compact or binary protocol in batch UDP packets.
 type reporter struct {
+	now             atomic.Int64
+	pending         atomic.Uint64
+	numBatches      atomic.Int64
+	numMetrics      atomic.Int64
+	numWriteErrors  atomic.Int64
+	done            atomic.Bool
 	bucketIDTagName string
 	bucketTagName   string
 	bucketValFmt    string
@@ -114,24 +120,18 @@ type reporter struct {
 	calcProto       thrift.TProtocol
 	client          *m3thrift.M3Client
 	commonTags      []m3thrift.MetricTag
-	done            atomic.Bool
 	donech          chan struct{}
 	freeBytes       int32
 	metCh           chan sizedMetric
-	now             atomic.Int64
 	overheadBytes   int32
-	pending         atomic.Uint64
 	resourcePool    *resourcePool
 	stringInterner  *cache.StringInterner
 	tagCache        *cache.TagCache
 	wg              sync.WaitGroup
 
 	batchSizeHistogram    tally.CachedHistogram
-	numBatches            atomic.Int64
 	numBatchesCounter     tally.CachedCount
-	numMetrics            atomic.Int64
 	numMetricsCounter     tally.CachedCount
-	numWriteErrors        atomic.Int64
 	numWriteErrorsCounter tally.CachedCount
 	numTagCacheCounter    tally.CachedCount
 }

gibmat avatar Jul 08 '23 00:07 gibmat

@gibmat yeah... https://github.com/golang/go/issues/36606 is the upstream bug in the go compiler for this case.

fwiw - this library is primarily in use at Uber, where all our servers are 64bit. i don't know if we're ever going to have users on non-64bit platforms, this is why we don't have any CI/validation. i'm not sure what other issues you're going to run into by going down this road.

if you don't mind, may I ask what project/platform you're using this library for?

prateek avatar Jul 08 '23 01:07 prateek

I'm helping with the packaging of this library for Debian (https://tracker.debian.org/pkg/golang-github-uber-go-tally), as it's a dependency of other libraries that I want to package. Debian supports a handful of 32bit architectures, and I saw this issue when the tests failed on those systems.

gibmat avatar Jul 08 '23 13:07 gibmat

Another way to look at this is the fact that non-optimal struct member alignment will waste memory, even on 64-bit platforms. Due to the fact that the reporter struct contains two int32 members which are not adjacent, this struct wastes 8 bytes on 64-bit hosts due to unnecessary padding.

(Note that the proposed patch still wastes memory, because even though the atomic members are aligned / packed, the int32 members are still not optimally packed)

dswarbrick avatar Aug 18 '25 12:08 dswarbrick