GH-40357: [C++] Add benchmark for ToTensor conversions
Rationale for this change
We should add benchmarks to make sure we do not introduce regressions while working on additional implementations of RecordBatch::ToTensor and Table::ToTensor.
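For context on what is being benchmarked: conceptually, converting a batch of equal-length, same-typed numeric columns into a tensor means copying every column into one contiguous 2-D buffer of shape (rows, columns). The sketch below is a hypothetical simplification using only the standard library. ColumnsToTensor and its row_major flag are illustrative names, not Arrow's API, and the sketch ignores nulls and type handling, which the real implementation has to deal with.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification of a batch-to-tensor conversion (not Arrow's code).
// Each inner vector is one column; all columns are assumed to have the same
// length and the same element type T. Nulls are not modelled.
template <typename T>
std::vector<T> ColumnsToTensor(const std::vector<std::vector<T>>& columns,
                               bool row_major) {
  const std::size_t num_cols = columns.size();
  const std::size_t num_rows = num_cols == 0 ? 0 : columns[0].size();
  std::vector<T> out(num_rows * num_cols);
  for (std::size_t c = 0; c < num_cols; ++c) {
    for (std::size_t r = 0; r < num_rows; ++r) {
      if (row_major) {
        // Row-major: element (r, c) lives at r * num_cols + c, so the writes
        // for a single column are strided by the number of columns.
        out[r * num_cols + c] = columns[c][r];
      } else {
        // Column-major: each column occupies one contiguous slice.
        out[c * num_rows + r] = columns[c][r];
      }
    }
  }
  return out;
}
```

In the column-major case each column is a plain contiguous copy, while the row-major case scatters each column with a stride equal to the column count, which is one plausible reason the number of columns can affect conversion performance.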
What changes are included in this PR?
A new cpp/src/arrow/to_tensor_benchmark.cc file.
- GitHub Issue: #40357
Can you show the results of running them? Also, should we use more data to get more reliable results?
This was the output:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-ahcnq1ah/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 17.32, 18.72, 16.18
----------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------
RecordBatchUniformTypesSimple 624 ns 624 ns 1125492 bytes_per_second=1.29039Gi/s items_per_second=43.2982M/s
I will use RandomArrayGenerator to generate more data and add the results here.
The results of running archery benchmark diff --benchmark-filter=BatchToTensorSimple on the second commit (with arrays of length 100 rather than 500):
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-jun4cokj/WORKSPACE/build/release/arrow-to-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 24.95, 25.04, 19.14
---------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type> 550 ns 550 ns 1254345 bytes_per_second=4.06699Gi/s items_per_second=545.863M/s
BatchToTensorSimple<UInt16Type> 555 ns 553 ns 1235570 bytes_per_second=8.08251Gi/s items_per_second=542.408M/s
BatchToTensorSimple<UInt32Type> 569 ns 568 ns 1253335 bytes_per_second=15.7341Gi/s items_per_second=527.949M/s
BatchToTensorSimple<UInt64Type> 580 ns 580 ns 1237449 bytes_per_second=30.8253Gi/s items_per_second=517.163M/s
BatchToTensorSimple<Int8Type> 548 ns 548 ns 1249732 bytes_per_second=4.07944Gi/s items_per_second=547.533M/s
BatchToTensorSimple<Int16Type> 623 ns 568 ns 1233654 bytes_per_second=7.87246Gi/s items_per_second=528.312M/s
BatchToTensorSimple<Int32Type> 565 ns 564 ns 1204923 bytes_per_second=15.8461Gi/s items_per_second=531.706M/s
BatchToTensorSimple<Int64Type> 585 ns 585 ns 1269059 bytes_per_second=30.5699Gi/s items_per_second=512.878M/s
BatchToTensorSimple<HalfFloatType> 545 ns 544 ns 1217900 bytes_per_second=8.21219Gi/s items_per_second=551.111M/s
BatchToTensorSimple<FloatType> 575 ns 574 ns 1239991 bytes_per_second=15.5835Gi/s items_per_second=522.896M/s
BatchToTensorSimple<DoubleType> 567 ns 566 ns 1152074 bytes_per_second=31.5943Gi/s items_per_second=530.065M/s
Current output when running archery benchmark diff --benchmark-filter=BatchToTensorSimple:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-e8lvkw1g/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 27.50, 28.87, 23.74
-----------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536/10000 4121 us 4107 us 171 bytes_per_second=15.217Mi/s items_per_second=12.765G/s null_percent=0.01 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/100 4273 us 4219 us 170 bytes_per_second=14.8143Mi/s items_per_second=12.4271G/s null_percent=1 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/10 4019 us 4003 us 173 bytes_per_second=15.6149Mi/s items_per_second=13.0988G/s null_percent=10 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/2 4100 us 4083 us 136 bytes_per_second=15.3084Mi/s items_per_second=12.8416G/s null_percent=50 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/1 3972 us 3894 us 178 bytes_per_second=16.0516Mi/s items_per_second=13.465G/s null_percent=100 size=65.536k
BatchToTensorSimple<UInt8Type>/65536/0 3953 us 3927 us 178 bytes_per_second=15.9142Mi/s items_per_second=13.3498G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304/10000 15398661 us 1947088 us 1 bytes_per_second=2.05435Mi/s items_per_second=1.72331G/s null_percent=0.01 size=4.1943M
.
.
.
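The second argument in these benchmark names (/10000, /100, /10, /2, /1, /0) lines up with the reported null_percent counter (0.01, 1, 10, 50, 100, 0) if it is read as an inverse null proportion, with 0 meaning no nulls. The helper below is a hypothetical sketch of that mapping, inferred from the counters shown above rather than taken from the benchmark source.

```cpp
#include <cstdint>

// Hypothetical helper (inferred from the output, not the benchmark source):
// maps the second benchmark argument to the reported null_percent counter.
// The argument is read as 1 / null proportion; 0 is treated as "no nulls".
double NullPercentFromArg(int64_t inverse_proportion) {
  if (inverse_proportion == 0) return 0.0;
  return 100.0 / static_cast<double>(inverse_proportion);
}
```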
Output from running the benchmarks on the latest commit:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-y9o8zv4d/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 20.67, 17.39, 10.95
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536 443099 ns 442863 ns 1580 bytes_per_second=141.127Mi/s items_per_second=14.7983G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt8Type>/4194304 38391076 ns 35795222 ns 18 bytes_per_second=111.747Mi/s items_per_second=11.7175G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt16Type>/65536 882040 ns 881129 ns 747 bytes_per_second=70.9318Mi/s items_per_second=7.43773G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt16Type>/4194304 118462838 ns 81059222 ns 9 bytes_per_second=49.3466Mi/s items_per_second=5.17437G/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt32Type>/65536 1937139 ns 1933673 ns 361 bytes_per_second=32.3219Mi/s items_per_second=3.3892G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt32Type>/4194304 1271556625 ns 651396000 ns 1 bytes_per_second=6.14066Mi/s items_per_second=643.895M/s null_percent=0 size=4.1943M
BatchToTensorSimple<UInt64Type>/65536 4440503 ns 4344614 ns 166 bytes_per_second=14.3856Mi/s items_per_second=1.50844G/s null_percent=0 size=65.536k
BatchToTensorSimple<UInt64Type>/4194304 1.1486e+10 ns 1742537000 ns 1 bytes_per_second=2.2955Mi/s items_per_second=240.701M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int8Type>/65536 415187 ns 410957 ns 1710 bytes_per_second=152.084Mi/s items_per_second=15.9472G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int8Type>/4194304 34241740 ns 33962150 ns 20 bytes_per_second=117.778Mi/s items_per_second=12.3499G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int16Type>/65536 812298 ns 810349 ns 917 bytes_per_second=77.1273Mi/s items_per_second=8.08738G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int16Type>/4194304 75301182 ns 70352375 ns 8 bytes_per_second=56.8566Mi/s items_per_second=5.96185G/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int32Type>/65536 2033466 ns 2026663 ns 329 bytes_per_second=30.8389Mi/s items_per_second=3.23369G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int32Type>/4194304 1233238541 ns 562396000 ns 1 bytes_per_second=7.11243Mi/s items_per_second=745.792M/s null_percent=0 size=4.1943M
BatchToTensorSimple<Int64Type>/65536 3969188 ns 3959770 ns 178 bytes_per_second=15.7837Mi/s items_per_second=1.65505G/s null_percent=0 size=65.536k
BatchToTensorSimple<Int64Type>/4194304 1.5188e+10 ns 1823171000 ns 1 bytes_per_second=2.19398Mi/s items_per_second=230.055M/s null_percent=0 size=4.1943M
BatchToTensorSimple<HalfFloatType>/65536 899771 ns 888509 ns 749 bytes_per_second=70.3426Mi/s items_per_second=7.37595G/s null_percent=0 size=65.536k
BatchToTensorSimple<HalfFloatType>/4194304 71104797 ns 69327375 ns 8 bytes_per_second=57.6973Mi/s items_per_second=6.05G/s null_percent=0 size=4.1943M
BatchToTensorSimple<FloatType>/65536 2025175 ns 2021084 ns 347 bytes_per_second=30.924Mi/s items_per_second=3.24262G/s null_percent=0 size=65.536k
BatchToTensorSimple<FloatType>/4194304 1087905188 ns 395840500 ns 2 bytes_per_second=10.1051Mi/s items_per_second=1.05959G/s null_percent=0 size=4.1943M
BatchToTensorSimple<DoubleType>/65536 4118269 ns 4089947 ns 170 bytes_per_second=15.2814Mi/s items_per_second=1.60237G/s null_percent=0 size=65.536k
BatchToTensorSimple<DoubleType>/4194304 9901101750 ns 1684713000 ns 1 bytes_per_second=2.37429Mi/s items_per_second=248.963M/s null_percent=0 size=4.1943M
Looking better after your latest suggestions, Joris 🎉
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-w9c6kiee/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 16.45, 17.62, 11.63
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<UInt8Type>/65536 458536 ns 458333 ns 1504 bytes_per_second=13.3168Gi/s items_per_second=14.2988G/s
BatchToTensorSimple<UInt8Type>/4194304 51650596 ns 40859385 ns 13 bytes_per_second=9.56023Gi/s items_per_second=10.2652G/s
BatchToTensorSimple<UInt16Type>/65536 443072 ns 441767 ns 1327 bytes_per_second=13.8161Gi/s items_per_second=7.41748G/s
BatchToTensorSimple<UInt16Type>/4194304 37997653 ns 36128500 ns 18 bytes_per_second=10.8121Gi/s items_per_second=5.8047G/s
BatchToTensorSimple<UInt32Type>/65536 556504 ns 525753 ns 1625 bytes_per_second=11.6091Gi/s items_per_second=3.11629G/s
BatchToTensorSimple<UInt32Type>/4194304 47726554 ns 38831059 ns 17 bytes_per_second=10.0596Gi/s items_per_second=2.70035G/s
BatchToTensorSimple<UInt64Type>/65536 543929 ns 510296 ns 1071 bytes_per_second=11.9607Gi/s items_per_second=1.60534G/s
BatchToTensorSimple<UInt64Type>/4194304 104417887 ns 55937176 ns 17 bytes_per_second=6.98328Gi/s items_per_second=937.28M/s
BatchToTensorSimple<Int8Type>/65536 542291 ns 530495 ns 1000 bytes_per_second=11.5053Gi/s items_per_second=12.3537G/s
BatchToTensorSimple<Int8Type>/4194304 55069580 ns 44818231 ns 13 bytes_per_second=8.71576Gi/s items_per_second=9.35848G/s
BatchToTensorSimple<Int16Type>/65536 472947 ns 466738 ns 1604 bytes_per_second=13.077Gi/s items_per_second=7.02065G/s
BatchToTensorSimple<Int16Type>/4194304 45937775 ns 40318200 ns 15 bytes_per_second=9.68855Gi/s items_per_second=5.2015G/s
BatchToTensorSimple<Int32Type>/65536 439955 ns 438705 ns 1351 bytes_per_second=13.9126Gi/s items_per_second=3.73463G/s
BatchToTensorSimple<Int32Type>/4194304 38181667 ns 36099833 ns 18 bytes_per_second=10.8207Gi/s items_per_second=2.90466G/s
BatchToTensorSimple<Int64Type>/65536 440425 ns 439585 ns 1583 bytes_per_second=13.8847Gi/s items_per_second=1.86358G/s
BatchToTensorSimple<Int64Type>/4194304 51548936 ns 39940333 ns 15 bytes_per_second=9.78021Gi/s items_per_second=1.31268G/s
BatchToTensorSimple<HalfFloatType>/65536 435417 ns 434107 ns 1526 bytes_per_second=14.0599Gi/s items_per_second=7.54836G/s
BatchToTensorSimple<HalfFloatType>/4194304 48649122 ns 38652385 ns 13 bytes_per_second=10.1061Gi/s items_per_second=5.42567G/s
BatchToTensorSimple<FloatType>/65536 432115 ns 430647 ns 1522 bytes_per_second=14.1729Gi/s items_per_second=3.80451G/s
BatchToTensorSimple<FloatType>/4194304 42923344 ns 38628000 ns 16 bytes_per_second=10.1125Gi/s items_per_second=2.71455G/s
BatchToTensorSimple<DoubleType>/65536 442113 ns 441402 ns 1304 bytes_per_second=13.8276Gi/s items_per_second=1.85591G/s
BatchToTensorSimple<DoubleType>/4194304 60867021 ns 44292875 ns 16 bytes_per_second=8.81914Gi/s items_per_second=1.18368G/s
Thanks for this @AlenkaF. I have two general suggestions here:
- Given that the types are purely physical here (i.e. float32 should use the same conversion code as int32 and uint32), we don't need to benchmark all numeric data types; we can limit ourselves to four integer types: int8, int16, int32 and int64.
- On the other hand, it would be nice to exercise different numbers of columns, because that could affect conversion performance: for example, 3, 30 and 300.
Does that make sense, @AlenkaF @jorisvandenbossche?
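The first suggestion relies on the conversion copying fixed-width values without interpreting them, so only the byte width matters. A hypothetical sketch of that idea follows: a copy routine keyed on byte width alone, dispatched at runtime. CopyFixedWidth and CopyByWidth are illustrative names, not Arrow functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical width-keyed copy: the routine only sees raw bytes, so every
// fixed-width type with the same byte width runs exactly the same code.
template <std::size_t kByteWidth>
void CopyFixedWidth(const void* src, std::size_t length, void* dst) {
  std::memcpy(dst, src, length * kByteWidth);
}

// Runtime dispatch on byte width only. Widths 1, 2, 4 and 8 are the ones
// exercised by int8, int16, int32 and int64; half float (2 bytes), float32
// (4 bytes) and float64 (8 bytes) would take the same branches.
bool CopyByWidth(int byte_width, const void* src, std::size_t length,
                 void* dst) {
  switch (byte_width) {
    case 1: CopyFixedWidth<1>(src, length, dst); return true;
    case 2: CopyFixedWidth<2>(src, length, dst); return true;
    case 4: CopyFixedWidth<4>(src, length, dst); return true;
    case 8: CopyFixedWidth<8>(src, length, dst); return true;
    default: return false;  // not a width this sketch handles
  }
}
```

Under this model, benchmarking one integer type per width covers every branch, which is the reasoning behind limiting the benchmark to int8, int16, int32 and int64.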
It does! Will update 👍
@pitrou, I have included your suggestions. This is the output with the latest changes:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-s2l7kna2/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 19.25, 15.77, 10.23
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<Int8Type>/65536/3 3803 ns 3802 ns 179651 bytes_per_second=48.1626Gi/s items_per_second=51.7142G/s
BatchToTensorSimple<Int8Type>/65536/30 141770 ns 140390 ns 5332 bytes_per_second=13.0426Gi/s items_per_second=14.0044G/s
BatchToTensorSimple<Int8Type>/65536/300 1509271 ns 1488588 ns 471 bytes_per_second=12.3006Gi/s items_per_second=13.2077G/s
BatchToTensorSimple<Int8Type>/4194304/3 892806 ns 890394 ns 792 bytes_per_second=13.1613Gi/s items_per_second=14.1318G/s
BatchToTensorSimple<Int8Type>/4194304/30 10833571 ns 10319294 ns 68 bytes_per_second=11.3562Gi/s items_per_second=12.1936G/s
BatchToTensorSimple<Int8Type>/4194304/300 1128155000 ns 551951000 ns 1 bytes_per_second=2.12315Gi/s items_per_second=2.27972G/s
BatchToTensorSimple<Int16Type>/65536/3 3781 ns 3769 ns 185929 bytes_per_second=48.5878Gi/s items_per_second=26.0854G/s
BatchToTensorSimple<Int16Type>/65536/30 129794 ns 129615 ns 5636 bytes_per_second=14.1269Gi/s items_per_second=7.58431G/s
BatchToTensorSimple<Int16Type>/65536/300 1553976 ns 1550687 ns 435 bytes_per_second=11.808Gi/s items_per_second=6.33938G/s
BatchToTensorSimple<Int16Type>/4194304/3 824934 ns 822791 ns 882 bytes_per_second=14.2427Gi/s items_per_second=7.64648G/s
BatchToTensorSimple<Int16Type>/4194304/30 9991414 ns 9954623 ns 69 bytes_per_second=11.7722Gi/s items_per_second=6.32013G/s
BatchToTensorSimple<Int16Type>/4194304/300 791524063 ns 310795500 ns 2 bytes_per_second=3.77057Gi/s items_per_second=2.02431G/s
BatchToTensorSimple<Int32Type>/65536/3 3717 ns 3712 ns 183626 bytes_per_second=49.3228Gi/s items_per_second=13.24G/s
BatchToTensorSimple<Int32Type>/65536/30 135493 ns 134325 ns 5035 bytes_per_second=13.6315Gi/s items_per_second=3.65918G/s
BatchToTensorSimple<Int32Type>/65536/300 1607824 ns 1600713 ns 436 bytes_per_second=11.439Gi/s items_per_second=3.07063G/s
BatchToTensorSimple<Int32Type>/4194304/3 863068 ns 860123 ns 782 bytes_per_second=13.6245Gi/s items_per_second=3.6573G/s
BatchToTensorSimple<Int32Type>/4194304/30 10307080 ns 10272412 ns 68 bytes_per_second=11.408Gi/s items_per_second=3.06231G/s
BatchToTensorSimple<Int32Type>/4194304/300 261872267 ns 147986600 ns 5 bytes_per_second=7.91879Gi/s items_per_second=2.12568G/s
BatchToTensorSimple<Int64Type>/65536/3 3725 ns 3722 ns 183079 bytes_per_second=49.1992Gi/s items_per_second=6.60341G/s
BatchToTensorSimple<Int64Type>/65536/30 126616 ns 126444 ns 5720 bytes_per_second=14.4811Gi/s items_per_second=1.94362G/s
BatchToTensorSimple<Int64Type>/65536/300 1508292 ns 1506162 ns 445 bytes_per_second=12.1571Gi/s items_per_second=1.6317G/s
BatchToTensorSimple<Int64Type>/4194304/3 837330 ns 835840 ns 833 bytes_per_second=14.0203Gi/s items_per_second=1.88178G/s
BatchToTensorSimple<Int64Type>/4194304/30 9866716 ns 9823261 ns 69 bytes_per_second=11.9296Gi/s items_per_second=1.60116G/s
BatchToTensorSimple<Int64Type>/4194304/300 745086639 ns 325750333 ns 3 bytes_per_second=3.59746Gi/s items_per_second=482.843M/s
Latest output:
Running /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/arrow-archery-1wpfanyn/WORKSPACE/build/release/arrow-tensor-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 31.29, 25.96, 16.82
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
BatchToTensorSimple<Int8Type>/65536/3 2441 ns 2440 ns 340594 bytes_per_second=25.0118Gi/s items_per_second=26.8563G/s
BatchToTensorSimple<Int8Type>/65536/30 3158 ns 3157 ns 219021 bytes_per_second=19.3277Gi/s items_per_second=20.753G/s
BatchToTensorSimple<Int8Type>/65536/300 13722 ns 13719 ns 50473 bytes_per_second=4.43975Gi/s items_per_second=4.76715G/s
BatchToTensorSimple<Int8Type>/4194304/3 277670 ns 277521 ns 2510 bytes_per_second=14.0755Gi/s items_per_second=15.1135G/s
BatchToTensorSimple<Int8Type>/4194304/30 293289 ns 293183 ns 2430 bytes_per_second=13.3236Gi/s items_per_second=14.3061G/s
BatchToTensorSimple<Int8Type>/4194304/300 298143 ns 297779 ns 2263 bytes_per_second=13.1179Gi/s items_per_second=14.0853G/s
BatchToTensorSimple<Int16Type>/65536/3 2181 ns 2179 ns 394604 bytes_per_second=28.0054Gi/s items_per_second=15.0353G/s
BatchToTensorSimple<Int16Type>/65536/30 3247 ns 3236 ns 220372 bytes_per_second=18.8588Gi/s items_per_second=10.1247G/s
BatchToTensorSimple<Int16Type>/65536/300 14148 ns 14137 ns 46621 bytes_per_second=4.30854Gi/s items_per_second=2.31313G/s
BatchToTensorSimple<Int16Type>/4194304/3 277347 ns 277092 ns 2553 bytes_per_second=14.0973Gi/s items_per_second=7.56842G/s
BatchToTensorSimple<Int16Type>/4194304/30 370514 ns 323043 ns 2535 bytes_per_second=12.092Gi/s items_per_second=6.49187G/s
BatchToTensorSimple<Int16Type>/4194304/300 297281 ns 296810 ns 2113 bytes_per_second=13.1598Gi/s items_per_second=7.06513G/s
BatchToTensorSimple<Int32Type>/65536/3 2349 ns 2346 ns 387584 bytes_per_second=26.0117Gi/s items_per_second=6.98246G/s
BatchToTensorSimple<Int32Type>/65536/30 3163 ns 3158 ns 213616 bytes_per_second=19.3208Gi/s items_per_second=5.18638G/s
BatchToTensorSimple<Int32Type>/65536/300 13852 ns 13840 ns 49582 bytes_per_second=4.3606Gi/s items_per_second=1.17054G/s
BatchToTensorSimple<Int32Type>/4194304/3 342283 ns 319630 ns 1969 bytes_per_second=12.2212Gi/s items_per_second=3.28059G/s
BatchToTensorSimple<Int32Type>/4194304/30 290756 ns 286728 ns 2381 bytes_per_second=13.6233Gi/s items_per_second=3.65699G/s
BatchToTensorSimple<Int32Type>/4194304/300 300295 ns 297110 ns 2360 bytes_per_second=13.1465Gi/s items_per_second=3.529G/s
BatchToTensorSimple<Int64Type>/65536/3 2204 ns 2197 ns 410967 bytes_per_second=27.7705Gi/s items_per_second=3.7273G/s
BatchToTensorSimple<Int64Type>/65536/30 3176 ns 3162 ns 216236 bytes_per_second=19.3002Gi/s items_per_second=2.59043G/s
BatchToTensorSimple<Int64Type>/65536/300 13656 ns 13588 ns 51372 bytes_per_second=4.4415Gi/s items_per_second=596.128M/s
BatchToTensorSimple<Int64Type>/4194304/3 270131 ns 268433 ns 2622 bytes_per_second=14.552Gi/s items_per_second=1.95313G/s
BatchToTensorSimple<Int64Type>/4194304/30 297324 ns 292629 ns 2026 bytes_per_second=13.3486Gi/s items_per_second=1.79162G/s
BatchToTensorSimple<Int64Type>/4194304/300 293260 ns 291513 ns 2290 bytes_per_second=13.3951Gi/s items_per_second=1.79786G/s
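One way to compare the two column-count runs above is to recompute items_per_second from the benchmark arguments and the CPU time (Google Benchmark divides rate counters by CPU time unless real time is requested). The reported numbers are consistent with the following reading, which is inferred from the output rather than taken from the benchmark source: in the earlier run the first argument is the byte size of each column, so the work grows with the number of columns, while in the latest run it is the total byte size of the batch, split across the columns with the row count rounded down. The helper names are hypothetical.

```cpp
#include <cstdint>

// Items per iteration if the first argument is the byte size of each column
// (a reading consistent with the earlier 3/30/300-column output).
int64_t ItemsPerColumnSize(int64_t size_bytes, int64_t byte_width,
                           int64_t num_columns) {
  const int64_t rows = size_bytes / byte_width;
  return rows * num_columns;
}

// Items per iteration if the first argument is the total byte size of the
// batch split across the columns (a reading consistent with the latest
// output); integer division rounds the row count down.
int64_t ItemsTotalSize(int64_t size_bytes, int64_t byte_width,
                       int64_t num_columns) {
  const int64_t rows = size_bytes / (byte_width * num_columns);
  return rows * num_columns;
}

// Rate counter: items per iteration divided by CPU seconds per iteration.
double ItemsPerSecond(int64_t items_per_iteration, double cpu_ns) {
  return static_cast<double>(items_per_iteration) / (cpu_ns * 1e-9);
}
```

For example, Int8/65536/3 in the earlier run gives 196608 items in 3802 ns (about 51.7G items/s), and Int64/65536/300 in the latest run gives 8100 items in 13588 ns (about 596M items/s), matching the reported counters. Under this reading the per-iteration times of the two runs are not directly comparable, and items_per_second is the counter to compare.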
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fc87fd75d6602562e64abf8744890332e35f979e.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.