Raspberry Pi 5 cannot overclock beyond 3.0GHz due to firmware limit(?)

Open • youmukonpaku1337 opened this issue 1 year ago • 87 comments

Is this the right place for my bug report? This issue seems to be firmware-related, as clocking is handled by the firmware.

Describe the bug

Setting arm_freq beyond 3000 is accepted, but vcgencmd measure_clock arm reports 3000 MHz, while software such as Geekbench and btop reports the clock set by arm_freq, e.g. 3.1 GHz.

To reproduce

  1. Set arm_freq beyond 3000, and a matching over_voltage_delta
  2. Reboot, and run vcgencmd measure_clock arm
  3. Check with something else, like btop or Geekbench
  4. Clocks will be mismatched and vcgencmd will only report 3.0GHz
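Steps 2 to 4 can be scripted. Below is a minimal Python sketch, not an official tool; it assumes vcgencmd measure_clock arm prints a line such as frequency(0)=3000000000 (in Hz), and that tools like btop take their value from the kernel's cpufreq sysfs file scaling_cur_freq (in kHz). The helper names and the 1% tolerance are hypothetical choices for illustration.

```python
import re
import subprocess
from pathlib import Path

def parse_measure_clock(output: str) -> int:
    """Return the clock in Hz from vcgencmd output like 'frequency(0)=3000000000'."""
    match = re.search(r"frequency\(\d+\)=(\d+)", output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return int(match.group(1))

def clocks_mismatch(vcgencmd_hz: int, cpufreq_khz: int, tolerance: float = 0.01) -> bool:
    """True if the firmware and cpufreq readings differ by more than `tolerance` (relative)."""
    cpufreq_hz = cpufreq_khz * 1000
    return abs(vcgencmd_hz - cpufreq_hz) > tolerance * vcgencmd_hz

def read_live_clocks() -> tuple[int, int]:
    """On a running Pi: (firmware clock in Hz, cpufreq clock in kHz) for cpu0."""
    out = subprocess.run(["vcgencmd", "measure_clock", "arm"],
                         capture_output=True, text=True, check=True).stdout
    khz = int(Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq").read_text())
    return parse_measure_clock(out), khz

if __name__ == "__main__":
    # Offline demo with the numbers from this report: firmware 3.0 GHz, cpufreq 3.1 GHz.
    fw_hz = parse_measure_clock("frequency(0)=3000000000")
    print(clocks_mismatch(fw_hz, 3_100_000))
```

On the Pi itself, clocks_mismatch(*read_live_clocks()) performs the same comparison on live values.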

Expected behaviour

The Pi is actually clocked beyond 3.0 GHz, and both vcgencmd and other software report it as such.

Actual behaviour

The Pi is only clocked at 3.0 GHz, and vcgencmd reports it as such, but other software reports the clock as set in arm_freq.

System

raspinfo output: https://pastebin.com/U2KCBBnD

  • Which model of Raspberry Pi? Pi 5
  • Which OS and version (cat /etc/rpi-issue)? Raspberry Pi reference 2023-12-05 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4
  • Which firmware version (vcgencmd version)? 2024/02/16 15:28:41 Copyright (c) 2012 Broadcom version 4c845bd3 (release) (embedded)
  • Which kernel version (uname -a)? Linux q-raspi5 6.6.17-v8-16k+ #1735 SMP PREEMPT Wed Feb 21 14:45:17 GMT 2024 aarch64 GNU/Linux

Logs

dmesg output is in the raspinfo paste.

Additional context

In case it's relevant: I used rpi-update to update to the latest kernel and firmware version, with no change.
I have also set the Debian sources in sources.list to testing/trixie.

youmukonpaku1337 • Mar 09 '24 11:03

I'm pretty sure the Pi 5 can handle clocks beyond 3.0 GHz, since it's extremely stable at 3.0 GHz, so that's worth looking at as well.

youmukonpaku1337 • Mar 09 '24 12:03

That's what we've been told is the limit of the PLL by Broadcom.

I've got a todo item to investigate what happens when this is exceeded, but it's not high on the priority list.

popcornmix • Mar 11 '24 20:03

ah i see, that's sad, hope it gets fixed soon! would love to run my pi at absurd clocks

youmukonpaku1337 • Mar 11 '24 22:03

rpi-eeprom-recovery.zip

I've removed the 3 GHz limit and attached a zip file you can test (you can flash it to an SD card with rpi-imager).

Make sure the Pi you are testing holds no critical data that isn't backed up. Let me know if you succeed in going above 3 GHz.

I could boot at 3.1GHz (and vcgencmd measure_clock arm confirmed that) but my Pi would crash when stressed.

popcornmix • Mar 14 '24 15:03

that's actually awesome, tysm! i assume i just flash it to an sd card that isn't the one i have raspbian on, and boot?

youmukonpaku1337 • Mar 14 '24 15:03

Yes - use a spare card.

popcornmix • Mar 14 '24 16:03

alright, thanks, will test asap

youmukonpaku1337 • Mar 14 '24 16:03

Dangit how did I not see this issue before now :)

Going to see if I accidentally nuke my 'blessed' Pi 5 (the only one I've been able to get to 3.0 GHz so far).

geerlingguy • Mar 14 '24 16:03

LMAO

youmukonpaku1337 • Mar 14 '24 16:03

Petition to make 3.14GHz the new upper limit in the firmware.

Mauker1 • Mar 14 '24 17:03

So I've been trying to get a Geekbench 6 run to complete at 3.14 GHz, testing higher and higher over_voltage_delta (with force_turbo off and on), and so far can't quite hack it.

I wound up capturing this from dmesg:

[  326.258634] ------------[ cut here ]------------
[  326.258637] Firmware transaction timeout
[  326.258646] WARNING: CPU: 3 PID: 31 at drivers/firmware/raspberrypi.c:67 rpi_firmware_property_list+0x204/0x270
[  326.258654] Modules linked in: algif_hash algif_skcipher af_alg bnep vc4 snd_soc_hdmi_codec binfmt_misc aes_ce_blk drm_display_helper cec aes_ce_cipher drm_dma_helper drm_kms_helper hci_uart ghash_ce snd_soc_core btbcm gf128mul snd_compress sha2_ce snd_pcm_dmaengine brcmfmac sha256_arm64 sha1_ce bluetooth snd_pcm brcmutil snd_timer snd rpivid_hevc(C) cfg80211 v4l2_mem2mem pisp_be videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 ecdh_generic fb_sys_fops ecc syscopyarea sysfillrect sysimgblt rfkill libaes videobuf2_common v3d videodev raspberrypi_hwmon mc gpu_sched drm_shmem_helper raspberrypi_gpiomem rp1_adc pwm_fan nvmem_rmem uio_pdrv_genirq uio fuse drm drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6 spidev i2c_brcmstb spi_bcm2835 gpio_keys
[  326.258699] CPU: 3 PID: 31 Comm: kworker/3:0 Tainted: G         C         6.1.0-rpi7-rpi-2712 #1  Debian 1:6.1.63-1+rpt1
[  326.258702] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
client_loop: send disconnect: Broken pipe

The cursor keeps blinking on the external display, but the SSH session drops.

Checking the actual voltage:

$ vcgencmd measure_volts
volt=1.0000V

So I'm wondering if there's any way to boost that further, or if 1.0000V is the hard limit for the cores?
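To check whether the core is sitting at the ceiling, the measure_volts output can be parsed. A small Python sketch, assuming the volt=1.0000V format shown above; the 1.0 V constant is the ceiling popcornmix confirms further down this thread, and the function names are made up for illustration.

```python
import re

# Ceiling value popcornmix states later in this thread ("There is a 1V ceiling").
VOLTAGE_CEILING_V = 1.0

def parse_measure_volts(output: str) -> float:
    """Return volts from vcgencmd output like 'volt=1.0000V'."""
    match = re.fullmatch(r"volt=([0-9.]+)V", output.strip())
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))

def at_ceiling(volts: float, ceiling: float = VOLTAGE_CEILING_V, eps: float = 1e-4) -> bool:
    """True if the reading is at (or within eps of) the ceiling."""
    return volts >= ceiling - eps

reading = parse_measure_volts("volt=1.0000V")
print(reading, at_ceiling(reading))  # 1.0 True
```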

I should note I have an Argon THRML 60-RC and an additional giant 140mm Noctua fan blowing over everything, with the fan set to full blast (pinctrl FAN_PWM op dl; for some reason my custom fan_temp0 through 4 settings to run the fan at speed 255 don't seem to make a difference).

geerlingguy • Mar 14 '24 18:03

Yay! Got one run in at 3.14 GHz with:

over_voltage=8
arm_freq=3140
force_turbo=1

Honestly not sure if the over_voltage vs over_voltage_delta made a difference or if I just got lucky on this run and unlucky on the other runs.

Here's the result: https://browser.geekbench.com/v6/cpu/5314274

And a video! https://www.youtube.com/watch?v=TTIkZBsVJyA

geerlingguy • Mar 14 '24 18:03

HELL YEAH!!!

youmukonpaku1337 • Mar 14 '24 21:03

I've tried flashing, but I just see the boot menu. I tried two different drives with the bootloader but nothing happens?

senothechad • Mar 14 '24 21:03

How was the modified firmware made?

Cgamess • Mar 14 '24 21:03

With a C compiler, mostly. @popcornmix is a Raspberry Pi engineer.

pelwell • Mar 14 '24 21:03

i might try breaking 1k singlecore >:)

youmukonpaku1337 • Mar 14 '24 22:03

@geerlingguy wrote: "Here's the result: https://browser.geekbench.com/v6/cpu/5314274"

Those are your single/multi scores: 967 / 1793 (does nobody really notice how wrong this benchmark is when a quad-core CPU's multi-threaded score is not even twice its single-threaded score?)

And here's one outperforming your setup at 972 / 1847 with the cores clocked at only 3.0 GHz: https://browser.geekbench.com/v6/cpu/5312673
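For reference, the ratio being complained about is easy to compute. This Python sketch simply divides the multi-core score by the single-core score and compares it with ideal 4x scaling; "ideal linear scaling" is a naive yardstick chosen here for illustration, not something Geekbench promises.

```python
def multi_core_scaling(single: int, multi: int, cores: int = 4) -> tuple[float, float]:
    """Return (multi/single ratio, that ratio as a fraction of ideal linear scaling)."""
    ratio = multi / single
    return ratio, ratio / cores

# Single/multi scores quoted above
for label, single, multi in [("3140 MHz run", 967, 1793), ("3000 MHz run", 972, 1847)]:
    ratio, efficiency = multi_core_scaling(single, multi)
    print(f"{label}: {ratio:.2f}x, {efficiency:.0%} of ideal 4-core scaling")
```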

Ignore the displayed 3.2 GHz; that's just what the cpufreq driver thinks, and on any RPi it has no clue about the real clock speeds. Since v4.2, Geekbench on ARM with Linux measures and also reports the clock speeds during the warmup phase: https://browser.geekbench.com/v6/cpu/5312673.gb6 (you need a GB browser account to access these raw data files)

  "processor_frequency": {
    "frequencies": [
      2994,
      2992,
      2992,
      ...    
      2993,
      2997,
      2991
    ]

So what's different on that system? Maybe the user simply switched to the performance cpufreq governor before firing up Geekbench? Or maybe memory access is faster than in @geerlingguy's run, where the CPU cores were measured at ~3133 MHz, which is to be expected at the configured 3140 MHz?
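The warmup samples can be averaged to compare the measured clock with the nominal one. A Python sketch using only the six samples visible in the excerpt above (the full list in the .gb6 file is longer and is not reconstructed here), plus the ~3133 MHz vs 3140 MHz figure; clock_deviation is a hypothetical helper name.

```python
from statistics import mean

# Only the samples visible in the excerpt above; the elided ones are not reconstructed.
visible_samples_mhz = [2994, 2992, 2992, 2993, 2997, 2991]

def clock_deviation(samples_mhz: list[int], nominal_mhz: int) -> float:
    """Relative deviation of the mean measured clock from the nominal clock."""
    return (mean(samples_mhz) - nominal_mhz) / nominal_mhz

avg = mean(visible_samples_mhz)
print(f"mean {avg:.1f} MHz, {clock_deviation(visible_samples_mhz, 3000):+.2%} vs 3000 MHz")
# The ~3133 MHz measured in @geerlingguy's run vs the configured 3140 MHz
print(f"{clock_deviation([3133], 3140):+.2%} vs 3140 MHz")
```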

ThomasKaiser • Mar 15 '24 09:03

BTW: when talking about overclocking, it's also very much about DVFS, since 'usually' higher clock speeds need significantly higher supply voltages. One would expect to see a curve like this (but growing more exponentially on the right side in the case of 'overclocking'):

[Figure: sun50i-h6 DVFS curve, OrangePi Lite2 (worse silicon)]

With the RPi 5 (at least with the latest ThreadX/firmware 30cc5f37 / 2024/01/05 15:57:40), it looks either linear or funny:

arm_freq=3000:

[Figure: bcm2712 DVFS curve, firmware 30cc5f37, Raspberry Pi 5B, arm_freq=3000]

  1500 MHz    720.0 mV
  1600 MHz    760.0 mV
  1700 MHz    775.0 mV
  1800 MHz    790.0 mV
  1900 MHz    800.0 mV
  2000 MHz    815.0 mV
  2100 MHz    830.0 mV
  2200 MHz    845.0 mV
  2300 MHz    855.0 mV
  2400 MHz    870.0 mV
  2500 MHz    885.0 mV
  2600 MHz    900.0 mV
  2700 MHz    910.0 mV
  2800 MHz    925.0 mV
  2900 MHz    940.0 mV
  3000 MHz    955.0 mV

arm_freq=3000 combined with over_voltage=4:

[Figure: bcm2712 DVFS curve, firmware 30cc5f37, Raspberry Pi 5B, arm_freq=3000, over_voltage=4]

  1500 MHz    720.0 mV
  1600 MHz    860.0 mV
  1700 MHz    875.0 mV
  1800 MHz    885.0 mV
  1900 MHz    900.0 mV
  2000 MHz    915.0 mV
  2100 MHz    930.0 mV
  2200 MHz    940.0 mV
  2300 MHz    955.0 mV
  2400 MHz    970.0 mV
  2500 MHz    970.0 mV
  2600 MHz    970.0 mV
  2700 MHz    970.0 mV
  2800 MHz    970.0 mV
  2900 MHz    970.0 mV
  3000 MHz    970.0 mV

arm_freq=3000 combined with over_voltage_delta=50000:

[Figure: bcm2712 DVFS curve, firmware 30cc5f37, Raspberry Pi 5B, arm_freq=3000, over_voltage_delta=50000]

  1500 MHz    720.0 mV
  1600 MHz    805.0 mV
  1700 MHz    820.0 mV
  1800 MHz    835.0 mV
  1900 MHz    850.0 mV
  2000 MHz    860.0 mV
  2100 MHz    875.0 mV
  2200 MHz    890.0 mV
  2300 MHz    905.0 mV
  2400 MHz    915.0 mV
  2500 MHz    930.0 mV
  2600 MHz    945.0 mV
  2700 MHz    960.0 mV
  2800 MHz    970.0 mV
  2900 MHz    985.0 mV
  3000 MHz   1000.0 mV

It always starts at 720 mV for the lowest OPP (even when you change the lowest OPP, for example with arm_freq_min=1000), and then some algorithm 'draws' a straight line up to the highest OPP. The exception is the over_voltage setting, where things get really weird: overvolting the low OPPs while keeping the same supply voltage for the 'overclocked' OPPs is quite the opposite of what one would expect given how silicon behaves.

The 'line drawing' that starts at the lowest OPP also leads to strange results. Without adjusting any of the arm_freq ThreadX settings, the 1500 MHz OPP gets 720 mV. But with arm_freq_min=1000, the line is drawn with a similar algorithm, and now the 1500 MHz OPP is at 775 mV instead of 720 mV:

  1000 MHz    720.0 mV
  1100 MHz    730.0 mV
  1200 MHz    740.0 mV
  1300 MHz    750.0 mV
  1400 MHz    765.0 mV
  1500 MHz    775.0 mV
  1600 MHz    785.0 mV
  1700 MHz    795.0 mV
  ...
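The "straight line" hypothesis can be expressed as linear interpolation between the lowest OPP at 720 mV and the top OPP. This Python sketch is a guess at the behaviour, not the firmware's algorithm, and it assumes the 3000 MHz end point stays at 955 mV as in the default table (the excerpt above is cut off before the top OPP). All values in these tables are multiples of 5 mV, so an exact match is not expected.

```python
def line_voltage_mv(freq_mhz: float, f_min: float, f_max: float,
                    v_min_mv: float = 720.0, v_max_mv: float = 955.0) -> float:
    """Straight line from (f_min, v_min) to (f_max, v_max): a hypothetical model
    of the 'line drawing' behaviour, not the firmware's actual algorithm."""
    t = (freq_mhz - f_min) / (f_max - f_min)
    return v_min_mv + t * (v_max_mv - v_min_mv)

print(line_voltage_mv(1500, f_min=1500, f_max=3000))  # 720.0
print(line_voltage_mv(1500, f_min=1000, f_max=3000))  # 778.75
```

Moving the start of the line from 1500 to 1000 MHz lifts the 1500 MHz point by roughly 59 mV in this model, close to the 775 mV measured above.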

One would expect that:

  • the supply voltage for a given clock speed is a hardware property, based on silicon testing with some safety headroom. Each cpufreq OPP has an ideal supply voltage (as low as possible to save energy, and as high as needed for stable operation) that shouldn't 'move around' when arm_freq settings are adjusted
  • ideally the SoC manufacturer allows for AVS because of the 'silicon lottery', so 'lower quality' chips are automatically driven with slightly higher supply voltages, and the top cpufreq OPP is optionally denied if its supply voltage would exceed a critical limit
  • the DVFS OPPs trace a curve (linear/flat on the left side, then growing exponentially for the higher/highest OPPs) rather than a straight line

@popcornmix are the 1st and 3rd issues somewhat addressed with the new ThreadX/firmware version you provided above?

ThomasKaiser • Mar 15 '24 11:03

A few comments. The idle point of 0.72 V is fixed for all boards. Below that voltage, internal RAMs become unreliable, so you can't go lower than that however low the clock goes.

Each chip has a unique base voltage (vpred), determined by querying ring oscillators. This is added to a fixed slope that increases with frequency. True, the real curve may not be flat, but over the non-overclocked range, it's pretty close to flat.

We don't characterise the overclocked range - you are on your own and can manually adjust with over_voltage_delta.

over_voltage is deprecated. It doesn't take into account vpred. over_voltage_delta is preferred.

There is a 1V ceiling that currently can't be exceeded.
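A rough model of the scheme described above: a per-chip base voltage (vpred) plus a fixed frequency-dependent slope, plus over_voltage_delta, held at or above the 0.72 V idle point and capped at 1.0 V. This Python sketch is a hypothetical reconstruction, not the firmware's code. It assumes over_voltage_delta is in microvolts (as the Raspberry Pi config.txt documentation describes it), and the vpred and slope values (0.745 V at 1500 MHz, 0.14 mV/MHz) are made-up numbers chosen to roughly reproduce the arm_freq=3000 table earlier in this thread.

```python
IDLE_FLOOR_V = 0.72  # fixed idle point for all boards, per popcornmix
CEILING_V = 1.00     # current hard ceiling, per popcornmix

def core_voltage(freq_mhz: float, vpred_v: float, slope_v_per_mhz: float,
                 idle_freq_mhz: float, over_voltage_delta_uv: int = 0) -> float:
    """Hypothetical model: the idle OPP gets the fixed floor; every other OPP gets
    vpred + slope * (freq - idle_freq) + delta, clamped to [floor, ceiling].
    vpred_v is treated here as the extrapolated voltage at the idle frequency."""
    if freq_mhz <= idle_freq_mhz:
        return IDLE_FLOOR_V
    v = (vpred_v
         + slope_v_per_mhz * (freq_mhz - idle_freq_mhz)
         + over_voltage_delta_uv / 1_000_000)  # microvolts -> volts (assumed unit)
    return min(max(v, IDLE_FLOOR_V), CEILING_V)

# Made-up chip parameters: 0.745 V at 1500 MHz, 0.14 mV per MHz
for f in (1500, 1600, 2600, 2700, 3100):
    print(f, round(core_voltage(f, 0.745, 0.00014, 1500, over_voltage_delta_uv=100_000), 3))
```

With a 100000 µV delta, this model hits the 1.0 V cap at 2700 MHz, which is roughly where the arm_freq=3100 table in this thread flattens out; beyond that point, extra delta cannot raise the voltage any further.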

popcornmix • Mar 15 '24 12:03

interesting. anyway, gotta break 1k in geekbench for the sillies, will see if i can manage maaaaybe 3.3ghz?

youmukonpaku1337 • Mar 15 '24 12:03

Edit: I hadn't seen the answers above before writing this comment.

Regarding my question "@popcornmix are the 1st and 3rd issues somewhat addressed with the new ThreadX/firmware version you provided above?": nope. I just tested with arm_freq=3100 and over_voltage_delta=100000. The upper voltage limit with ThreadX build 4d574a2e is (still?) 1000 mV, and the algorithm 'drawing' a linear line is also still in place:

[Figure: bcm2712 DVFS curve, build 4d574a2e, Raspberry Pi 5B, arm_freq=3100, over_voltage_delta=100000]

  1500 MHz    720.0 mV
  1600 MHz    855.0 mV
  1700 MHz    870.0 mV
  1800 MHz    885.0 mV
  1900 MHz    900.0 mV
  2000 MHz    910.0 mV
  2100 MHz    925.0 mV
  2200 MHz    940.0 mV
  2300 MHz    955.0 mV
  2400 MHz    965.0 mV
  2500 MHz    980.0 mV
  2600 MHz    995.0 mV
  2700 MHz   1000.0 mV
  2800 MHz   1000.0 mV
  2900 MHz   1000.0 mV
  3000 MHz   1000.0 mV
  3100 MHz   1000.0 mV

So the results are as expected: if 1.0 V can't be exceeded (most probably for a good reason), allowing higher clock speeds is just asking for trouble :)

ThomasKaiser • Mar 15 '24 12:03

@ThomasKaiser - Yeah, sadly it looks like 1V is the upper limit, but maybe some fancy (highly destructive) hacking around can surpass it.

Regarding that other 3.0 GHz score beating my Geekbench 6 run, I wonder if its running on the 4 GB RAM model has anything to do with it (I think that was a 4 GB model). I know that in some microbenchmarks, at least at some point, the 4 GB boards ran faster than the 8 GB boards when memory mattered for the run.

I only have two 4 GB Pi 5s, and neither goes beyond 2.8 GHz reliably, so I can't confirm much.

geerlingguy • Mar 15 '24 15:03

i wonder, is there no way to OC memory on a pi? i saw some firmware options for it but haven't checked much

youmukonpaku1337 • Mar 15 '24 15:03

@geerlingguy wrote: "Regarding that other 3.0 GHz score beating my Geekbench 6 run, I wonder if it running the 4GB RAM part has anything to do with it (I think that was a 4 GB model)"

I'm currently experimenting with my own RPi 5B (also 4 GB) and got an even better score at 3050 MHz: 975/2022. Since I only start GB via sbc-bench -G, memory latency is always measured as well, and I see variations there. My first try at 3050 MHz showed these ramlat scores:

Executing ramlat on cpu0 (Cortex-A76), results in ns:
   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.350 1.313 1.312 1.312 1.312 1.312 1.312 2.498 
     8k: 1.312 1.312 1.312 1.312 1.312 1.312 1.316 2.561 
    16k: 1.319 1.313 1.314 1.315 1.312 1.313 1.314 2.556 
    32k: 1.312 1.312 1.312 1.312 1.312 1.315 1.312 2.577 
    64k: 1.312 1.313 1.315 1.312 1.312 1.312 1.313 2.558 
   128k: 3.935 3.935 3.944 3.935 3.938 4.412 5.722 9.945 
   256k: 4.247 3.961 4.250 3.952 4.110 4.489 5.622 9.955 
   512k: 7.360 7.278 7.052 7.280 7.034 8.428 9.196 13.87 
  1024k: 13.22 12.90 12.87 12.88 13.05 13.45 15.19 22.17 
  2048k: 17.04 15.88 16.77 15.88 25.23 16.79 18.92 26.45 
  4096k: 67.59 61.33 67.46 61.59 68.93 68.06 80.18 101.3 
  8192k: 93.54 106.5 97.65 87.53 94.22 90.86 101.5 126.8 
 16384k: 104.1 101.3 103.9 101.8 103.6 104.9 126.2 127.4 
 32768k: 116.2 113.8 114.5 113.6 115.3 114.5 115.7 119.3 
 65536k: 118.7 117.5 127.3 117.3 118.6 117.7 118.7 121.1 
131072k: 120.0 118.9 119.9 118.8 120.0 128.6 119.4 120.2 

Now, testing again at 3050 MHz, I'm getting both worse latency and worse GB scores (955/1948 on average):

Executing ramlat on cpu0 (Cortex-A76), results in ns:
   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.324 1.314 1.312 1.312 1.312 1.312 1.312 2.500 
     8k: 1.312 1.312 1.312 1.312 1.312 1.312 1.312 2.555 
    16k: 1.318 1.312 1.313 1.312 1.312 1.312 1.312 2.555 
    32k: 1.312 1.312 1.312 1.312 1.312 1.312 1.312 2.579 
    64k: 1.312 1.312 1.312 1.312 1.313 1.313 1.312 2.558 
   128k: 3.935 3.936 3.936 3.935 3.935 4.468 5.765 9.939 
   256k: 3.939 3.936 3.962 3.976 4.007 4.437 5.633 9.944 
   512k: 7.443 7.492 7.521 7.490 7.456 8.786 8.990 13.97 
  1024k: 14.53 13.29 13.76 13.28 13.25 13.76 15.46 23.43 
  2048k: 26.47 27.03 26.91 26.98 41.58 25.73 27.83 34.22 
  4096k: 69.95 61.53 68.19 61.56 70.88 68.41 79.23 100.5 
  8192k: 94.90 107.5 100.6 88.62 96.38 96.22 105.3 135.3 
 16384k: 105.1 101.9 104.2 101.5 104.8 105.9 117.9 144.4 
 32768k: 117.5 115.9 116.9 115.7 117.1 115.6 119.2 129.1 
 65536k: 119.7 118.7 128.3 118.5 119.9 118.9 120.4 125.6 
131072k: 121.9 121.1 121.8 121.1 121.9 128.1 120.9 123.8 
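To put numbers on the latency difference, here is a Python sketch comparing the 1x32 column of both ramlat runs at a few sizes. The values are copied from the two tables above; the choice of sizes is arbitrary.

```python
# 1x32 column of the two ramlat runs above (latency in ns), selected sizes only
first_run = {"1024k": 13.22, "2048k": 17.04, "4096k": 67.59, "131072k": 120.0}
second_run = {"1024k": 14.53, "2048k": 26.47, "4096k": 69.95, "131072k": 121.9}

def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (positive = slower)."""
    return (after - before) / before

for size in first_run:
    change = relative_change(first_run[size], second_run[size])
    print(f"{size:>8}: {first_run[size]:6.2f} -> {second_run[size]:6.2f} ns ({change:+.1%})")
```

The 2048k row stands out, while main-memory latency at 131072k barely moves.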

Some of the individual benchmarks are rather sensitive to memory speed, some are not (I tested this with an RK3588 board, where one can easily adjust the memory clock between 528 and 2112 MHz from userspace, though I forgot where I documented the results; maybe somewhere in the comments on your site). At least in my tests, comparing both runs looks like this: https://browser.geekbench.com/v6/cpu/compare/5326061?baseline=5324484

Apart from different temperatures (in the first run my 'monster cooler' kept temperatures at or below 40°C; then I tried higher temperatures, as per your recommendation regarding stability), I don't see any settings that might have changed and affected the behaviour...

So it would be interesting if you could compare memory bandwidth between the 4 GB and 8 GB models (an sbc-bench run already does this). As a side note: with more recent ThreadX versions, memory access seems to be faster than in the beginning.

Edit: my conclusions about memory speed causing the fluctuating GB scores were BS, since the first run ('with lower memory latency') also produced scores that vary substantially: https://browser.geekbench.com/v6/cpu/compare/5324361?baseline=5324484. Unfortunately, GB also seems to have some sort of random number generator in place when generating scores.

Edit 2: confirmed. Another run at 3050 MHz with sbc-bench -G (which always executes GB twice, for a reason) shows the same picture: the standard deviation is way too high, or in other words, Geekbench 6 on ARM and especially RISC-V sucks:

First run:

   Single-Core Score     949              
   Multi-Core Score      1960              

Second run:

   Single-Core Score     973              
   Multi-Core Score      2010              

https://browser.geekbench.com/v6/cpu/compare/5326624?baseline=5326716
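With only two runs per configuration, a quick spread calculation is about all that's possible. A Python sketch using the two runs' scores; two samples give only a very rough standard deviation estimate. The run-to-run spread here is about 2.5%, larger than the roughly 0.5% gap between the 967 and 972 single-core scores compared earlier in the thread.

```python
from statistics import mean, stdev

# Scores of the two sbc-bench -G runs above
runs = {"single": [949, 973], "multi": [1960, 2010]}

for name, scores in runs.items():
    spread = (max(scores) - min(scores)) / mean(scores)
    print(f"{name}: mean {mean(scores):.0f}, stdev {stdev(scores):.1f}, spread {spread:.1%}")
```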

ThomasKaiser • Mar 15 '24 16:03

interesting..

youmukonpaku1337 • Mar 15 '24 16:03

@ThomasKaiser wrote: "with more recent ThreadX versions memory access seems to be faster than in the beginning."

Actually there is no ThreadX on a Pi5. The bootloader code runs with no RTOS. ThreadX is used by start*.elf on pi0-4.

SDRAM performance has been improved recently by scaling back refresh based on (SDRAM) temperature. See: https://github.com/raspberrypi/firmware/issues/1854#issuecomment-1924141212

popcornmix • Mar 15 '24 16:03

is it possible to use both the test bootloader and the patched firmware?

youmukonpaku1337 • Mar 15 '24 16:03

@ThomasKaiser You keep referring to ThreadX, but there is no ThreadX running on a Pi 5.

pelwell • Mar 15 '24 16:03

nevermind, looks like the sdram change was merged

youmukonpaku1337 • Mar 15 '24 16:03