T580 with dGPU (too hot)

Open Uatschitchun opened this issue 7 years ago • 36 comments

Hi there,

Firstly, let me thank you for your work on this!

I've got several questions and some problems you could help me with:

1.) If I disable the systemd service, it seems the PL1/2 settings aren't reverted to the original values, correct? Even after stopping the service and rebooting, turbostat still reports the values I last set with lenovo-fix?

Ok, tested again. I explicitly set PL1_Tdp=17, and after disabling the systemd service and rebooting, this is what turbostat reports:

cpu0: PKG Limit #1: ENabled (29.000000 Watts, 28.000000 sec, clamp ENabled)                                    
cpu0: PKG Limit #2: ENabled (44.000000 Watts, 0.002441* sec, clamp DISabled)  

Setting anything with lenovo-fix then shows up in turbostat. Stopping lenovo-fix again does not reset the values!

Btw, from the above: are 29 & 44 the system's defaults, and therefore safe to set in the conf?
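
For readers following along: the two turbostat lines above are a decoded view of the package power limit register (MSR 0x610). The sketch below shows how such a raw value breaks down into watts and seconds. The function names are made up for illustration, and the default units (1/8 W, 1/1024 s) are an assumption; the real units should be read from MSR 0x606 on the machine in question. The sample value is constructed to match the turbostat output quoted above, not read from this T580.

```python
# Sketch: decode a raw MSR_PKG_POWER_LIMIT (0x610) value into watts and seconds.
# Assumption: units of 1/8 W and 1/1024 s; the real ones live in MSR 0x606.

def decode_window(bits, time_unit_s):
    """Time window field: 2^Y * (1 + Z/4) * time_unit, Y = bits 4:0, Z = bits 6:5."""
    y = bits & 0x1F
    z = (bits >> 5) & 0x3
    return (2 ** y) * (1 + z / 4) * time_unit_s

def decode_limit(field, power_unit_w, time_unit_s):
    """Decode one 32-bit half of the register (low half = PL1, high half = PL2)."""
    return {
        "watts": (field & 0x7FFF) * power_unit_w,   # bits 14:0
        "enabled": bool((field >> 15) & 1),         # bit 15
        "clamp": bool((field >> 16) & 1),           # bit 16
        "seconds": decode_window((field >> 17) & 0x7F, time_unit_s),  # bits 23:17
    }

def decode_pkg_power_limit(raw, power_unit_w=1 / 8, time_unit_s=1 / 1024):
    return {
        "PL1": decode_limit(raw & 0xFFFFFFFF, power_unit_w, time_unit_s),
        "PL2": decode_limit((raw >> 32) & 0xFFFFFFFF, power_unit_w, time_unit_s),
    }

# Illustrative value built to match "29 W / 28 s / clamp" and "44 W / 0.002441 s".
limits = decode_pkg_power_limit(0x42816000DD80E8)
print(limits["PL1"])  # 29.0 W, 28.0 s, enabled, clamp on
print(limits["PL2"])  # 44.0 W, ~0.002441 s, enabled, clamp off
```

Note that this only decodes whatever is currently in the register; it does not say whether those values were written by the firmware or by lenovo-fix.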

2.) As my laptop also has a dedicated Nvidia GPU, there's a heat problem when using your script with the installation defaults. The system runs stable and fast, but the dGPU has a fall-off temperature of 76°C. As soon as the GPU reaches this value, it gets throttled to a third of its frequency (around 400MHz instead of 1600). The strange thing is that the GPU only varies its frequency within a range of 100MHz (1600-1700) when running glxspheres64, for instance. So what happens is that the GPU, not being able to throttle itself down in multiple steps, reaches 76° quite fast, as the machine's cooling system can't vent the extra heat resulting from the higher performance with lenovo-fix.

I stumbled upon this because phoronix/openarena drops from approx. 120FPS to approx. 35FPS.

The GPU stays clocked at around 400MHz until it gets back below 60°, which doesn't happen while the CPU is producing heat.

So, the installation defaults do not fit this machine when used with the dGPU.
I tried undervolting, which makes the system last longer until the GPU's fall-off is reached, but the system still gets too hot!

Using this config:

[GENERAL]
Enabled: True

## Settings to apply while connected to AC power
[AC]
# Update the registers every this many seconds
Update_Rate_s: 5
# Max package power for time window #1
PL1_Tdp_W: 15
# Time window #1 duration
PL1_Duration_s: 28
# Max package power for time window #2
PL2_Tdp_W: 44
# Time window #2 duration
PL2_Duration_S: 0.002
# Max allowed temperature before throttling
Trip_Temp_C: 93
# Set HWP energy performance hints to 'performance' on high load (EXPERIMENTAL)
HWP_Mode: False
# Set cTDP to normal=0, down=1 or up=2 (EXPERIMENTAL)
cTDP: 0

[UNDERVOLT]
# CPU core voltage offset (mV)
CORE: -120
# Integrated GPU voltage offset (mV)
GPU: -100
# CPU cache voltage offset (mV)
CACHE: -120
# System Agent voltage offset (mV)
UNCORE: -120
# Analog I/O voltage offset (mV)
ANALOGIO: 0

With glxspheres64 the CPU is running at around 3800MHz, 15W & 80°, which is nice. But the dGPU is constantly heating up, as it is not clocked further down than 1657MHz, and reaches its thermal limit of 76° with a hard fall-off.

So I somehow need a way to solve this: once I bring the dGPU into the mix, I need less power so as not to overpower the cooling system, whereas performance without heavy dGPU work is nice'n stable...

2a) When heating up, the touchpad becomes unresponsive until I end glxspheres64!? No entries in the journal about that!?

3.) Is my following assumption correct?
The i7-8550U is specified with a 15W TDP. cTDP up would be 25W. So if setting PL1_Tdp=25, I need to set cTDP=2?

And with a TDP of 15W, isn't setting 44W in the conf quite far above the TDP? Or am I missing something?

Tests with s-tui & mprime -t showed that even when setting cTDP=1, the system is using approx. 30W at around 3300MHz and around 88° with the s-tui stress test. So that's double the TDP and 3 times the cTDP (10W when down).

4.) I need a little help regarding PL1/2. If I set 29/44W for PL1/2 and use stress from s-tui, I get 3700MHz for about half a minute, then it drops to around 3400MHz and 28W. Expected, as PL1 is 29W. The stress test then keeps running continuously at this power and frequency. No further throttling to 15W is experienced?! The system is undervolted:

[UNDERVOLT]
# CPU core voltage offset (mV)
CORE: -120
# Integrated GPU voltage offset (mV)
GPU: -100
# CPU cache voltage offset (mV)
CACHE: -120
# System Agent voltage offset (mV)
UNCORE: -120
# Analog I/O voltage offset (mV)
ANALOGIO: 0

My system is: T580, I7-8550U, dGPU, 16GB Ram, Bionic 18.04, nvidia-396

Uatschitchun avatar Aug 31 '18 15:08 Uatschitchun

Regarding 2a): journalctl:

Aug 31 17:09:34 T580-Test kernel: thinkpad_acpi: unknown possible thermal alarm or keyboard event received
Aug 31 17:09:34 T580-Test kernel: thinkpad_acpi: unhandled HKEY event 0x6032
Aug 31 17:09:34 T580-Test kernel: thinkpad_acpi: please report the conditions when this event happened to [email protected]

I'll report that... Right after stopping glxspheres64, the touchpad is responsive again!

Uatschitchun avatar Aug 31 '18 15:08 Uatschitchun

Regarding 2.) Without using lenovo-fix, the power gets dynamically adjusted, down to 7W & 2400MHz, keeping the dGPU from hitting the 76° fall-off.

Is there some way to mimic this within this fix?
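
To make the idea concrete, here is a minimal sketch of what such a "mimic" could look like: a hysteresis rule that picks a CPU PL1 from the dGPU temperature. This is not part of lenovo-fix. The function names and the 72° back-off threshold are hypothetical, while the 7W floor, the 29W ceiling and the 60° recovery point are taken from the numbers reported in this thread. Writing the chosen limit to the CPU is deliberately left out.

```python
import subprocess

def read_gpu_temp_c():
    """Read the dGPU temperature through nvidia-smi's CSV query interface."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

def choose_pl1(gpu_temp_c, current_pl1_w, high_w=29, low_w=7,
               back_off_c=72, recover_c=60):
    """Hypothetical hysteresis rule.

    Drop to low_w before the 76 degree fall-off is reached, return to high_w
    only once the GPU has cooled to recover_c, and otherwise keep the current
    limit so the value does not flap between the two thresholds.
    """
    if gpu_temp_c >= back_off_c:
        return low_w
    if gpu_temp_c <= recover_c:
        return high_w
    return current_pl1_w

# Example walk through a heating and cooling cycle (temperatures are made up).
pl1 = 29
for temp in (55, 70, 73, 68, 61, 59):
    pl1 = choose_pl1(temp, pl1)
    print(temp, pl1)
```

A two-level rule like this is much cruder than whatever the firmware does, but it shows why a gap between the two thresholds is needed: without it the limit would toggle every time the temperature crosses a single cut-off.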

Uatschitchun avatar Aug 31 '18 15:08 Uatschitchun

  1. Yes, they will not get reset. You need to power off your machine to reset the values; a reboot may not be enough. You may want to disable the service so they do not get applied next time. However, there may also be other software changing the values (e.g. thermald).
  2. I don't know what could be configured for the GPU or how much heat the cooling of your model can handle. Other software, such as thermald, may also throttle it. You can try setting Trip_Temp_C to a lower temperature. 2a) sounds like too high a temperature on some peripheral IC, or possibly a driver issue. Or, again, other software interfering.
  3. Correct. As stated in the comment in the config file, 2 corresponds to cTDP up (25W). However, turbo is allowed to go beyond that for a short period of time, and that's why you may set PL to 44W.
  4. First of all, your undervolting might be too much. Most Lenovo laptops seem to get unstable at about -90, -70, -90, -70, -50 (or around that). Besides that, this sounds good.

Hope that helps you investigate it further. Your issue is quite interesting. edit: Also try --debug

DEvil0000 avatar Aug 31 '18 15:08 DEvil0000

  1. That thermal limit is ridiculous for a laptop. Can you check with GPU-Z (on Windows) what the maximum temperature for the GPU is? It should be 94/97 °C. I guess that, like the CPU, the GPU is also throttled too early, but I don't know if we can force that limit to be higher somehow. Also, sharing the same heat-pipe and fan between the CPU and the GPU is a major limiting factor on performance. If I understand correctly, you are asking for a way to monitor the GPU usage in order to limit the CPU temperature when both are used, right?

  2. On the 8550U the standard 15W cTDP allows you to reach a base frequency of 1.8 GHz, while setting cTDP up to 25W should raise it to 2.0 GHz. On the other hand, PL1/2 are used to set the upper power usage limit when turbo frequencies are in use. There is no real limit (kind of) to how high you can push these values if your cooling system can handle the heat and your power circuitry the requested current.

erpalma avatar Aug 31 '18 16:08 erpalma

Thx for the prompt answer ;)

No thermald, no tlp (atm)

The thing is, from my investigations, that when I'm using the fix, power gets "fixed" at that value, whereas without the fix the system adapts power slightly up and down according to the dGPU temperature, so it won't reach its 76° fall-off temp. This makes sense, as there are 2 devices heating up instead of 1 when putting load on the dGPU.

What I don't get is why the dGPU doesn't get throttled more, prior to falling off?

I'm monitoring the dGPU with nvidia-smi. It states:

    Temperature
        GPU Current Temp            : 57 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 97 C
        GPU Max Operating Temp      : 94 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Clocks
        Graphics                    : 1683 MHz
        SM                          : 1683 MHz
        Memory                      : 3003 MHz
        Video                       : 1506 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1911 MHz
        SM                          : 1911 MHz
        Memory                      : 3004 MHz
        Video                       : 1708 MHz

So yes, the temps are shown as you state them. The max graphics clock (1911MHz) isn't ever reached? The max clock I get is around 1700 and it throttles by no more than 100MHz. I've found some documentation regarding this 76° fall-off (lemme see...). It seems to be some kind of EC throttling set by Lenovo.

So by using the fix, I'm losing the frequency/power dynamics needed to keep the dGPU below 76°, as if some kind of automatic control gets disabled.

So, I'd like to use the fix, as it gives a good amount of power to the laptop, but still be able to use the dGPU without hitting the thermal limit.

Uatschitchun avatar Aug 31 '18 17:08 Uatschitchun

www.reddit.com/r/thinkpad/comments/8flj0i/t480_power_limit_throttles_down_to_5_watts_on/

Uatschitchun avatar Aug 31 '18 17:08 Uatschitchun

Could this be helpful, to reset to defaults, when service is stopped?

Default power limits can be found in the PKG_PWR_SKU MSR (614h)

www.technodocbox.com/PC_Support/74817174-8th-generation-intel-processor-family-for-s-processor-platforms.html See below the PDF No. 91

Uatschitchun avatar Aug 31 '18 18:08 Uatschitchun

Throttling at 76°C is way too early. 90 I would understand. Since it states 94 for max operating temp, I would cap the CPU at max 90 or 92, since they share the cooling. Maybe even 85 or so. Changing the fan config with thinkfan may also help you.

DEvil0000 avatar Aug 31 '18 18:08 DEvil0000

0x614 is a good idea to have a look at, but my i7-8550U does not have it, so I guess you do not have it either. Try to read/write it with rdmsr/wrmsr.

DEvil0000 avatar Aug 31 '18 19:08 DEvil0000

$ sudo rdmsr 0x614
78

Or how is it done?
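
For what it's worth, the "78" above can be interpreted with a few lines of Python. rdmsr prints hexadecimal without a prefix, and the low 15 bits of MSR 0x614 hold the thermal spec power in RAPL power units. The sketch below assumes a unit of 1/8 W (the actual unit is in bits 3:0 of MSR 0x606); the function name is made up for illustration.

```python
# Sketch: interpret the value printed by `rdmsr 0x614` (package power info).
# Assumption: power unit of 1/8 W; check MSR 0x606 for the real unit.

def decode_pkg_power_info(raw, power_unit_w=1 / 8):
    return {
        "tdp_w": (raw & 0x7FFF) * power_unit_w,          # bits 14:0, thermal spec power
        "min_w": ((raw >> 16) & 0x7FFF) * power_unit_w,  # bits 30:16, minimum power
        "max_w": ((raw >> 32) & 0x7FFF) * power_unit_w,  # bits 46:32, maximum power
    }

# "78" is hexadecimal: 0x78 = 120 units, and 120 * 0.125 W = 15 W.
info = decode_pkg_power_info(int("78", 16))
print(info["tdp_w"])  # 15.0
```

Under that assumption the register reports the 15W TDP with the minimum and maximum fields left at zero, so it would not by itself reveal the 29/44W values turbostat showed after a reboot.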

Uatschitchun avatar Aug 31 '18 19:08 Uatschitchun

can you write the same value back? just as a test?

DEvil0000 avatar Aug 31 '18 19:08 DEvil0000

Sorry guys but I don't get how a CPU MSR can actually influence a dGPU. PACKAGE_POWER_SKU should be related to the min/max package power, that is the combined power of the CPU+iGPU+cache+whatever in the die.

The problem with the MX150 throttling is probably related to a setting in the dGPU bios or in the system EC/BIOS. Can you force max performance from the nVidia control panel? I guess it won't make any difference but worth a try.

erpalma avatar Sep 01 '18 07:09 erpalma

I'll try writing 78 to 0x614. But that's only in regard to 1.)

No difference whether the dGPU is set to performance or adaptive. I guess the dGPU's own throttling mechanisms kick in when reaching the max temperature (around 94°). As 76° is far from max, there seems to be some kind of magic that throttles the CPU's power (down to 3-4W!! - see the reddit post I linked) to prevent reaching the dGPU's fall-off. With the fix, this magic seems to be disabled. I'll provide some screenshots from s-tui with and without the fix.

There's also a German post in the Lenovo forum, and from what it seems, Lenovo is aware of the problem. I'll post a link, as I guess I'm not the only German here :-)

Uatschitchun avatar Sep 01 '18 10:09 Uatschitchun

https://forums.lenovo.com/t5/T4-T5-und-neuere-T-Serie/T480-mit-MX150-und-i5-Extremes-CPU-throttling-auf-200-mhz/td-p/4042154

Uatschitchun avatar Sep 02 '18 07:09 Uatschitchun

@erpalma you are right, 0x614 should not fix the GPU throttling, but I am curious. However, as far as I understand, the CPU could send a signal to the GPU to throttle, like PROCHOT. @Uatschitchun Lenovo may be aware, but it looks like they do not care.

When I did a quick search on Google for "mx150 thermal throttle" or similar, I found a lot of posts from people having this issue. Their solutions or hints are very different:

  • Power supply related: One person wrote that his power supply did not deliver enough power when used with the dock (or too little power in general). Lenovo sells power supplies with lower and higher wattages. You can check this by playing/testing on battery with high performance settings. If that works, your power supply might be the issue.
  • Strange BIOS: One person wrote that he fixed it by disabling Intel SpeedStep in the BIOS. This is quite a bad workaround, but if it works...
  • Other software: One person wrote that uninstalling HP CoolSense (HP laptops) fixed the issue.

DEvil0000 avatar Sep 03 '18 08:09 DEvil0000

@Uatschitchun I don't know if this applies, as I don't know much about nvidia on Linux. Did you try something like nvidia-smi -q -d PERFORMANCE to see the GPU throttle reasons?

DEvil0000 avatar Sep 03 '18 09:09 DEvil0000

I would take a look at the nVidia Coolbits feature. You should be able to disable performance levels completely.

erpalma avatar Sep 03 '18 11:09 erpalma

Tried both! During my tests, when the dGPU reaches its fall-off temp, no reason is given for the throttling! All in all, the information that can be obtained from the MX150 with nvidia-smi is very sparse. Supported clocks doesn't work, and so far I haven't succeeded in getting an xorg.conf running that enables Coolbits, because of the dual-card setup ;(

I'll upload some screenshots later from tests/benchmarks I've done.

Uatschitchun avatar Sep 07 '18 12:09 Uatschitchun

Here are phoronix results (as with higher resolutions, too high temps set in conf result in fall-off, which is clearly seen in results): www.openbenchmarking.org/result/1809079-AR-75GRAD88872

Here are screenshots of running stress, prime & glxsphere. https://www.dropbox.com/sh/s44z07yuhhivi3j/AAD47959bjdDpzplr97u-gXNa?dl=0

Uatschitchun avatar Sep 07 '18 14:09 Uatschitchun

Hello all, I'm having a similar issue with my T480. The nvidia-smi command didn't show anything odd, but then I tried running it once every second and then it appears that the driver is switching one of the conditions on and off rapidly (SW Power Cap : Active).

My guess is that there's nothing we can do unless NVIDIA releases some documentation or Lenovo updates the BIOS to remove the limitation. I've tried with a different (supposedly 87W) adapter and the results are the same, so my guess is that the power adapter is not the problem (or at least not the only problem)
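
Since the flag only shows up when polling, a small parser makes it easier to log which reasons are active at each sample. The sketch below pulls the active entries out of the text printed by nvidia-smi -q -d PERFORMANCE. The function name is made up, and the sample text is an illustrative excerpt modelled on the "SW Power Cap : Active" line mentioned above; the section heading and the set of entries may differ between driver versions.

```python
def active_throttle_reasons(report):
    """Return the names of throttle reasons marked 'Active' in the report text."""
    reasons = []
    in_section = False
    for line in report.splitlines():
        if "Throttle Reasons" in line:
            in_section = True          # entries follow this heading
            continue
        if in_section and ":" in line:
            name, _, state = line.partition(":")
            if state.strip() == "Active":
                reasons.append(name.strip())
    return reasons

# Illustrative excerpt, not captured from a real machine.
SAMPLE = """\
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
"""

print(active_throttle_reasons(SAMPLE))  # ['SW Power Cap']
```

Feeding it the live output once per second (for example via subprocess) and printing only the samples where the list is non-empty would show how often the cap toggles.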

nariox avatar Oct 03 '18 15:10 nariox

Hmm interesting. Can you please try this: while true; do nvidia-smi -pl 25; sleep 1; done

And see if you get better results?

edit: this is ugly I know..

erpalma avatar Oct 04 '18 06:10 erpalma

while true; do nvidia-smi -pl 25; sleep 1; done

25W is not that much for a GPU

DEvil0000 avatar Oct 04 '18 07:10 DEvil0000

It's the nominal TDP of the MX150

erpalma avatar Oct 04 '18 08:10 erpalma

Unfortunately I get Changing power management limit is not supported for GPU: 00000000:01:00.0.

nariox avatar Oct 04 '18 19:10 nariox

You need to set the coolbits with nvidia-xconfig first.

erpalma avatar Oct 04 '18 21:10 erpalma

I'm using the right cool bits (please correct me if I'm wrong). On optirun/primus, I have set up the cool bits on /etc/bumblebee/xorg.conf.nvidia, not sure if that matters, but cat /var/log/Xorg.8.log gives me: [ 409.371] (**) NVIDIA(0): Option "Coolbits" "28"

I've also tried it running under nvidia-xrun to similar results.

nariox avatar Oct 05 '18 01:10 nariox

Hmm too bad. I don't have access to a T480/580 with discrete GPU so I can't be more helpful sorry :/

erpalma avatar Oct 05 '18 10:10 erpalma

That's alright. Unfortunately, the nvidia-smi tool on Linux doesn't support undervolting (as far as I understand), so we are stuck. My best guess would be to flash a custom vBIOS, but that doesn't guarantee the EC is not the one power limiting the GPU.

nariox avatar Oct 10 '18 13:10 nariox

Seems like BIOS 1.17 changed the behavior, now the dGPU and CPU seem to be limited in power so that the dGPU never reaches thermal throttling (because it is always being throttled a little). Running furmark seems to reduce the maximum power on the CPU to about 5W, but closing it seems to return the CPU to full power. My guess is that the dGPU now takes priority over the power being drawn.

Since it seems like the BIOS/EC are the ones doing the management and there is no Nvidia tool to manage the power of the GPU, I think it would be alright to close this issue, because there's not much lenovo-throttling-fix can do about it.

nariox avatar Oct 17 '18 18:10 nariox

Sadly, there's only a 1.16 update for the T580, and it states nothing regarding dGPU/CPU ;(

Uatschitchun avatar Oct 23 '18 08:10 Uatschitchun