
Improved temperature estimation

Open dhiltonp opened this issue 5 years ago • 44 comments

We now have a record of watts put into the system.

Using a history of watt output and temperature, we should be able to very accurately report the raw temperature without lag...

dhiltonp avatar Sep 13 '18 22:09 dhiltonp

The effective tip temperature is always lower than the sensor reading.

The more power going into the tip, the worse the sag. We should be able to counter this effect...

dhiltonp avatar Oct 29 '18 04:10 dhiltonp

We should be able to.

Also helpful to know that the PID code is always effectively running 1 sample delayed as well. (But in lock step).

At the end of a reading, the PID code starts; it runs and sets a new output value. Then, at the end of the next PWM period, that new value is copied into the PWM output and the PID is triggered again. This means the PID code runs while the PWM is still outputting the previously calculated value, so there is a known, constant delay in the control loop as well.
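For illustration, a minimal sketch of that one-period delay (hypothetical function names, not the actual IronOS loop):

```cpp
// Sketch only: shows the one-PWM-period delay described above.
#include <cstdint>

static uint16_t readTipTemperature() { return 0; } // placeholder for the real ADC read
static void setPWMDutyCycle(uint16_t) {}           // placeholder for the real PWM update
static uint16_t runPID(uint16_t tempReading) { return tempReading; } // placeholder controller

static uint16_t pendingOutput = 0; // duty cycle computed last period, not yet applied

// Called at the end of every PWM period: the value the PID computed during the
// previous period is only latched now, so the controller always acts one sample late.
void onPwmPeriodEnd() {
    setPWMDutyCycle(pendingOutput);               // apply last period's result
    pendingOutput = runPID(readTipTemperature()); // compute the next result while that one plays out
}
```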

Ralim avatar Oct 29 '18 10:10 Ralim

I'm seeing some oscillation in the PID controller; I think it's related to the PWM change from 100 to 256. Basically, our temp sample can now be further from the power output than before, so the energy has more time to dissipate from the base towards the tip.

Adjusting the tip temp based on power and PWM should help.

This is basically adding a feed-forward aspect to the controller.
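As a minimal sketch of what that feed-forward correction could look like (the gain constant and names below are hypothetical and would need to be measured; this is not the IronOS implementation):

```cpp
// Hypothetical feed-forward sketch: estimate the real tip temperature from the sensor
// reading minus a power-dependent sag term, then run the controller on the estimate.
static constexpr float SAG_C_PER_WATT = 5.0f; // assumed gain; must be tuned from measurements

float estimateTipTempC(float sensorTempC, float recentAvgPowerW) {
    // More power flowing through the element means a larger gap between sensor and tip.
    return sensorTempC - SAG_C_PER_WATT * recentAvgPowerW;
}
```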

dhiltonp avatar Oct 30 '18 04:10 dhiltonp

Temperature latency should not have changed dramatically (I multiplied the counting speed by 2.55x to compensate for more counts).

But I 100% agree that a feed-forward is probably a really good idea.

I could look into increasing the PWM frequency to improve the PID update rate if you wanted?

Ralim avatar Oct 30 '18 04:10 Ralim

Huh. That's interesting.

I had to retune the PID controller afterwards - I increased the sample history by 50% and had to change the P damping (mass divisor) from 4 to ~20, and it's still not as good as before...

This temp estimation should help with that, but now I'm confused - I'm not sure what else changed. I'll have to double-check the current version to make sure I didn't mess up my math during the rewrite.

dhiltonp avatar Oct 30 '18 04:10 dhiltonp

Let's hold off on changing the PWM frequency until we see the feed-forward results.

dhiltonp avatar Oct 30 '18 04:10 dhiltonp

Yeah, just let me know. I'll try to have a look at timing on my units when I get a break, to check that I didn't mess up any maths.

Ralim avatar Oct 30 '18 04:10 Ralim

It was probably the removal of any temp filtering that did it. I turned it way down and didn't notice anything so I took it all out.

I'll verify this tonight.

dhiltonp avatar Oct 30 '18 13:10 dhiltonp

Shoot, it's not the removal of filtering.

dhiltonp avatar Oct 31 '18 00:10 dhiltonp

Ok. I think it may be entirely on my end.

The build from here was quite solid and stabilized at a given temp, as I recall. https://github.com/Ralim/ts100/issues/275#issuecomment-420197231

The current build seems to exhibit the same behavior.

It could easily be my tip (see: #395).

I'm going to have to order a new one.

In the meantime, could you verify that the current version performs well for you? My tip goes 10C higher than requested, then it drops in temp, without ever stabilizing.

I could alter the algorithm to work with my tip (and maybe I should make it more robust to bad tips), but I'd like some external data.

dhiltonp avatar Nov 02 '18 02:11 dhiltonp

Re-reading, I failed to clarify - the old firmware worked quite well originally.

Now, both the old and new versions have the same pulsing behavior for me. I believe my tip (and my abuse of it in testing thermal performance) is to blame.

dhiltonp avatar Nov 03 '18 20:11 dhiltonp

Ah okay, yeah that does sound more like a tip issue then. There are also cheap compatible Hakko tips that I have used as sacrificial testers before.

Ralim avatar Nov 06 '18 07:11 Ralim

I tried to get temperature data using soldering iron thermocouples and my Fluke 287, but I'm seeing a lag of about 5 seconds due to some combination of time for heat transfer and averaging in the multimeter.

Is your FG100 much more responsive or is this representative?

dhiltonp avatar Dec 21 '18 03:12 dhiltonp

It looks like the lag is not in the thermocouple or display, but is due to heat conduction:

I set the tip to D24, but didn't calibrate the tip temp in software.

Calibration (63/37 solder, 183C ~= 160C on display): https://www.youtube.com/watch?v=cmNolzz65N0

Rocketing past 183C at full temp - 7s delay on full power: https://www.youtube.com/watch?v=32vVcMoATaM

Targeting 180C on iron (~200C actual) - 6s delay, despite power throttling back: https://www.youtube.com/watch?v=-7b8_GrFg2M

I'm guessing the solder melted first when targeting 180C because the solder was slightly closer to the heating element.


It takes 6-7 seconds after the iron says a temperature is hit for the tip to be that temp. With this in mind, I think that the thermocouple setup is just fine. I wonder if it'll take that same 6-7 seconds for the sensor to know that tip temp is dropping due to added thermal mass. I'll try to determine actual responsiveness/reaction time to changes in mass tonight.

dhiltonp avatar Dec 21 '18 21:12 dhiltonp

I'd like to play with adjusting the temp compensation. Instead of saying "at xxx raw temp, the corrected temp is xxx", I'm thinking of saying "At xxx power, the temp will be under by xxx".

I think this may compensate both for tip types as well as when the iron is in use.


In this test (at 54s), when the thermocouple is not touching the heat sink, the power output to maintain the iron at 300C is ~4W. The external thermocouple is in the same ballpark.

But around 40s, the power output to maintain 300C is 17W - and that's just maintaining the tip temp at the internal thermocouple. The external thermocouple temp is much lower - around 200C.
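As a rough, back-of-the-envelope check (treating these two observations as a hypothetical linear fit, with the sag proportional to power above the ~4W idle draw): sag ≈ (300 - 200) C / (17 - 4) W ≈ 7.7 C per watt. A constant of that order is what the power-based compensation would need, though real measurements across several tips would be required to pin it down.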


lookupTipDefaultCalValue has linear compensation based on the data provided by @Repled. What value should I use for 0 compensation?

dhiltonp avatar Dec 23 '18 03:12 dhiltonp

lookupTipDefaultCalValue is the gain of a y = mx + b line of best fit (i.e. the m). b is the offset that is calibrated per unit, as this is the ADC offset.

So I'm not sure what you mean by 'no compensation'.

Since normally: tip_Temp = HandleTemp_degC + ((raw_tip - ADC_Offset) * gain)
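For clarity, a minimal sketch of that conversion (hypothetical variable names; the real lookup and calibration live in the IronOS sources):

```cpp
#include <cstdint>

// tip_Temp = HandleTemp_degC + ((raw_tip - ADC_Offset) * gain)
// gainCPerCount plays the role of the "m" from lookupTipDefaultCalValue;
// adcOffset is the per-unit calibrated "b".
float tipTempC(uint16_t rawTip, uint16_t adcOffset, float gainCPerCount, float handleTempC) {
    return handleTempC + (static_cast<float>(rawTip) - adcOffset) * gainCPerCount;
}
```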

The FG100 is more responsive, but sadly not massively so. The best option is to have the thermocouple welded to the tip, which is hard to do. I did do some testing by basically getting the thermocouple somewhat soldered on, and then testing at 0-150C for the first control loop trials.

Ralim avatar Dec 26 '18 21:12 Ralim

I believe there used to be an m based on how the thermocouple should react, not based on real-world testing; I'm just not sure how to calculate that.

dhiltonp avatar Dec 26 '18 23:12 dhiltonp

Thermocouples are mostly linear in the range that we are using them in. There used to be one back in firmware 1.x, but in 2.x I went for measured values. The m that is there should just be modelling the thermocouple gain, which is not compensated for by tip type per se; rather, different tips have different junction styles (how they terminated the wiring), which can lead to different responses.

Ralim avatar Dec 27 '18 03:12 Ralim

Huh, ok.

I'll keep that in mind, for now I'll try to estimate tip temp given power history.

dhiltonp avatar Dec 27 '18 05:12 dhiltonp

With reference to the discussion about automatic PID tuning in #444 I would like to make some suggestions that would be helpful in solving this issue.

I would ideally like to be able to model the thermal aspects of the interactions of the components in the energy transfer chain (voltage -> power -> heating-element-temp -> thermal-conductivity -> tip-metal-center-temp -> tip-thermal-conductivity -> tip-metal-end-temp), as that would allow me to perform simulations using various compensation techniques and variables (input voltage, thermal mass of tip, etc.) and to apply tolerances.

Do you have any "real world" measurement data, from which to derive the model? Ideally, as raw measurements of time->temp of a heat-up cycle:

  1. Apply full power to heating element (100% PWM duty cycle)
  2. When the tip temperature reaches a "typical" target temp: switch off power to heating element (0% PWM duty cycle)
  3. Keep recording until the temperature has dropped halfway back towards ambient

It would be great to have both the thermal-element temperature (derived from the electric resistance of the element via amplifier and ADC) and the actual (near tip-end) temperature, which could be obtained using a simple Type-K thermocouple. But just having temperature data from the internal measurement (heating element) over the above cycle would be of great help, as the time lag (around 5 seconds) is already documented by observation.

Along with the above data, information about the tip (type/name and its mass) as well as input voltage would be required. Additional data to allow modeling changes to tips and input voltage: the mass of the currently available tips. And finally, to save me from reverse engineering the relationship between heating element temperature and its electrical resistance, it would be good to be able to calculate the voltage -> power function at a given temperature.

Do any of you have the above mentioned raw data available, or would you be able to (re)produce it?

I do have some practical experience in solving similar situations, although mostly the other way around (i.e. the measured temperature lagging behind the actual temperature). In most cases, I have been able to derive an algorithm that achieves a fairly close match to the actual temperature at rising, stable and falling temperatures. The algorithm has been based on the measured temperature, knowledge of the historically applied power, and the known thermal conductivity and thermal mass of the thermal system.

In this case, we would need to know the power that has been applied for a period of at least 5 seconds (the time lag), to be able to estimate how much heat energy has been deposited and is "on its way" to the tip. One suitable recording of past power would be a record of added thermal energy (joules = time_seconds * voltage^2 / ohms), measured at half-second intervals and recorded in a circular buffer holding at least 10 elements (matching the observed 5-second time lag). It may sound complicated, but once the mathematical model is in place, simulations can be made and a fairly simple algorithm can be derived that provides an accurate estimate of the current tip temperature, which is what we need in order to provide a responsive, yet stable, PID controller delivering a predictable and stable soldering tip temperature.
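A minimal sketch of such a power-history buffer (hypothetical names, assuming half-second sampling and ten slots to cover the ~5 second lag):

```cpp
#include <array>
#include <cstddef>

// Hypothetical circular buffer of recently deposited heating energy.
class EnergyHistory {
public:
    void record(float volts, float ohms, float dtSeconds = 0.5f) {
        joules_[head_] = dtSeconds * volts * volts / ohms; // E = t * V^2 / R
        head_ = (head_ + 1) % joules_.size();
    }
    float totalJoules() const {
        float sum = 0.0f;
        for (float j : joules_) sum += j; // energy "on its way" to the tip over the last ~5 s
        return sum;
    }
private:
    std::array<float, 10> joules_{}; // 10 slots * 0.5 s = 5 s of history
    std::size_t head_ = 0;
};
```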

A side benefit of this work would be the ability to report an accurate tip temperature to the user, rather than reporting an internal element temperature that merely indicates where the tip temperature will be in 5 seconds or so.

hattesen avatar Jan 31 '19 19:01 hattesen

I don't have any real world data other than those videos (and a few others that aren't accessible). Extracting data from the iron is difficult. Instead of outputting data electronically, data could be displayed and recorded at 120 or 60fps on camera and either manually extracted or perhaps done via OCR.

The iron does maintain a milliwatt output history (3.5 s) and a temperature error history (0.5 s), along with rolling averages. The PID loop updates at 32 Hz.

PID loop: https://github.com/Ralim/ts100/blob/master/workspace/TS100/src/main.cpp#L928
Power: https://github.com/Ralim/ts100/blob/master/workspace/TS100/src/power.cpp
Temp: https://github.com/Ralim/ts100/blob/master/workspace/TS100/src/hardware.c
History: https://github.com/Ralim/ts100/blob/master/workspace/TS100/inc/history.hpp

dhiltonp avatar Jan 31 '19 20:01 dhiltonp

@dhiltonp, I'll see what I can get out of the data in the videos. Thanks for the details on the PID setup.

One well-documented approach to solving time delays is the integration of a "Smith Predictor" in the feedback loop. This approach is described in PID Tuning for Time-Varying Delay Systems Based on Modified Smith Predictor.
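For reference, a minimal sketch of the Smith predictor structure (using a hypothetical first-order-plus-dead-time model; the actual thermal model would have to be identified from the measurements discussed here):

```cpp
#include <cstddef>
#include <deque>

// Hypothetical first-order thermal model (temperature rise above ambient vs. power).
struct FirstOrderModel {
    float gainCPerW;   // steady-state deg C per watt
    float tauSeconds;  // thermal time constant
    float state = 0.0f;
    float step(float powerW, float dtSeconds) {
        state += dtSeconds / tauSeconds * (gainCPerW * powerW - state);
        return state;
    }
};

// Smith predictor: the PID acts on the measurement corrected by the difference
// between the undelayed and delayed model outputs, effectively removing the lag
// from the loop as long as the model is reasonable.
struct SmithPredictor {
    FirstOrderModel model;
    std::deque<float> delayLine; // pure transport delay of the modelled output

    SmithPredictor(FirstOrderModel m, std::size_t delaySamples)
        : model(m), delayLine(delaySamples, 0.0f) {}

    float feedback(float measuredTempRiseC, float appliedPowerW, float dtSeconds) {
        float undelayed = model.step(appliedPowerW, dtSeconds);
        delayLine.push_back(undelayed);
        float delayed = delayLine.front();
        delayLine.pop_front();
        return measuredTempRiseC - delayed + undelayed; // predicted lag-free temperature rise
    }
};
```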

Are you able to provide some of the following data, which would assist me in creating a quick model of the heat transfer:

  1. Average applied power, when a steady state is reached (typical use tip temperature). This would be computed as PWM-duty-cycle * supply-voltage^2 / heating-element-resistance - even a rough estimate would be great.
  2. Mass (weight) of the tips - Individual or as a range.

Those, along with the videos would get me started.

hattesen avatar Feb 01 '19 01:02 hattesen

Sustaining a temp around 320C takes about 4.5 watts.

I don't have the tip mass available.

dhiltonp avatar Feb 01 '19 03:02 dhiltonp

I have had a quick look at the code currently controlling the (heating element) temperature. It looks to me like a pure P(roportional) control algorithm with an added compensation using an average of recently applied power/energy. As far as I can see, there are no I(ntegral) or D(erivative) parts to the control, which surprises me a bit.

Although I am unable to fully comprehend the effect of the "average recent power" compensation, I cannot see how this algorithm would ever be able to provide temperature control that is both stable and responsive at the same time, especially when subjected to varying conditions such as tip (thermal) mass, maximum power (voltage) and a wide range of target temperatures. To guarantee stability (no oscillation), the P(roportional) gain will have to be set so conservatively low that the responsiveness is very low, resulting in slow reaction to disturbances (like cooling of the tip when soldering) as well as a relatively slow approach when nearing the target temperature during heat-up.

Adding an I(ntegral) control (applied power compensation) will ensure that the measured temperature will not have a (constant) offset relative to the target temperature.

Adding a D(erivative) control (applied power compensation) will reduce/eliminate oscillation (typically when heating up), allowing the P(proportional) gain to be increased, thereby achieving a more responsive (faster reacting) controller.

Without a full PID implementation to start with, we really have no knobs to twiddle (parameters to adjust), and while we may still achieve decent control, we would have to reinvent the wheel when trying to cope with the consequences of known variations in the heat transfer system, and would probably end up having implemented something similar to a PID (proportional-integral-derivative) controller, but without all the benefits of using an industry-standard algorithm that has been controlling processes for nearly 100 years.

I would really like to assist in achieving a solid, future proof, control algorithm, that:

  • Is responsive (fast heat-up and fast reaction to disturbances)
  • Is stable (minimal oscillation and temperature overshoot)
  • Retains a good control profile when subjected to changes to system conditions, such as...
    • Target temperature
    • Thermal mass of tip
    • Maximum power available (voltage)
  • Controls the temperature of the tip-end, rather than the heating element temperature (by estimation)
  • Displays the temperature of the tip-end, rather than the heating element temperature (by estimation)

As I have stated earlier, I believe that the most efficient way of achieving the above goals is to use a standard PID control algorithm, possibly augmented to compensate for the relatively slow (approx. 5 sec) heat transfer from heating element to tip-end. To implement and tune a PID controller that achieves the above requirements, without having a full test setup with external monitoring of all internal process variables as well as an externally measured tip-end temperature, we really need to model the system being controlled, allowing simulations with near-instant feedback.

I propose this plan forward:

  1. Create a feature branch, to allow experimenting with algorithms, features and testing without disturbing the main development path.
  2. Set up a simulation model, using the currently known variables and parameters, that approximates the current system behavior. It can be implemented using a spreadsheet, or tools like MATLAB/Maple; I prefer widely available, preferably free, tools to allow future reuse by anyone.
  3. Perform model simulations that will provide one or more algorithmic options as well as documented behavior with changes to system conditions, as well as suggested algorithm parameters, including proposed gain factors kP, kI, kD.
  4. Implement/update the algorithm described by the model, using the parameters obtained by simulation.
  5. Perform tests using extreme system conditions (max/min tip mass, max/min voltage, max/min target temperature), and compare behavior with results from model simulations.
  6. If simulated and real system behavior differs significantly, adjust the model to achieve a better match and perform a new simulation -> implementation -> test cycle (GOTO 3.)

@dhiltonp and @Ralim what are your views on this plan? I'm quite happy to do the majority of the work myself, but you would need to do testing, and possibly assist in setting up the tool chain (I have not yet set up the development environment).

I have just ordered a second TS100 for testing purposes, which I will hack (somehow) to be able to monitor/record internal process variables as well as actual tip temperature, using one or two Type-K TP-01 thermocouples (minimal thermal mass) to measure the actual tip temperature at mid-tip and tip-end. That will allow us to improve and validate the accuracy of the tip-end-temperature estimation algorithm, and it would also make it easier to test future controller enhancements, such as #444.

Sponsored hardware: I wonder whether we will be able to obtain sponsored hardware from Miniware, or other sources, for this work. They do benefit from the availability of this project, which exclusively runs on their hardware. Have you ever tried, @Ralim?

hattesen avatar Feb 01 '19 12:02 hattesen

Yeah, we disabled the D term as it was too sensitive to noise and disturbance. Any D term that could respond to changing temps would cause a much larger change in temp than was measured. Modeling the thermal behavior could make it usable.

The I term is defined differently than you expect, but is still an integral over a past window. Using milliWattHistory instead of a longer temperature window has 2 effects: it reduces computation, and clips the I term to be greater than 0. The clipping isn't bad as we can only heat, not cool. It means we undershoot less than we otherwise would.

I would categorize the current algorithm as a PI controller, with the thermal model built into the code instead of pre-calculated. It could be rewritten in a more traditional form and you are welcome to do that - or not!

The PID algorithm from August may be more in line with what you expect, though that has its own issues.

I love the goals you have listed. I'm not too worried about the end implementation, so long as the implementation is clear and provides measurable benefits :)

With respect to workflow, just create a fork - no branch on @Ralim's repo is necessary. I'd love to see the model you come up with!

dhiltonp avatar Feb 01 '19 14:02 dhiltonp

Hi guys, being a control guy myself, I have found the PID code a bit difficult to read, mainly because the implementation seems to be quite "unconventional". Is there an anti-windup scheme implemented (I did not find one)? This is crucial when the actuation variable gets saturated.

ldecicco avatar Feb 01 '19 22:02 ldecicco

The milliwatt output is clipped at 0 when the temp is dropping, preventing negative wind-up. There is positive wind-up, but that's mostly OK: some overshoot is desirable because of the thermal lag.

dhiltonp avatar Feb 02 '19 00:02 dhiltonp

I honestly believe that a conventional PID controller with anti-wind-up on the I term would be quite adequate for this soldering iron. The control is no more complex than what PID controllers have been used for in millions of installations during the past century.

Yeah, we disabled the D term as it was too sensitive to noise and disturbance. Any D term that could respond to changing temps would cause a much larger change in temp than was measured. Modeling the thermal behavior could make it usable.

My guess is that the D (rise rate) term is noisy due to a combination of the ADC LSB noise, and using a very short delta-time for the measurement. Measuring D rate once a second would be sufficient for it to be useful. Alternatively, one would use a time-weighted average when measuring D.

The current "average recently applied power" compensation would have a similar effect to the D term, but instead of measuring the temperature rise rate directly, it measures applied power, which would be somewhat proportional. The noise figure would obviously also be enormous for the measured power, unless averaged over a longer term, as is done currently. I feel it would be better to use the actual measured temperature rise rate rather than to rely on an indirectly tied parameter.

An easy way to avoid I term wind-up is to keep it at zero as long as the output is saturated (100% power). That way, once the P and D terms on their own start to reduce the power, the long-term temperature offset starts to be summed up (in I). It is a lot less compute intensive to use the traditional I term than the current rolling average of recent power. It requires adding a temperature delta (instability is not a problem) to the integral (I) variable.

No need to do any other special "clipping" or parameter management, other than that.

At least, not until it is proven that the regular PID cannot be tuned to be sufficiently responsive and stable. At that point, we should find out the root cause.

The only caveat in this whole equation is the reason this issue was created in the first place. The measured temperature is not equal to the temperature of the soldering tip. The actual tip-end temperature (which is the one that should ideally be regulated) lags about 5 seconds behind changes in the heating element core that is used for temperature measurements. However, until a solid, responsive and stable controller algorithm has been achieved, it is futile to think that we can tweak it to take the thermal lag into account.

hattesen avatar Feb 02 '19 15:02 hattesen

I honestly believe that a conventional PID controller with anti-wind-up on the I term would be quite adequate for this soldering iron. The control is no more complex than what PID controllers have been used for in millions of installations during the past century.

+1

Yeah, we disabled the D term as it was too sensitive to noise and disturbance. Any D term that could respond to changing temps would cause a much larger change in temp than was measured. Modeling the thermal behavior could make it usable.

My guess is that the D (rise rate) term is noisy due to a combination of the ADC LSB noise, and using a very short delta-time for the measurement. Measuring D rate once a second would be sufficient for it to be useful. Alternatively, one would use a time-weighted average when measuring D.

The current "average recently applied power" compensation would have a similar effect to the D term, but instead of measuring the temperature rise rate directly, it measures applied power, which would be somewhat proportional. The noise figure would obviously also be enormous for the measured power, unless averaged over a longer term, as is done currently. I feel it would be better to use the actual measured temperature rise rate rather than to rely on an indirectly tied parameter.

+1 also on this.

An easy way to avoid I term wind-up is to keep it at zero as long as the output is saturated (100% power). That way, once the P and D terms on their own start to reduce the power, the long-term temperature offset starts to be summed up (in I). It is a lot less compute intensive to use the traditional I term than the current rolling average of recent power. It requires adding a temperature delta (instability is not a problem) to the integral (I) variable.

This is actually not so clean (it's not the proper way to implement anti-windup). There are several schemes; the simplest is conditional integration, meaning that you stop integrating (but do not reset the integral to zero) while the output is saturated. Among others, back-calculation is easy to implement and elegant, but it adds another gain (K_aw) to tune (even though rules of thumb exist once you have a decent tuning of the PID gains).
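A minimal sketch of the conditional-integration variant (hypothetical names and units; back-calculation would instead feed a K_aw-weighted measure of the output clipping back into the integrator):

```cpp
// Hypothetical PI step with conditional-integration anti-windup:
// the integrator is frozen (not reset) while the output is saturated.
struct PiState {
    float integral = 0.0f;
};

float piStep(PiState& s, float errorC, float dtSeconds,
             float kP, float kI, float outMinW, float outMaxW) {
    float unclamped = kP * errorC + kI * s.integral;
    bool saturated = (unclamped >= outMaxW) || (unclamped <= outMinW);
    if (!saturated) {
        s.integral += errorC * dtSeconds; // only integrate while the actuator can still respond
    }
    float out = kP * errorC + kI * s.integral;
    if (out > outMaxW) out = outMaxW;
    if (out < outMinW) out = outMinW;
    return out;
}
```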

No need to do any other special "clipping" or parameter management, other than that.

At least, not until it is proven that the regular PID cannot be tuned to be sufficiently responsive and stable. At that point, we should find out the root cause.

The only caveat in this whole equation is the reason this issue was created in the first place. The measured temperature is not equal to the temperature of the soldering tip. The actual tip-end temperature (which is the one that should ideally be regulated) lags about 5 seconds behind changes in the heating element core that is used for temperature measurements. However, until a solid, responsive and stable controller algorithm has been achieved, it is futile to think that we can tweak it to take the thermal lag into account.

There are more sophisticated mathematical tools for taking this issue into consideration, but you need to identify a model of the soldering tip system.

ldecicco avatar Feb 02 '19 16:02 ldecicco

It's great that so many people with control experience are chiming in!

I'd like to make sure we're on the same page.

The algorithm in the code you've been looking at is newly released in 2.06.

@ldecicco, @hattesen, @doegox - can you confirm that you have tested the new algorithm in 2.06? While it may not be what you are used to looking at, it is a pretty solid PI controller and regulates the heating element temperature quite well (though I am a little biased, having designed/implemented it). I am open to any improvements you all have - including fully replacing the algorithm.

The discovery of thermal lag and realizing that the element temp can be very far from tip temp came after this new implementation.

If the tip is touching a heat sink, the tip temp can consistently be 100C lower than the heating element temp forever. The 6-7s lag is a simplification as the tip temp asymptotically approaches some value given a fixed power output. Actual testing of that curve should probably be done - I don't recall the values I observed well enough, and measurements perturb the output...

Again, the D term is useless for now. If you take action on the D term, the temp goes crazy, and not just due to noise. The thermocouple can very quickly heat up 30-40C (I think, this is from memory) when very little energy has been delivered to the tip. Then the algorithm cuts power because of the overshoot, and suddenly the temp is back nearly where it was before the D term kicked in. That drop would of course also trigger the D term, but:

@hattesen, I suspect you are right about sampling frequency. Currently we are sampling the tip temperature very frequently (in the kHz range? - @Ralim, can you confirm?) and averaging. It may be better to keep a similar on/off ratio (80%/20%), but sample every 1/2 second, giving the sensor 0.1 seconds to stabilize after driving the tip. That may make the D term usable.


One note on the I term accumulation: the rolling average is actually very lightweight to compute (maintain a sum, subtract the oldest value, add the newest value). One thing to worry about in the proposed sum solution (maybe you have an easy solution) is saturation of the I variable. I'm not sure if 64-bit ints are available on the hardware and am unable to check; again, @Ralim, can you verify int64_t is available?
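For reference, a minimal sketch of that rolling sum (hypothetical names; the project's own version lives in history.hpp):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// O(1) rolling sum/average: subtract the value falling out of the window, add the new one.
template <std::size_t N>
class RollingSum {
public:
    void update(int32_t newest) {
        sum_ -= samples_[head_];  // drop the oldest sample
        samples_[head_] = newest; // overwrite it with the newest
        sum_ += newest;
        head_ = (head_ + 1) % N;
    }
    int32_t sum() const { return sum_; }
    int32_t average() const { return sum_ / static_cast<int32_t>(N); }
private:
    std::array<int32_t, N> samples_{};
    std::size_t head_ = 0;
    int32_t sum_ = 0;
};
```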

dhiltonp avatar Feb 03 '19 03:02 dhiltonp