
Horizontal tear line even with clock generator installed

Open M-Reimer opened this issue 2 years ago • 13 comments

Especially when playing side-scrolling games (Super Mario World in my case), a tear line suddenly appears in the lower quarter of the screen after some play time. It is only visible while actually walking sideways, but in that case it is very noticeable and annoying. It disappears once I stop moving.

I have the clock generator installed and the "Sync on Luma" from the SNES is filtered using an LM1881 sync stripper.

If I go to the developer options and press "HTotal++" once, the tear line seems to disappear, but occasionally it reappears as a tear line that moves from about the center of the screen to the bottom.

Is this a known problem and is there any reliable fix for this?

M-Reimer avatar Oct 24 '21 12:10 M-Reimer

It may be possible that this only appears with 720p output resolution. I've played some time with 1080p and, so far, no issues. I'll report back after a longer gaming session.

M-Reimer avatar Oct 25 '21 12:10 M-Reimer

So the issue here is that the clock generator clock has to be enabled (switched active in the ASIC) at precisely the time when the tear bar is not visible. Right now, there is no such code at all, leaving it to chance when the clock is being switched in.

What's required is finding a sync point that can be used, and then coding that in.

The way it works now is simply that, as I found, most of the time the clock switch happens while the tear bar is off screen. You can retry (reboot the scaler) if it isn't. Of course, a proper method would be better!

ramapcsx2 avatar Oct 25 '21 14:10 ramapcsx2

I've now tried 1080p for some time, and the same problem exists there, too.

The problematic thing for me is that it always works well for some time, but after some gaming the tear line suddenly appears. It seems to disappear at first if I do "HTotal++", but then I get a tear line which moves from the top of the screen to the bottom. The moving tear line also no longer disappears or freezes in position if I counteract my HTotal adjustment with "HTotal--". It feels as if the clock generator gets disabled at the first use of the "HTotal" buttons.

Rebooting the scaler may work, but, as mentioned above, it always starts without the tear line and after some time the tear line appears at a fixed position on the screen. So I would have to constantly reboot the scaler during a longer gaming session.

I wonder if anyone has ever been able to play titles like "Super Mario World" without issues, or if it's just my hardware that has problems...

M-Reimer avatar Oct 25 '21 20:10 M-Reimer

It always has a chance to reappear when any kind of sync glitch/disturbance happens. A proper locking mechanism would help fix this as well.

It's all about finding a way to align the new display clock with the tear line.

ramapcsx2 avatar Oct 25 '21 20:10 ramapcsx2

Is this possible for an external microcontroller at all? The video processing happens inside the Trueview 5725 IC, right?

So I guess the only solution for now is that I try if I can get used to this...

Edit: You did add some workarounds to fix the tear line when no clock generator is installed. So might those workarounds actually be less distracting than the current clock gen behavior? Is it possible to test this without actually removing the clock generator?

M-Reimer avatar Oct 25 '21 20:10 M-Reimer

You can desolder an I2C wire (SCL or SDA) from the clock gen board, which effectively removes it from the bus. The workaround methods might work for you, but they come with their own drawbacks (mainly whether the display used accepts their hackery :) ).

ramapcsx2 avatar Oct 25 '21 21:10 ramapcsx2

Just wanted to say I have this issue as well on a 480p RGBHV source (Dreamcast VGA) at 1080p. Sounds like this is a known issue with any input but figured I would sub to this issue in case anyone figures something out. The explanation for why this happens makes sense, I wish I understood the technology well enough to help discover a solution!

I may disable my clockgen since I found the "old screen tearing" that I installed the clockgen to fix was a little less distracting for my application. @ramapcsx2 does the clockgen improve video signal quality in any other meaningful way? I swear the video quality got better overall with the clockgen but I wondered if that was just a placebo.

luciusbono avatar May 14 '22 23:05 luciusbono

I'm having a horizontal tear appear near the bottom of the screen at times; reloading the preset clears it, but it can return. I wondered if this was because my clock board is positioned just behind the component connections, so the cable for the clock is quite long compared to other installations I've seen, but after reading this am I right to think that re-installing it closer won't help at all? What might happen if the clock cable is too long?

I tried disconnecting the clock board, but then I get frequent tears that move from top to bottom, which I think is even more distracting. I wonder if anything is currently being worked on to help with this; is there anything I can test?

jsteel44 avatar May 26 '22 14:05 jsteel44

No, there's no known way to force a better result. The clock gen method worked for me, with the sources I could test. But if there's any variation, and the tear line always lands in the visible picture for you, then that's the algorithm failing. I could never find a reliable sync source to directly control the tear line position. This is what's needed, but I don't know where to get it :p

ramapcsx2 avatar May 26 '22 21:05 ramapcsx2

It doesn't always land there, but when it appears it is (at least approximately) in the same position. Is there anything I can adjust in the algorithm that would move the line down? Usually when I turn it on I don't see a line, so I appreciate that moving it down might mean it's usually visible (top of screen?), but at least it would give me something to play around with. Thanks for any help/suggestions.

jsteel44 avatar May 27 '22 07:05 jsteel44

Would it actually be possible to do a "tear line reset" in code? Meaning: disable the external sync and re-enable it based on the existing logic? In most cases the tear line is out of view after that. If such a reset moves the tear line back out of view, then all we need is a trigger to automatically fire this "reset".

And if this is possible, I wonder if it is possible to detect such glitches based on timing. Let's just assume that the signal source (in my case a SNES) in some situations triggers a sync pulse too early or too late (meaning "off frequency"). This would explain how the tear line can still be "spot on" and not moving, but now unfortunately sits in the visible area.

If all this doesn't sound too unreasonable, I could program an Arduino to monitor the sync line, just to see whether something unusual happens there as soon as the tear line becomes visible.
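For illustration, a minimal sketch of what such an Arduino sync monitor could look like (the pin, thresholds, and averaging constants are assumptions, not anything from gbs-control): time each falling edge on the LM1881's sync output in an interrupt and report any period that strays from a slow running average.

// Hypothetical stand-alone Arduino sketch, not gbs-control code.
// On an ESP8266 the ISR would additionally need IRAM_ATTR.
const byte SYNC_PIN = 2;                    // interrupt-capable pin (assumption)
volatile unsigned long lastEdgeUs = 0;
volatile unsigned long periodUs = 0;
volatile bool newPeriod = false;

void onSyncEdge() {
    unsigned long now = micros();
    periodUs = now - lastEdgeUs;            // unsigned math handles micros() rollover
    lastEdgeUs = now;
    newPeriod = true;
}

void setup() {
    Serial.begin(115200);
    pinMode(SYNC_PIN, INPUT);
    attachInterrupt(digitalPinToInterrupt(SYNC_PIN), onSyncEdge, FALLING);
}

void loop() {
    static float avgUs = 0;
    if (!newPeriod) return;
    noInterrupts();                         // copy the shared values consistently
    unsigned long p = periodUs;
    newPeriod = false;
    interrupts();
    if (avgUs == 0) avgUs = p;              // seed the running average
    avgUs = 0.99f * avgUs + 0.01f * p;      // slow exponential average
    if (p > avgUs * 1.05f || p < avgUs * 0.95f) {   // 5% excursion threshold (arbitrary)
        Serial.print("sync period excursion: ");
        Serial.print(p);
        Serial.print(" us, average ");
        Serial.println(avgUs);
    }
}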

M-Reimer avatar May 27 '22 09:05 M-Reimer

Bit of technical background: The scaler runs on a 1 field (all lines of a picture) buffer, with new lines added from the upscale process drawing into this field. If the point where new lines draw over old lines is within the visible output, then it shows as the tear line. The output line rate is matched up with the input line rate precisely, using the clock generator. This means, if everything "works well", the tear line will not move around, but stay in place in the picture.

So this means that the frequencies are all fine and the part of the code that syncs them works, but it is missing one piece of information, a sync point: it doesn't know when to start locking the two, which determines where the tear line lands. I don't know what could be used to sync that up. Edit: Ah yeah, and there is a region just outside the visible picture that's large enough to have the tear line sit there all the time and never be visible. This is how it works for the sources I could test.. most of the time.. And if it wanders off, a preset reload would fix it.
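To make that concrete, the missing piece could look roughly like the sketch below. Both helpers are hypothetical placeholders (nothing like them exists in the firmware today); the open question above is exactly what waitForInputVSync() would read, and the blanking offset is an assumption.

// Hypothetical sketch only. The idea: make the moment the ASIC switches onto
// the external clock deterministic relative to the input frame, so the
// overwrite point (the tear line) always lands in blanking instead of
// wherever chance puts it.

static void waitForInputVSync() {
    // Placeholder: block until the next input VSync is observed,
    // e.g. via the TV5725 debug pin or a status register (the open question).
}

static void switchAsicToExternalClock() {
    // Placeholder: the existing step that makes the ASIC run from the clock gen.
}

static void lockClockGenToFrame() {
    waitForInputVSync();          // deterministic reference point in the frame
    delayMicroseconds(500);       // assumed offset into vertical blanking; would
                                  // need tuning so the tear sits off screen
    switchAsicToExternalClock();  // from here on the in/out rates stay matched,
                                  // so the tear line stays wherever it landed
}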

ramapcsx2 avatar May 27 '22 12:05 ramapcsx2

Ok, I may be dumb as nuts, but: can we perhaps make the ESP detect the signal first through the GBS, then interrupt GBS operation (cut it off completely from processing at that moment) while at the same time only running the clockgen, and immediately reinitialize the GBS at the right time to sync up?

So we turn on the unit in full, wait for the signal, disable main board operation except for the signals coming from the clockgen (I suppose the GBS Control firmware knows how to distinguish them, based on the fact that it recognizes clockgen operation on its own), then use the ESP to wait for exactly one frame (with accommodation for input delay and the format of the input; I have no idea how long it takes for the ESP to send signals to the GBS, obviously) and at that moment activate the rest of the processing.

I apologize if it doesn't make sense, I can see how if the GBS cannot operate in this "suspended" mode it's all just a pipe dream. Maybe if that's the case we could solder the clockgen to the ESP and then use the above technique to sync them up?

I'm a layman for the most part but I wanted to at least try to be useful :)

JaCkBoston avatar May 27 '22 17:05 JaCkBoston

Inconsistent initial latency, video latency drifts (usually increases) over time, and occasional zigzag tearing when latency overflows 1 frame

I have a similar problem with my GBS-Control with clock generator installed, set up to convert Wii (usually-)480p component to VGA 480p output.

During multi-hour gaming sessions, I would occasionally notice a zigzag tearing line making its way up or down the screen. These were rare and difficult to photograph, though I have a bad picture of me selecting a Wii channel (on the top row and middle-left column) and the channel growing:

PXL_20230122_034305811-01

I remember the tear line was moving up(?) the screen, and the screenshot shows a newer frame above the tear line and an older one below. If I understand correctly, that would mean the tear line is gradually replacing a 0 ms old image with a 16.7 ms old image. However, in my testing I found the output frame rate to be lower than the actual (input) frame rate, resulting in frames taking longer to be displayed than received, causing the video latency to grow until it reaches 16.7 ms, at which point a tear line reveals a 0 ms old image. I don't know what's happening, but both behaviors are bugs.

Oddly this image shows what looks like alternating lines of light and dark, as if the 480p input signal was being overwritten while it was being analyzed in the deinterlacer, or alternatively a race condition between overwriting scanlines and outputting them. If I had a better picture, I may be able to speculate further on the exact behavior of this tearing.

Testing methodology

I hooked up my Wii's component luma cable to a passive RCA Y-splitter, and plugged one output into the GBS-C and the other into the left channel of my PC motherboard's audio "line in". I then taped a solar panel (near-zero latency) to the top of my CRT (near-zero latency) plugged into my GBS-C's output, and plugged its output through a DC-blocking capacitor to the right channel of my PC motherboard's "line in". At this point, I calibrated the stereo input volumes independently to prevent clipping, then performed synchronized recordings of the GBS-C's luma input and output, at 192 kHz using PipeWire and Audacity.

Results

Data at gbs-c latency tearing recordings.zip.

Result filenames containing stock originate from stock firmware 96e67732af167fe25966bace8557bd19cf562300, and other filenames were recorded earlier on my refactored firmware. The bug is present on both firmwares and I didn't notice any differences in behavior, so I'm guessing the bug is already present upstream and behaves identically in my fork.

Replugging the GBS-C semi-randomizes latency

When I unplugged and replugged the GBS-C's power connector, the latency from input to output video varied from 3.1 scanlines (0.1 ms) to 4 ms in my limited testing. I personally find this variable initial latency rather undesirable, but it would be tolerable if the latency converged to a fixed value over time (it doesn't).

Data at 480p startup latency stock.aup3.

Screenshot_20230122_185539

Latency drifts upwards over time

Every time I click "480p 576p", the debug log displays a line like source Hz: 59.93892 new out: 59.93892 clock: 80987560 (-12440). The exact frequency estimated varies, and new out is sometimes but not always identical to source Hz.

With an output frequency of 59.93892 Hz (480p drift.aup3), I observed that latency grew from 3.9 ms to 8.3 ms (both approximate) over the course of 100 minutes. This equates to 44 microseconds of latency drift per minute, or a frequency error of 0.000 000 733... (0.733 parts per million). This is a minuscule error, but since it goes uncorrected, it accumulates over multi-hour play sessions to entire frames of error, at which point a tear line appears.
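Spelling out that drift arithmetic with the numbers above:

$$\frac{8.3\,\mathrm{ms} - 3.9\,\mathrm{ms}}{100\,\mathrm{min}} \approx 44\,\mathrm{\mu s\ per\ minute}, \qquad \frac{44\,\mathrm{\mu s}}{60\,\mathrm{s}} \approx 7.33\times10^{-7} = 0.733\,\mathrm{ppm}$$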

Data at:

  • 480p drift.aup3 (picture)
    • source Hz: 59.93892 new out: 59.93892 clock: 80987624 (-12376)
    • Good: long-term test, sampling CRT screen every 10 minutes over 100 minutes of waiting, and latency growing from 3.9ms to 8.3ms (so good quality data)
    • Bad: based on my modded firmware; in theory this data could be not representative of stock firmware, but I observed that both behave similarly.
  • 480p drift stock.aup3
    • Less comprehensive test, only a few minutes of waiting, but was performed with stock firmware, and latency consistently drifted upwards with extremely low noise over the course of minutes.

Screenshot_20230122_185623

Selecting output resolution increases output latency, modulo 1 frame

Each click to "480p 576p" (the currently selected resolution preset) would increase latency by around 47 samples = 47/192000 s = 0.0147 frame.

By repeatedly clicking this button, I was able to rapidly raise the video latency to over 16 ms (on stock firmware!), where each scanline of an input frame was being output merely 10 or so scanlines before the corresponding scanline of the next input frame arrived!

After pressing this button again, the delay had wrapped around past 1 full frame, and each input scanline was being output a mere 3 scanlines later (around 100 microseconds, for a sub-millisecond latency)! Unfortunately I was not able to take a photograph of the tearing which would've occurred, had I waited for latency to drift upwards and overflow on its own.

Data at 480p click output resolution.aup3 and 480p drift stock.aup3 (picture).

Screenshot_20230122_185755

Does the bug occur on stock firmware?

I verified that latency would drift indefinitely to the point of overflowing on my firmware fork, and verified it would occur using resolution changes as a shortcut. I did not play a game for hours on stock firmware, watching the latency increase until it overflowed. But I'm guessing the existence of this bug report is evidence that latency overflow and tearing happens on stock firmware in the wild.

Pass Through has sub-scanline latency, and doesn't drift(? unverified)

Selecting "Pass Through" as a fixed resolution preset is a good workaround for this issue. It has under 1 scanline of phase difference between input and output signals (15-25 microseconds, effectively zero), and zero frames of response time between an input signal change and output change, meaning no input latency relative to a 480p-compatible CRT.

When selecting "Pass Through", I do not see a line like source Hz: 59.93897 new out: 59.93901 clock: 80987680 (-12320) in the debug log. My guess is there is no clock generator tuning being used to generate the output video signal(?), and you're configuring the TV5725 in "Sync lock mode"(?), so no way to drift out of sync with the input video signal. I did not monitor over a long period of time to verify there was no latency drift though.

Data at passthrough.aup3.

The downside is, with a VGA CRT supporting only 31 kHz and higher signals, you cannot feed 240p or 480i signals into the GBS-C and get an image on the monitor. Most Wii games do not suffer from this problem, as you can set the Wii to 480p, and for GameCube games, force progressive scan in Nintendont (or click "Yes" in each game's 480i menu asking to reenable progressive scan). Unfortunately this is not a viable solution for playing SNES, N64, and other consoles which only output 240p/480i signals.

Fixing the bug: Ensuring consistent startup latency and compensating for drift

You'd have to automatically adjust the frequency based on the phase difference between the input and output signals. I'm not sure which TV5725 registers or ESP GPIO/etc. pins to read to determine that phase difference. Looking at the existing code:

void externalClockGenSyncInOutRate()
{
	...
	float sfr = getSourceFieldRate(0);
	[float getSourceFieldRate(boolean useSPBus)]
	{
		...
		uint32_t fieldTimeTicks = FrameSync::getPulseTicks();
	}
	...

	float ofr = getOutputFrameRate();
	[float getOutputFrameRate()]
	{
		...
		uint32_t fieldTimeTicks = FrameSync::getPulseTicks();
	}
	...
	loop Si.setFreq(0, current);
	...
	SerialM.print(... "new out" ...)
}

I still don't understand how the functions which measure source and output frame rate both use the same FrameSync::getPulseTicks() function, nor how that function uses interrupts.

For initial clock sync, should we block until a vsync or cancellation? What about for steady-state clock adjustment to maintain vblank phase lock? Is GBS-Control single-threaded? Programmed using synchronous/blocking or async/non-blocking event-driven code?

The current externalClockGenSyncInOutRate code for synchronizing frequencies is already quite complex and involves frequency ratios. I'm not sure where externalClockGenSyncInOutRate gets called, but considering I only see new out printed when switching input or output types, it can't be called during steady-state operation.

I don't know how to revise the code to ensure consistent startup latency/phase, nor how to nudge the frequency generator to keep the latency near a target. Should startup phase lock be added to externalClockGenSyncInOutRate or to another (blocking?) function? Should nudging the frequency to maintain a consistent frame phase be added to externalClockGenSyncInOutRate (calling it more frequently), or live in a separate function? Should we add temporary frequency nudges in parallel with the existing "center frequency" adjustment code? Or should we control the frequency offset (biased towards zero and added to the initial frequency estimate) with a PLL, PipeWire's DLL (unsure if it matches the conventional meaning of a delay-locked loop), or a PID controller? Or control the absolute frequency outright with a PLL/DLL/PID, perhaps with the initial value set by externalClockGenSyncInOutRate?
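For the sake of discussion, the simplest variant of the "nudge the offset around the center frequency" idea might look like the sketch below. The latency measurement is assumed to come from a helper that doesn't exist yet, all gains and limits are guesses that would need tuning, and only Si.setFreq() is taken from the snippet above; nothing here is working gbs-control code.

// Hypothetical steady-state controller, for discussion only.
// Assumes centerFreqHz/currentFreqHz were initialized at sync time.
static double centerFreqHz  = 0;   // set once by externalClockGenSyncInOutRate
static double currentFreqHz = 0;   // last value written to the clock generator

static const double TARGET_LATENCY   = 0.25;  // frames behind input
static const double KP_PPM_PER_FRAME = 20.0;  // proportional gain (tuning guess)
static const double MAX_OFFSET_PPM   = 300.0; // never stray further from center
static const double MAX_STEP_PPM     = 10.0;  // never jump more than this per update

static double clampD(double v, double lo, double hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Called periodically (say once a second) from the main loop.
static void nudgeClockGen(double measuredLatencyFrames) {
    // Latency above target means the output is falling behind -> speed it up.
    double error = measuredLatencyFrames - TARGET_LATENCY;
    double wantedOffsetPpm = clampD(error * KP_PPM_PER_FRAME,
                                    -MAX_OFFSET_PPM, MAX_OFFSET_PPM);
    double currentOffsetPpm = (currentFreqHz - centerFreqHz) / centerFreqHz * 1e6;
    double step = clampD(wantedOffsetPpm - currentOffsetPpm,
                         -MAX_STEP_PPM, MAX_STEP_PPM);
    currentFreqHz = centerFreqHz * (1.0 + (currentOffsetPpm + step) * 1e-6);
    Si.setFreq(0, (uint32_t)(currentFreqHz + 0.5));  // same call the existing sync code uses
}

The two clamps are the crude stand-in for the "don't break monitor sync / don't let the loop run away" concerns below.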

Ideally the continuous frequency controller code should be written to avoid blocking the main loop (which would prevent the GBS-C from responding to commands quickly), much like the sync watcher. Is this achievable? I don't know; FrameSync::getPulseTicks() is blocking.

Additionally, I'm concerned that a poorly implemented frequency control loop may become unstable and cause the clock generator's frequency to oscillate out of control. You probably also need to clamp the maximum frequency deviation to stay near the input frequency and 60/50 Hz (or alternatively, if large deviations are detected, shut off the output signal altogether to avoid frying monitors, and print a debugging message so you know when it's malfunctioning), and to clamp the rate of frequency change to a low value to avoid breaking sync. Unfortunately, these nonlinear clamping mechanisms can destabilize an otherwise-stable control algorithm (see https://www.google.com/search?q=pilot-induced+oscillation+rate+limiting). So it may be worthwhile to pick a more conservative control algorithm: one which adjusts latency more slowly, or produces a steady-state latency offset when the input or clock generator frequency drifts, but which minimizes frequency offset, rate of change, and frequency oscillations. But take everything I say with a heaping of salt, since I'm by no means a control theory expert and never really mastered the pole-zero plots, matrices, and eigenvalues needed to mathematically model control loops.

static uint32_t getPulseTicks()
{
	uint32_t inStart, inStop;
	vsyncInputSample(&inStart, &inStop)
	// Sample vsync start and stop times from debug pin.
	[static bool vsyncInputSample(uint32_t *start, uint32_t *stop)]
	{
		startTime = 0;
		stopTime = 0;
		armed = 0;
		yield();
		ESP.wdtDisable();

		attachInterrupt(DEBUG_IN_PIN, risingEdgeISR_prepare, RISING);
		/// ...loop delay(7) until stopTime > 0
		*start = startTime;
		*stop = stopTime;
		ESP.wdtEnable(0);
		WiFi.setSleepMode(WIFI_NONE_SLEEP);

		if ((*start >= *stop) || *stop == 0 || *start == 0) {
			// ESP.getCycleCount() overflow or no pulse, just fail this round
			return false;
		}

		return true;
	}
	if (!^) { // "^" = the result of the vsyncInputSample() call above
		return 0;
	}
	return inStop - inStart;
}

I'm guessing that detecting vsync has something to do with the debug pin then?

Note that on my GBS-C, I broke the PCB trace to the debug pin and had to bodge it using super-thin magnet wire. However, I do think that vsync detection is still working, since the GBS-C web UI logs show different frame rates when converting Wii 480p (59.9 Hz, 525 lines), Wii 240p Test Suite 240p (59.8 Hz, 263 lines), and SNES 240p (60.1 Hz, 262 lines) to 480p output.

nyanpasu64 avatar Jan 23 '23 03:01 nyanpasu64

(Draft post, not 100% polished) EDIT: I'm thinking the minimum steady-state latency is probably near-zero for 480p input and possibly 240p input (all you need is the current and next line to interpolate between), and likely higher for 480i deinterlacing if deinterlacing decisions for one frame are affected by later scanlines for the corresponding input frame (I'm not sure if this is the case outside of adjacent-line blending).


I'm thinking that instead of a generic PID controller, you could use a nonlinear controller based on knowledge of the system's state and our nonlinear rate limiting, which smoothly increases and decreases the external clock generator rate to maintain a specific output latency behind the video input. State variables could include:

  • Last register write fed into Si.setFreq(), updates immediately, is initialized with the default values
  • Latency, updates moderately fast, is initialized with the expected startup latency
  • Detected input frequency? updates moderately fast, initialized with the current "estimate input frequency" function
    • 240p and 480i have different vblank rates, but the output vblank rate is fixed, so you need to actively compensate. ~~The current GBS-C doesn't! I could use 240p/480i transitions to vastly speed up latency overflow...~~ The sync watcher currently already detects changes in input frequency without resyncing output, when switching between 240p and 480i inputs.
  • Conversion factor from detected input frequency to the register write which matches the rate. this updates slowly and is initialized to the current "get register write" function.
    • Is it safe to ignore this entirely, and hard-code an appropriate conversion factor? Maybe, perhaps hard-code this first and only make this update at runtime (adding code complexity, making the behavior harder to analyze and unstable if implemented wrong) if we get unacceptable latency error.
  • State values not worth saving independently:
    • The amount by which altering a register affects the detected output frequency (is this merely a reciprocal?)
    • The amount by which altering a register affects the rate of latency (latency change per second) (is this merely a constant factor of the detected output frequency?)

Given this state, we can derive a model:

  • input each frame:
    • detected input frequency (not smoothed)
    • detected latency (not smoothed)
  • output:
    • New register write fed into Si.setFreq()
      • Test a few values around (previous register write ± maximum frequency offset per frame), for example (no change, ±10ppm, ±30ppm, ±100ppm, ±200ppm), and pick the best using our cost function.
      • Alternatively calculate the optimal frequency which avoids overshoot, then clamp it to the maximum allowed change in frequency, and output it directly without evaluating the cost function???
  • cost function:
    • low cost if estimated latency != target
    • cost if estimated output frequency != estimated input frequency
    • cost if |register write - previous register write| > maximum allowed value (unsure, maybe 0.1% frequency shift per second or less? probably derive it through tuning)
    • cost for latency overshoot (if output is too far behind input, avoid increasing frequency to catch up, if doing so eventually results in output getting too close to input, even if we reduce frequency as fast as possible afterwards)
  • cut the output video signal, log error, and wait for monitor to reset before reinitializing sync state and outputting video, if:
    • estimated resultant output frequency is out of bounds?
    • latency exceeds [0, 1 frame] (Atari 2600 moment)...
  • state update:
    • save the new register
    • update the smoothed input frequency and latency
      • Updating latency slowly will result in delay and possibly instability. Updating more quickly produces noise, but I think the amount should be low enough to tolerate.
    • Optionally nudge the smoothed frequency-to-register conversion rate

Another more intuitive way to think of this problem is, we are trying to follow a fixed distance (fractional frames/scanlines, or latency in ms) behind a car which drives at a not-exactly-fixed speed (frames per second), in a car which we can control the speed of (perhaps with a slight multiplicative offset), but are not allowed to change the speed too quickly (limited acceleration). A real car would try to smoothly raise and lower acceleration (limit jerk) for comfortable motion, but in a video scaler adjusting latency by mere milliseconds, I don't think avoiding sudden acceleration and deceleration (low-passing the derivative of input latency) is worth the complexity.

Again I'm not a control expert, but this approach should be simpler to understand and produce arguably better results. Interestingly PipeWire did not take this approach and instead used a second-order(?) filter which overshoots the target latency once when initialized at a wildly incorrect state (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2468).
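Roughly, the candidate-plus-cost-function evaluation above could look like the sketch below (kept in field-rate space for readability; all names, weights, and limits are placeholders I made up, and the conversion of the chosen rate back into a Si.setFreq() register value is omitted):

#include <cmath>

struct CtlState {
    double outFps;   // output field rate currently implied by the clock gen setting
    double inFps;    // smoothed detected input field rate
    double latency;  // smoothed output-behind-input latency, in frames (0..1)
};

static double costOf(const CtlState& s, double candidateFps,
                     double targetLatency, double updatePeriodSec) {
    // Latency changes by (inFps - outFps) frames per second of wall time.
    double predicted = s.latency + (s.inFps - candidateFps) * updatePeriodSec;

    double c = 0.0;
    c += 1.0   * std::fabs(predicted - targetLatency);          // latency error, low weight
    c += 200.0 * std::fabs(candidateFps - s.inFps) / s.inFps;   // stay near the input rate
    c += 50.0  * std::fabs(candidateFps - s.outFps) / s.outFps; // penalize large steps
    if (predicted < 0.02 || predicted > 0.98)                   // about to tear / overflow
        c += 1000.0;
    return c;
}

// Try a few ppm offsets around the current output rate and keep the cheapest.
static double pickNextOutFps(const CtlState& s, double targetLatency, double dt) {
    const double offsetsPpm[] = {0, 10, -10, 30, -30, 100, -100, 200, -200};
    double best = s.outFps, bestCost = 1e18;
    for (double ppm : offsetsPpm) {
        double cand = s.outFps * (1.0 + ppm * 1e-6);
        double c = costOf(s, cand, targetLatency, dt);
        if (c < bestCost) { bestCost = c; best = cand; }
    }
    return best;
}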

nyanpasu64 avatar Jan 24 '23 01:01 nyanpasu64

Wow, I love seeing that level of commitment. Absolutely mad :D

So your data-driven approach is perfect for tackling the issue. We have a rough understanding of what goes wrong, down to the code that does it. This is good so far, but before thinking of any (possibly complex) solutions, we need to look at the capabilities of the hardware:

Data sources:

  • the sync processor registers in the TiVo can be polled via the I2C bus; the accuracy is not the best, and the slow polling rate doesn't help either
  • the sync processor can be configured to output debug signals, many sources selectable, none documented. I picked some that seemed like 2nd (filtered) stage input syncs and "last stage" output syncs
  • the TiVo can get things wrong occasionally, via I2C and the debug port. Using those values will throw everything off.
  • (Edit): Yea, I remember now: The measurements are all taken on the lightly processed source VSync interval, against the fully processed output (out of the TiVo) VSync interval, all using the configurable debug IO of the TiVo

ESP8266 GPIO:

  • using the debug sync pulses, the ESP is set up to measure their timing
  • ESP itself is not very good for real time processing, but some tricks make it better, most importantly the disabling of interrupts
  • the code may already show it, but I didn't have the best confidence in it. Improvements likely possible!

Given this, the code generally could acquire the measurements better. "getSourceFieldRate" and "getOutputFrameRate" may work better if they're interleaved more, and queried more often, then averaged. One problem here is that there's not endless time to do this. Depending on the situation, we need to know the values quickly. Maybe a system with a quick initial setup, then a constant monitoring / adjust loop?

ramapcsx2 avatar Jan 24 '23 08:01 ramapcsx2

To elaborate on my previous post, I'm thinking the minimum steady-state latency is probably near-zero for 480p input and possibly 240p input (all you need is the current and next line to interpolate between), and likely higher for 480i deinterlacing if deinterlacing decisions for one frame are affected by later scanlines for the corresponding input frame (I'm not sure if this is the case outside of adjacent-line blending, perhaps not since the Programming Guide says "Motion Index Generation" uses a likely-causal "V-IIR" rather than symmetric filter).

I also think the minimum latency for games switching between 240p and 480i is higher, since the two input resolutions have different "fields per second", so the output needs room ahead of it so it won't crash into the input before the output can respond to the input slowing down.

Do you have any corrections on that?

Maintaining latency near the target value

"getSourceFieldRate" and "getOutputFrameRate" may work better if they're interleaved more, and queried more often, then averaged.

I'm not opposed to the current system of redetecting frame rates if a change is detected (eg. 240p/480i switching), if it responds fast enough to avoid output tearing at practical latencies.

One problem here is that there's not endless time to do this. Depending on the situation, we need to know the values quickly. Maybe a system with a quick initial setup, then a constant monitoring / adjust loop?

Though I may still replace it with continuously updating the frame rate based on new data, if the code is easier to understand or responds better to video clock drift in addition to scanline changes.

the sync processor can be configured to output debug signals, many sources selectable, none documented. I picked some that seemed like 2nd (filtered) stage input syncs and "last stage" output syncs

From reading the code, it seems both getSourceFieldRate and getOutputFrameRate rely on absolute timestamps of the ESP's clock(?), rsr %0,ccount, and are written in asynchronous non-blocking event-driven style hooked up to hardware interrupts.


On the algorithmic side, I was thinking we need a third function, "getOutputLatency" which first asks the video chip for input sync and waits, then records the input timestamp, asks the video chip for output sync and waits, then records the output timestamp. At this point we can subtract output sync time - input sync time (which should be around 0-1 frames) to determine the estimated video latency.

  • I'm guessing the estimated video latency may be off by ±1 frame when the total latency is very close to 0 or 1 frame (though I did not test to make sure). We can address this by taking the value modulo 1 frame: if the actual latency is 0.01 frame but is measured as 1.01 frames (because we missed the output vsync while reprogramming the chip to wait for it), we subtract 1 frame to correctly estimate 0.01 frames of latency, then slow down the output so it's 0.1-0.3 frames behind the input (see the sketch after this list).
  • Additionally, it may be worthwhile to offset the modulo output range, to bring latency towards the target along the shortest path. Eg. if output is 0.95 frames behind input (or even 0.99+ frames behind input and tearing on-screen), you could interpret this as the output being 0.05 frames ahead of input, then drop a single frame to slow it down to 0.1 frames behind input (instead of speeding up output and waiting for latency to decrease from 0.95 to 0.1 frames, which takes longer).
    • Alternatively you may consider dropping frames to be less desirable than gradually reducing latency while keeping all input frames.
    • I suspect that hitting ≥0.95 frames of latency should rarely happen on well-behaved video sources, if we reliably initialize latency to a target value upon initial startup/sync, the target latency is not too close to zero (where too close is 1ms = 0.06 frame = 31.5 of 525 progressive scanlines? IDK), and the video source doesn't change scanlines per frame/field. If it does, eg. when games switch between 240p and 480i and change frame duration by ~1/525 (±0.5 of 262.5 interlaced scanlines), I'm not sure if we can respond fast enough if we have a very low initial latency.
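A tiny sketch of the wrap-around handling from the first two bullets (the name and window choice are only illustrative):

// Fold a raw (output vsync - input vsync) difference, measured in frames and
// possibly off by a whole frame, into a window centered around the target, so
// a measurement of 1.01 frames reads as 0.01 and 0.95 reads as -0.05 (i.e.
// the output is slightly ahead of the input).
static double wrapLatencyFrames(double rawLatencyFrames, double targetLatency) {
    double wrapped = rawLatencyFrames;
    while (wrapped >  targetLatency + 0.5) wrapped -= 1.0;  // e.g. target 0.25 -> window (-0.25, 0.75]
    while (wrapped <= targetLatency - 0.5) wrapped += 1.0;
    return wrapped;
}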

I'm less confident on how to talk to the hardware:

the TiVo can get things wrong occasionally, via I2C and the debug port. Using those values will throw everything off.

I don't know what bad measurements look like. IDK whether to reject or retry on bad measurements, or how to detect bad measurements considering the "wraparound" issue and that consoles can legitimately change sync (unsure when: upon switching between consoles or power-cycling, or switching 15 kHz/31 kHz within a console? does video stop being output upon switching games or console reset?).

via I2C and the debug port

By "debug port" do you mean "debug sync pulses" and attachInterrupt(DEBUG_IN_PIN, risingEdgeISR_prepare, RISING)? Or did you mean something else?

The measurements are all taken on the lightly processed source VSync interval, against the fully processed output (out of the TiVo) VSync interval, all using the configurable debug IO of the TiVo

Is I2C synchronization currently not being used?

ESP8266 GPIO

Is this related to vsyncOutputSample and risingEdgeISR_prepare and risingEdgeISR_measure, which work off interrupts? But later you say you disable interrupts:

ESP itself is not very good for real time processing, but some tricks make it better, most importantly the disabling of interrupts

And the ISR functions are called by interrupts, and disable interrupts within the functions but not after they return?

Initializing latency to a specific value

Dunno how, maybe we'd have to strategically stop and start video output synchronized with input?

I really do think the current design of risingEdgeISR_prepare and vsyncOutputSample/vsyncInputSample (are they identical?) is not well suited for the general case of synchronizing tasks and measuring delays relative to sync, rather than solely measuring the period between sync events. I'd have to write new functions for this. (You technically could use these functions to wait for input vsync, but it would sleep through two vsyncs to measure their interval even though you only need to wait for a single one.)


In any case, I don't know how much motivation I have left to see this change through to completion.

nyanpasu64 avatar Jan 25 '23 22:01 nyanpasu64

It is a big topic that I, frankly, do not have the time right now to deep dive enough into. What you've thought about so far sounds spot on, and you "just need to do it" :D

Couple things:

  • The disableInterrupts stuff doesn't disable the GPIO one. Beats me why it's called that. The framework is Arduino, I guess :p
  • I think you can get away without determining in-to-out latency, assuming it is fixed as soon as in and out frequencies are matched. It appears to be so, because otherwise, the current method wouldn't work as well as it does
  • vsyncOutputSample / vsyncInputSample are looking the same throughout the code path, iirc. That is just an OOP thing. They do actually measure the input and output VSync period as described
  • About the bad measurements: You just need to be aware that freak measurements happen.. If you encounter one, you just need code that evaluates what the problem is. Is it an actual source format change, a sync glitch, or maybe a bug in the code even. It is tricky, because you happen to need these calculations exactly after a format change, which in itself is an unstable period. It is about weighing fast responsiveness against correctness.
  • "rsr %0,ccount" is an assembler command that indeed reads the current ESP CPU cycle counter

ramapcsx2 avatar Jan 26 '23 03:01 ramapcsx2

I think you can get away without determining in-to-out latency, assuming it is fixed as soon as in and out frequencies are matched. It appears to be so, because otherwise, the current method wouldn't work as well as it does

  • Surprisingly I discovered that FrameSyncManager::vsyncPeriodAndPhase already measures total latency! Unfortunately it's only used by FrameSyncManager::run, which is only active with the clock generator uninstalled or inactive, at which point it adds blank scanlines when falling behind, rather than adjusting the output pixel clock to compensate. I think to reduce the effort required for a working prototype of closed-loop output sync, I'll build off this function, instead of trying to invent a background sync watcher that doesn't block the main loop.
    • If I wanted to eventually implement non-blocking latency measurements, is it safe to keep debug pin interrupts always on, update frequency/latency state in the background, then write the result to volatile globals and set an updated flag? Then the main loop would check and clear the updated flag, then read these volatile globals if updated.
      • Reading volatile variables from the main loop without synchronization can suffer from tearing (if the interrupt updates all values midway through the main loop reading them), but since frequency and latency should change slowly, tearing is not a big deal as long as each global is only read once per main loop iteration (ideally in an encapsulated function) to avoid different parts of the algorithm operating on different state. (A sketch of this handoff follows after this list.)
  • I don't think the current method with clock generator "works" well. The only reason it takes hours before users notice tearing is because your initial frequency estimate is quite close (less than 1 part per million error) and the latency drifts very slowly. But I predict you'll never avoid latency drift (if the clock generator or video source drifts) and eventual tearing, without either passthrough mode (output timings locked to input), or closed-loop control to adjust the clock generator to keep the output a fixed distance behind input.
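For reference, the "interrupt always armed, main loop polls a flag" handoff sketched in the sub-bullets above could look like this on the ESP8266 (not existing code; only DEBUG_IN_PIN and ESP.getCycleCount() are taken from the snippets quoted earlier):

// Hypothetical non-blocking vsync watcher, for discussion only.
volatile uint32_t lastPeriodCycles = 0;   // CPU cycles between the last two rising edges
volatile bool     periodUpdated    = false;

void IRAM_ATTR onDebugPinRise() {
    static uint32_t lastEdge = 0;
    uint32_t now = ESP.getCycleCount();
    lastPeriodCycles = now - lastEdge;    // unsigned math survives the counter rollover (~53 s at 80 MHz)
    lastEdge = now;
    periodUpdated = true;                 // first value after boot is bogus; callers must sanity-check
}

void setupVsyncWatcher() {
    attachInterrupt(DEBUG_IN_PIN, onDebugPinRise, RISING);
}

// Called from loop(): returns true and copies the latest period once per new edge.
bool pollVsyncPeriod(uint32_t *periodCycles) {
    if (!periodUpdated) return false;
    noInterrupts();                       // copy flag + value consistently
    *periodCycles = lastPeriodCycles;
    periodUpdated = false;
    interrupts();
    return true;
}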

About the bad measurements: You just need to be aware that freak measurements happen.. If you encounter one, you just need code that evaluates what the problem is. Is it an actual source format change, a sync glitch, or maybe a bug in the code even. It is tricky, because you happen to need these calculations exactly after a format change, which in itself is an unstable period. It is about weighing fast responsiveness against correctness.

What does a bad measurement look like?

nyanpasu64 avatar Jan 26 '23 22:01 nyanpasu64

A bad measurement would simply be a period measurement that doesn't reflect the real values, due to some problem or another. It could be way too long, or a double throw, etc.

But that's what I meant by working well: it manages to sync and work .. for a long while :p There is no readjustment at run time, iirc, which is what you're proposing to add, right? :) Of course all clocking systems drift, and that will eventually lead to running out of sync.

"FrameSyncManager::vsyncPeriodAndPhase" ... I think that was "method 2" of trying to sync, which works by adjusting the output timing parameters, adding or removing a scanline to try and keep in sync. With a clock generator, you have a better way to control the output, than the rather coarse full scanline.

ramapcsx2 avatar Jan 27 '23 04:01 ramapcsx2

Just posted #405 with an 80% fix for this bug.

  • Should I extract out solely the bugfix, or submit my non-sync-related refactors in separate PRs first?
  • Right now, when exiting from HBC to the Wii Menu's Health and Safety Screen, this code sometimes guesses the input FPS wrong and sets the output FPS wrong. Does this need to be fixed before merging?

nyanpasu64 avatar Jan 28 '23 23:01 nyanpasu64

Oh gee.. I don't think I can properly reply to that massive PR >< General guidelines:

  • If you can make your new code stable for you, that's a good start.
  • If you can run a few more sources (SNES and PSX for example cover a broad range), the better.
  • If fixes and improvements can be broken up into smaller PRs, then that's good, but sometimes everything is interrelated
  • Don't be afraid of breaking things. If it works for you across a few different scenarios, chances are that it's good :)

ramapcsx2 avatar Jan 29 '23 11:01 ramapcsx2

Really great to see someone working on this issue. This one really makes me hate gbs-control sometimes. I was already on my way to ordering a different scaler just to finally have something I can "connect and forget about".

M-Reimer avatar Jan 31 '23 14:01 M-Reimer

I can test on several PS2 models for anyone that needs help, including the PSX-DESR and a ProtoKernel PS2.

israpps avatar Jan 31 '23 15:01 israpps

It's good to have more testers. To run the code locally:

  • Check out https://github.com/nyanpasu64/gbs-control/tree/fixed-latency (gh pr checkout 405, you may need to add my repo as a remote to do it manually).
  • If you want to see input/output frame rate and latency measurements in your debug console, go to framesync.h and uncomment #define FRAMESYNC_DEBUG.
  • Upload to your GBS-C (either normally, or using Arduino IDE v1 and OTA wireless flashing).
  • In gbscontrol.local, Settings tab, uncheck "Disable External Clock Generator" and check "FrameTime Lock".

At this point, the developer console should say "Active FrameTime Lock enabled, adjusting external clock gen frequency"; if it says "Active FrameTime Lock not necessary with external clock gen installed", you didn't flash my branch properly. If it says "disable if display unstable or stays blank!", your clock generator isn't installed or enabled properly.


At this point, start up your game console. To stress my latency compensation algorithm, try rebooting or resetting the console, or changing the console's video mode (240p, 480i, 480p). Alternatively, to directly increase latency, visit the web UI (gbscontrol.local) and repeatedly click a resolution button. Then check if your monitor loses sync, because my code changes the video output frequency too far (up to 0.3% up and 0.1% down from the input frequency, until the latency approaches the target of 0.25 frames). If this breaks sync on monitors, I may have to clamp my maximum frequency offset to 0.1% from the input frequency (to avoid large offsets over time), then within 0.1% from the previous output frequency (to protect against output frequency swings if the input frequency is measured incorrectly).

You can also try playing games for multiple hours, and check for tear lines (they should never appear, but it's good to get confirmation).

Report the errors you see in the debug console (leaving FRAMESYNC_DEBUG commented/disabled makes it easier to see the errors only):

  • When the console reboots or switches video modes, any errors you see are normal, but "FPS excursion detected!" means my code has suboptimal error detection (though this should not damage the display since the output frame rate only changes by 0.1% each time). This message should never appear multiple times in a row.
  • During normal operation, latency synchronization may fail occasionally, with messages runFrequency(): vsyncPeriodAndPhase failed, retrying... and getPulseTicks failed, retrying.... This is a known issue that shouldn't cause problems. You should not see messages like fpsInput wrong: ..., retrying..., fpsInput2 wrong: ..., retrying..., measured inconsistent FPS ... and ..., retrying..., FrameSyncManager::runFrequency() failed!, or even worse FPS excursion detected!.

I forgot to test FrameTime Lock with the external clock generator not installed, oops.

EDIT: Testing it now on my Gateway VX720, both lock methods make the bottom of the screen shift up and down; I'm not sure about the top. As I understand it, these odd display issues are normal in this operating mode.

nyanpasu64 avatar Feb 01 '23 10:02 nyanpasu64

The non-clockgen modes tend to work really well on some displays, but on others, you get a line coming in and out, or worse, the whole picture shifts :p

It's good to read that explanation of what should happen, gives me good confidence that the new code will work :)

ramapcsx2 avatar Feb 11 '23 23:02 ramapcsx2