RetroArch
Rolling Scanline Simulation
Description
Added a rolling scanline simulation based on the shader subframe feature. This is implemented with a scrolling scissor rect rather than in the shader itself, as this is more efficient, although it may not work for every shader pass - we may need an option to exclude certain passes. The implementation simply divides the screen vertically by the number of subframes and then moves the scissor rect down the screen over those subframes. The higher the refresh rate, the more accurate the scanline simulation. Implementing a rolling scanline on the GPU is a really poor implementation choice - it should instead be done on the display itself, as an entire image needs to be passed over the cable for every subframe - BUT we have no control over displays, so this is the next best thing.
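As a rough sketch of the scheme just described (illustrative names only - `rolling_scan_rect` and `scissor_rect_t` are not the actual RetroArch driver symbols):

```c
#include <assert.h>

/* Illustrative sketch of the rolling scissor rect described above; these
 * are NOT the real RetroArch driver symbols. The screen is divided
 * vertically by the number of subframes and the rect walks down one
 * slice per subframe. */
typedef struct { int x, y, width, height; } scissor_rect_t;

static scissor_rect_t rolling_scan_rect(int screen_w, int screen_h,
      unsigned current_subframe /* 0-based */, unsigned total_subframes)
{
   scissor_rect_t rect;
   int slice   = screen_h / (int)total_subframes;

   rect.x      = 0;
   rect.width  = screen_w;
   rect.y      = slice * (int)current_subframe;
   /* Hand any leftover rows to the last slice so the whole screen is
    * covered when the height doesn't divide evenly. */
   rect.height = (current_subframe + 1 == total_subframes)
         ? (screen_h - rect.y) : slice;
   return rect;
}
```

At 1080p with 3 subframes this walks the rect through y = 0, 360, 720, each slice 360 rows tall, one slice per subframe.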
Hello,
First, as the author of the BFI code (past the initial simple 120hz implementation), and the sub-frames, I want to say I obviously, more than most anyone, appreciate BFI and what it can do in general and am always happy to see further improvements.
But, I was sort of hoping the sub-frames feature would allow things to go the other way in the RA configuration side, and to be able to remove some of the hacked in functionality in the driver code, instead of further expanding on it. Perhaps even eventually allowing for the hard-coded BFI implementation to be removed at some point, and just allow it all to be handled more elegantly (if less efficiently) via shaders.
Also, regarding rolling scan in particular: for testing the subframe shaders I did build a rolling scan BFI shader, posted in the programming-shaders channel a little while back around when it was merged, that functions in nearly the same way (though, again, less cycle-efficient of course). How much that cycle efficiency matters should probably be measured.
Also, at least for my shader implementation, it was just a bit too crude for me to ever consider using it over the full-frame 'standard' BFI. I was mainly using it just to test that subframes were working in general. The lines where the subframes divided were just very apparent - it was almost like vsync-off tearing, even though there was no vertical shift. It's possible this could be improved with some overlap (with brightness adjustment etc.), but that's certainly, in my opinion, a job for shaders to fine-tune and not the driver side.
Anyway to the RA team in general, just giving my thoughts, I'm not going to be angry if this is merged or anything.
Oh, one more specific request if it is merged, for my use in the shaders: please have current_subframe start at 1, not 0, for when subframes aren't in use. It looked like the video_info default you were using starts at 0, and while I didn't see any direct conflicts with the existing sub-frame code, having it start from a different count could get confusing.
@Ophidon oh wow, I totally missed your rolling scan shader. I'll have to check the scrollback.
I've been working on a rudimentary one myself using sin() to fade the edges and avoid the tearline effect. It works okay at 120 Hz (which is as high as my monitor goes), but it leaves a big band in the middle that's never zero (a steady 0.5 right at the center), which means half the brightness but no benefit to motion blur. Doing the same thing at 180 Hz should be much nicer, since it'll end up with a black bar in the middle on every third subframe.
But I digress. As for the current PR, I think having a driver-based option is going to open the effect up to more people, as a shockingly large minority of users apparently eschews shaders altogether for a variety of reasons.
I do agree that the subframe numbers should be in concert between the driver method and the shaders, though I planned to discuss whether starting at 0 or 1 is the best idea. I also expected that it would start at 1.
I also wondered if we might be best-served by combining the 2 separate subframe uniforms into a single vec2 (or higher) using the swizzle to differentiate, which would also allow for future expansion if we come up with another useful value (e.g., a pre-calculated current subframe divided by total subframes, which I find myself calculating a lot when messing with them). Sorry, I'm veering off-topic again.
Hi, thanks very much for this implementation.
I understand there are some things @Ophidon takes issue with. Are there any specific things Ophidon can think of that could be done to allow this PR to be merged?
Hi @Ophidon, hope you're well, and thanks for all the various versions of BFI - it helps with motion clarity immensely.
So I'd like to take a step back here: what, technically, were you hoping to achieve with shader subframes?
The issue of interference (I'd avoid the word tearing, as that, to me at least, is the display of part of one frame and the rest of another) at the sub-frame edges looks to me to be a display-side issue. At least on my display it looks like I get noise at the joint, which is probably dependent on the panel being used. I have to investigate this further, though, but I'm not sure you can resolve that particular issue GPU-side, as a fix might work for one panel but make matters worse for another.
As for implementation, I'm not sure the argument really holds that shader subframes clean up the code, as you're having to fill out the constant buffers, which is just another piece of GPU state being passed over and really no different from scissor rect state being set.
I do disagree that handling this in the shaders is a more elegant solution, though. It just adds more implementation variations for something that can really only be implemented in a very limited number of ways, and brute-forcing it via shaders just seems wrong to me.
EDIT: [STRIKE THROUGH: On further thought this statement is probably not true - I leave it here for full transparency, and some points still stand] Ultimately any one pixel has to be displayed fully on for one subframe: any deviation from that either ends up with overly bright or dark areas and/or adds more motion blur over standard BFI. That essentially leaves you with what I've done, or you do an interlacing-type scheme where you divide the screen into more areas vertically and display a number of them per subframe, but that just gets you further away from how a CRT works and makes the interference problem worse. [END STRIKE THROUGH]
I do see that scissor rects add complication to the driver, but I clearly marked it using #defines, which a lot of additions don't. I think setting GPU state to do something is just part of what a RetroArch driver does, and in this case simulating a CRT is part of that.
What I will say after all that is that, at least with my shader, the Sony Megatron, the split falls on a black part of the scanline at 120hz and so is hidden and not a problem.
However, as you say, I'm not sure any of this is a benefit over standard BFI - it's just technically more accurate to how a CRT works. I see this feature as more of a stepping stone; it really should be done by the display, but we're not in control of that, so we have to do all these BFI 'hacks'.
Ok, so after further thought I can see another way that does involve shaders but, to be efficient, should also use scissor rects:
Before I write this, @Ophidon: I haven't seen your rolling scanline shader implementation, as I can't find the shader - is it a slang shader? So apologies if this just repeats what your implementation does.
So above I believe I was probably wrong to say any one pixel needs to be fully on for one subframe and fully off in the rest of the subframes. I can certainly see that, as long as the luminance over the frame adds up to 100%, you can have any distribution of luminance across a set of subframes.
I'm going to implement the following scheme in the Sony Megatron: again I'm going to divide the screen up vertically by the number of subframes, but instead of doing what is in this pull request, I'm going to take two consecutive subframes and, for the current subframe's box, blend linearly from full opacity at the top to full transparency (i.e. black) at the bottom. Then for the previous frame's subframe box I'm going to blend linearly from full transparency down to full opacity.
Thus a full 100% of luminance is written over the screen for every pixel, with every pixel being lit by the two consecutive subframes.
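A minimal sketch of the proposed cross-fade weights (hypothetical helper names, not actual shader code; `y` is a pixel's position within its subframe box, 0 at the top and 1 at the bottom):

```c
/* Hypothetical sketch of the proposed blend: within a subframe box, one
 * subframe fades from full opacity (top) to black (bottom), while the
 * other subframe lighting that box fades the opposite way, so the two
 * contributions always sum to 100% luminance for every pixel. */
static float fade_down_weight(float y) { return 1.0f - y; }
static float fade_up_weight(float y)   { return y; }
```

For any y the two weights sum to exactly 1, which is the invariant the scheme relies on.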
This scheme has a major issue though in that we need to retain the previous back buffer to finish off the luminance for the bottom of the previous frame - so more C code for all our drivers (I'm not too concerned about this as long as things are clearly delineated and labelled in the code BUT I'm not having to maintain this project so I can be like that!).
There are another few downsides compared to this pull request implementation:
a) Ideally we'd still use scissor rects, as at least we can keep the cost down to executing the shader twice at high refresh rates, which is a big concern for a lot of users.
b) Shader authors would probably need to add support for their particular shader - it's not a universal solution like this pull request. It could be implemented as a post pass that can be added to any shader preset, though, so maybe that's not too bad, but presets would need to be added/updated and shader parameters added, so there's little getting away from this not being truly universal, again unlike this pull request.
The benefit is that it potentially gets rid of the interference noise and theoretically gets us closer to a rolling scanline that a CRT has.
Actually, we can make this proposal universal! By universal I mean that it works with all shaders without having to modify them and lets users simply switch it on/off in the main menu. We can make it universal by adding a post pass automatically, like I did for my HDR implementation, i.e. as a built-in shader. Regardless of it being universal, we would (as I said above) have to add support for keeping the previous frame's back buffer around for a frame, so there's driver work and added complication regardless - unless someone has a better idea?
You could have this less intrusive pull request as the first implementation and then add this more advanced version later on.
Good Morning,
Very thorough analysis. Enough to put me to shame really, hah. I do have some random thoughts at this point though.
First, when I was talking about it being 'elegant' to try to do everything from the shader side, it's merely the conceptual niceness of the flow: the driver takes the input image, hands it off to the shaders for whatever transforms are desired, gets the transformed image back, and finishes output. Adding various image transforms, but not others, on the driver side just makes it 'feel' more like spaghetti code to me. This wasn't possible with BFI (or anything else truly temporal) before the sub-frame shaders implementation, so it made sense to me there previously. But I fully admit the shader side can be significantly less efficient, and that I am more immune to the downsides of this with a modern high-end desktop GPU than most.
Next, regarding the utility of rolling scan BFI vs the existing implementation: I believe there are actually 2 'real' benefits beyond just the theoretical 'correctness' of doing it the way a CRT does. One possibly is (though this is subjective experience and not truly tested) a significant reduction in eye strain even at the same Hz, since the average amount of photons being sent towards your eyes remains relatively level with rolling scan BFI, instead of full-image flashes.
From my own personal reference, I have a CRT, an IPS screen I use at 180hz, and a new OLED at 360hz. In order of the amount of time it takes me to feel any level of eyestrain, the CRT is the longest, the IPS second, and the OLED shortest, with 'normal' full-frame BFI in use on both of the latter screens. My thought on the difference between the IPS and the OLED is that software BFI automatically does a natural level of rolling scan, just because that's still how the pixel scanout on the screen occurs - it's just going at 180/360hz instead of the preferred 60hz that would always keep a section of the screen lit. Also, the IPS has markedly longer rise/fall times on the pixels, perhaps getting closer to some part of the screen being lit continually, and further, its backlight remains constant regardless (this is probably the bigger factor). So OLED is the screen type that could benefit the very most from a good rolling scan implementation, in my current opinion.
The second possible 'real' benefit for rolling scan over full-frame bfi is also mainly for OLED with its lower peak brightness. Rolling scan only lighting up a smaller portion of the screen at a time as Hz gets higher and higher, should allow it to take advantage of peak brightness window limits to compensate for the brightness loss that would otherwise reach unusable levels even in a dark room. I don't know, however, how well the screen algorithms work for such quickly changing lit and completely dark areas, if they lag even a single sub-frame behind, well....
On the last subject of your current implementation change ideas: needing access to previous frames is actually another point in favor of the shader-side implementation, isn't it? Via these that already exist:
OriginalHistory#: This accesses the input # frames back in time. There is no limit on #, except larger numbers will consume more VRAM. OriginalHistory0 is an alias for Original, OriginalHistory1 is the previous frame and so on.
PassFeedback#: This accesses PassOutput# from the previous frame. Any pass can read the feedback of any feedback, since it is causal. PassFeedback# will typically be aliased to a more readable value.
In general, yet another point in favor of the shader side, in my opinion, is that - especially if we want to try to hide the rolling scan interference lines in a black area, or apply HDR specifically to compensate for BFI brightness loss - it is good for the shaders to be able to know whether these features are enabled. Which they easily can if it's just part of the preset, but I don't think they can otherwise, unless we send yet more uniforms to flag it? One more neat thing about shaders having control of BFI is that it can be applied only to the 'real' part of the screen when bezels or other borders are in use.
Annnnyway, I'm fine with wherever we go from here. Motion clarity getting any coding attention at all, after all the dismal 60hz sample-and-hold LCD years (it drove me crazy that so many people were blind to it), is welcome to me. My one real change request remains just making current_subframe start from 1 instead of 0, if you still use that, to match how it's being sent to the shaders. :)
Great points on the additional benefits of a rolling scan - I never thought about the fact that it could help with peak brightness because the window size is much smaller. This might be a real win and, come to think of it, is already a benefit that can be had with this pull request - it might be why it feels brighter on my screen with the Megatron - I don't know, I need a colorimeter to tell.
So with regards to the shader-side implementation you're talking about: the driver has changes in it to support the original history, pass feedback, etc., so it's not really true that it's 'shader only' as such - much like passing in values via the constant buffer, you're still having to change the driver and add support. All we're doing here is adding more scissor rect functionality, but maybe we can clean this up a little by setting a single variable that all sites can test for subframes being on, along with the various other edge cases you've caught.
I've still got to prove out my theory for the 'next gen' proposal, as I need to convince myself that it will work without areas being dim, but it's even better that I might not have to write too much more to get this working because of the original history (I'm not sure pass feedback would be relevant for this situation - I think! not 100% sure).
As for current_subframe, sure, I can change it to start from 1, but I'm going to have to change the math elsewhere to turn it into a proper index by subtracting 1 off of it - not too bad admittedly, just a bit unintuitive to most coders. What was the reason for starting it at 1 rather than the much more common 0? I do agree both values should match, though.
Admittedly starting at 1 over 0 isn't something I put a -ton- of thought into, but I did some.
It started with knowing that, as hunterk mentioned above, the most common thing that would be done with these values on the shader side would be dividing CurrentSubFrame by TotalSubFrames for their ratio.
And knowing that, I didn't want TotalSubFrames to ever be 0. I don't know how shaders handle div by 0, so I just did it to be safe. I'm not actually a slang shader expert, I just knew my existing bfi code could be retrofitted to allow shaders to do whatever they wanted with the extra frames.
Further, I wanted the default ratio to come to 1, as in 'I need to handle this whole real frame interval right now, not a subsection of it' when either the subframe setting is disabled OR the menu is up/core is in ff, or paused, etc. Thus I made CurrentSubFrame start at 1 as well, so that would be true.
heh, those are actually really good justifications for starting at 1. I hadn't considered them.
> It started with knowing that, as hunterk mentioned above, the most common thing that would be done with these values on the shader side would be dividing CurrentSubFrame by TotalSubFrames for their ratio.
Yes, so when writing a shader I'd normally want to start at the top of the screen on the first subframe: with 2 subframes, 0 / 2 = 0, and then on the second subframe I'd like to start halfway down, 1 / 2 = 0.5. You can then easily add 1 to the current subframe in both cases to get the bottom of the area.
If you instead start at 1, I'm essentially getting the bottom of the area. This is fine, as we can just subtract 1 from it to get the top; it's just something most shader writers would find unintuitive - starting from the bottom of an area/screen and working upwards is a bit odd, is all, and will catch people out who aren't expecting it.
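To illustrate (hypothetical helpers, not actual shader code): with a 1-based CurrentSubFrame, current/total gives the bottom edge of the slice, and (current - 1)/total gives the top.

```c
/* With 1-based CurrentSubFrame, current/total is the BOTTOM of the slice
 * and (current - 1)/total is the TOP - i.e. shader authors subtract 1 to
 * recover the more intuitive 0-based top edge. Illustrative helpers only. */
static float slice_top(unsigned current /* 1-based */, unsigned total)
{
   return (float)(current - 1) / (float)total;
}

static float slice_bottom(unsigned current, unsigned total)
{
   return (float)current / (float)total;
}
```

So for 2 subframes: subframe 1 covers 0.0 to 0.5, subframe 2 covers 0.5 to 1.0.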
> And knowing that, I didn't want TotalSubFrames to ever be 0. I don't know how shaders handle div by 0, so I just did it to be safe. I'm not actually a slang shader expert, I just knew my existing bfi code could be retrofitted to allow shaders to do whatever they wanted with the extra frames.
So why would TotalSubFrames ever be 0? It should never be set to 0, as that would mean no frames at all, right?
> Further, I wanted the default ratio to come to 1, as in 'I need to handle this whole real frame interval right now, not a subsection of it' when either the subframe setting is disabled OR the menu is up/core is in ff, or paused, etc. Thus I made CurrentSubFrame start at 1 as well, so that would be true.
But when the subframe setting is disabled, surely the total number of subframes is 1 and the current subframe is 0, and so we're ok in our shader, as we're now just working over the total area of the screen from top to bottom with whatever code we've written? Same goes for the menu, etc. I'm not sure I'm following the reasoning here, I'm afraid - maybe I need to think about it.
So I just tried a test version of my second proposal, with a whole-screen gradient from full brightness down to black, reversed in the next subframe, and the result is not good. Basically, in the center of the screen (when using 2 subframes), where we're in the middle of the gradients for both subframes (in my test version), you get the darkness of BFI but then suffer the motion blur of a standard screen, because no pixel ever gets a totally black subframe.
Possibly this technique might be good for 180hz screens (ideally 240hz) and above, as there would be a totally black pixel for at least one of the subframes, i.e. in the middle of the gradient (the middle of a subframe area) it would be 50%, 50%, 0% over the three subframes. This has a darkness issue, though, as essentially out of three subframes the pixel is only on for one of them on average. Same goes for 240hz: the pixel is only on for one frame out of 4 - essentially you keep needing the screen to be searingly brighter as you go up in 60hz steps.
As such, as it stands, this pull request looks about as good as it gets for a rolling scan at 120hz - it offers a possibly slightly brighter screen over standard BFI, because only half the screen is lit in any one subframe, but has the downside of the interference at the intersection, which is luckily covered by my shader, though we're not so lucky in other situations.
TotalSubFrames could have been 0 if you don't conceptualize the first and therefore 'real' frame to be a member of the following 'sub-frames'. But as doing it the other way, counting that original frame as a sub-frame, led to not needing to worry about div by 0, the choice was easy of course.
The same is true, about it just being a matter of conceptualization, for why currentsubframe started at 1. To me, having the ratio of current/total be the 'up to' point made the most sense. So 1/1 = 1 = the whole frame, for the default scenario, made the most sense. And as to the first in series starting at 0 or 1, meh, neither bothers me, so I went with what got me a ratio of 1 instead of 0 as the default for that reason. If it was an actual array index, or something similar, that would have been different and I definitely would have started at 0.
If you want me to test things for the rolling scan, I can at 120/180/240/360hz. I don't think I can at 300, as the 360hz 1440p OLED uses DSC, which apparently disallows the creation of custom resolutions - a fact that would have been useful to know before I purchased it. -_-
> my second proposal by having a whole screen gradient from full brightness down to black and then reversing it in the next subframe
heh, yes, this is exactly what I was describing with my stab at it. The darkness of BFI with the motionblur of no BFI :sweat_smile: I used sin() to get the gradient so it would hopefully scale gracefully to other refresh rates, but at 120 it's identical to multiplying by texcoord.y (or 1.0 - texcoord.y).
> To me, having the ratio of current/total be the 'up to' point made the most sense. So 1/1 = 1 = the whole frame, for the default scenario, made the most sense.
Regarding my quote above here, keep in mind that directly relating this current/total subframe ratio to a vertical slice is true for rolling scan bfi, but isn't for a lot of other possible uses of sub-frames, so it has to be more generic in meaning. Which was part of the consideration of making the default 'off' values give a ratio of 1 with the meaning of 'handle it all' not just specifically, as in the case of rolling scan bfi, 'to the top of the screen'. I'm hoping there's plenty of sub-frame uses no-one has thought of yet.
One thing I'm currently interested in seeing is if, at least for 2d, the sprite and background layer(s) from some cores could be sent separately to the shaders to be able to make 'smarter' motion interpolation/frame generation with considerably less artifacting.
I think overall that's quite enough talk about a previous PR instead of this one and a fairly inconsequential 0 vs 1 now though, lol.
As for this feature, I am quite willing to test how it looks at higher hz and on various screen types, but if issues with the line(s) remain, I think it's just regrettably an undercooked feature at the moment. And keep in mind I say this as a fan of the concept because of the benefits of it working (well) I mentioned before.
I definitely don't think we can rely on being 'lucky' that any given applied shader covers it, especially considering that done this way the shaders currently have no way to know the feature is on, and thus be able to try to consciously adjust for it. And as for the higher hz pixel cycles like 50% 50% 0%, I believe hunterk is correct that you're getting the full brightness reduction downside of BFI without the corresponding clarity, a terrible tradeoff. :/
At 180hz, for instance, you get ~66% brightness reduction and ~66% blur reduction with a 100-0-0 cycle, whereas the 50-50-0 cycle still gets you ~66% brightness reduction, but only ~33% blur reduction. :/ And keep in mind I have updated the 'standard' full-screen bfi fairly recently to have brightness/clarity tradeoff choices now at higher hz as well, so it has an advantage there still too. 180hz can do 100-0-0 or 100-100-0. 240hz can do 100-0-0-0, 100-100-0-0, 100-100-100-0, and so on.
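Those figures can be sanity-checked with a crude model: average brightness is the mean of the per-subframe levels, and blur reduction is approximated by the fraction of subframes a pixel spends completely dark (a simplification of the persistence argument, not Blur Busters' exact math):

```c
/* Rough model of the tradeoff: levels are per-subframe brightness in 0..1.
 * Average brightness is their mean; blur reduction is approximated by the
 * fraction of subframes in which the pixel is completely dark. */
static float avg_brightness(const float *levels, int n)
{
   float sum = 0.0f;
   for (int i = 0; i < n; i++)
      sum += levels[i];
   return sum / (float)n;
}

static float blur_reduction(const float *levels, int n)
{
   int dark = 0;
   for (int i = 0; i < n; i++)
      if (levels[i] == 0.0f)
         dark++;
   return (float)dark / (float)n;
}
```

Under this model, 100-0-0 and 50-50-0 both average 1/3 brightness, but 100-0-0 is dark 2/3 of the time versus 1/3 for 50-50-0 - the ~66% vs ~33% clarity gap above.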
Ok, so I'll change it to start at 1, but you're going to get complaints from others. It's generic and applicable to every situation that 'total' is a count and 'current' is an index - conceptually it is an array of subframes contained inside a frame.
As for my test results: yes, that's what I said above - it's not good. I'm not sure there are any other options for shaders to take on this, as the interference is a display-side problem and can't be fixed by a GPU that has no knowledge of how the display works. It's probably a temporal algorithm on the display side causing it as well. We really are limited in options, as my initial statement that I struck through above does in fact appear to be true, i.e. you have to have pixels fully on or fully off.
One last thing to try, from my point of view, is to see what happens in the worst-case scenario for an interlaced scheme, i.e. instead of one rect per subframe as we have here, we use many. The worst case is then just alternating lines on and off for each subframe, like the interlacing of old but within a frame (and I suppose one interlaced frame is black). I'd like to see what happens to the interference in this case, to fully see what we're up against.
One last point: it feels as if you see this pull request as limiting solutions provided by shader writers. Just to clarify, it doesn't - shader authors can still try to fix the interference issue themselves with this on or off. All we're doing here is limiting the area the shader draws to, but it's still executed, and authors can still tell where the scissor rect is and where they are drawing to - it's just additional help for them.
Just to also add: I will add support for 180hz doing 100, 100, 0 and 240hz doing 100, 100, 100, 0, etc. - I'll just add an option for when 3 subframes and above are used and this rolling simulation is turned on. The other thing to add eventually would be to drive the scissor rect from the .slangp - at first whether it's enabled/disabled for a given pass, then to determine where it is, and then where it is over subframes. Quite a bit of work for that, though...
> my second proposal by having a whole screen gradient from full brightness down to black and then reversing it in the next subframe
> heh, yes, this is exactly what I was describing with my stab at it. The darkness of BFI with the motionblur of no BFI 😅 I used sin() to get the gradient so it would hopefully scale gracefully to other refresh rates, but at 120 it's identical to multiplying by texcoord.y (or 1.0 - texcoord.y).
Yeah, I think we're quite limited on options here - it does feel like BFI is basically the only way from a GPU perspective (or relying on shaders to hide the interference, as the Megatron does). If someone could give us a programmable display then we'd be cooking!
I know it isn't limiting anything, I'd have a MUCH bigger issue with merging it if it did. ;)
Even in my very first post I said I was ok with it being merged, I just wasn't fully convinced it was necessary to do driver side. But whatever, no harm no foul since, as you say, it wasn't removing any ability to do it the other way.
If I've slipped a bit farther from feeling it's mergeable, it's not because I feel you've done any sort of poor job coding it, or even any disagreement over driver/shader side - it's that I'm (regrettably) starting to agree there is indeed perhaps some display-side problem making this a nearly unsolvable problem on our end.
Full-frame BFI has a well-known display-side issue with voltage retention on a lot of LCDs at 120hz. But thankfully there is also a rapidly growing segment of displays that don't have the problem at 120hz (OLEDs), and any Hz multiple higher than 120hz can also be made to work without issue on LCDs. For this rolling scan issue... if there is a display that is immune to it, I haven't found it yet. I've also now tried it on my VA television at 120hz, and pulled a TN panel out of the closet to test that too, so that's literally one of every current non-CRT display type in use that I've tried it on.
So, what I wouldn't want, more than any of the rest, is for the setting to look so poor to users (either via the interference lines, or a brightness loss that doesn't get a commensurate clarity gain in trade) that they just immediately say 'woah, bfi is awful, nobody should ever use that!' I wouldn't be a fan of a nearly pure 'trap' setting being available, so to speak. And don't forget: even the Megatron shader you're testing with might be luckily hiding it when you only have one line at 120hz, but how about at 360hz, like I can run it? That's a lot more joints to keep track of, in different locations.
Try this for your implementation, maybe without any shader on that is hiding the joint. Keep your screen still and your eyes as still as you can.. for me in that scenario none of the 'interference' is really apparent. Move your eyes up and down rapidly though, and it becomes super apparent. Which maybe means it's not a display thing, so much as an optical stroboscopic effect issue?!
CRT doesn't have apparent joints of course with rolling scan, nor have I heard of it with the lg oled tvs that have built in 60hz bfi (which also is rolling scan). So, if that's really what's going on, I hope it just doesn't require literally being down to a line or 2 being 'rolled' at a time to avoid the effect, which would require utterly ridiculous Hz to emulate.
I was just reading a thread on blurbusters' forum about checkerboard/interlacing/noise BFI, and he/they said all of those strategies look worse. I was playing around with monitor-pixel-scale checkerboarding and it indeed looked weird in a different way, lol. I couldn't tell whether the motion was any better, and to my eyes it looked choppier (possibly an optical illusion).
Similarly, he suggested there's no free lunch when it comes to brightness vs motion blur reduction. 50% brightness reduction (i.e., half on, half off) = 50% blur improvement. 25% brightness reduction (i.e., on, on, on, off) = 25% blur improvement.
He mentioned rolling scan as a partial outlier, but that you need blended edges to avoid artifacts (as we've seen/experienced). However, it also seems clear that any area that is not completely black at least part of the time is not really going to get any motion blur improvement.
Yeah a lot of my existing BFI implementation comes directly from his theory as it is now. I talked directly with him a good bit when doing the initial changes for allowing any multiple of 60hz instead of just 120hz, back in 2020 or whenever it was.
And yeah, there is absolutely a direct relationship between full black period and motion clarity, not just reduced brightness. Aka 100-0-0 being heavily superior to 50-50-0 for clarity but equal for avg brightness as I mentioned a bit ago which is a horrendous deal. If avg brightness itself was a factor for clarity, you could just turn down regular sample and hold brightness and we wouldn't need to bother with this annoying flicker at all. :)
One thing I question, though, is how he means to avoid artifacts for rolling scan BFI by blending edges. But that would depend on his definition of 'blending', I suppose. Best case scenario, unless we're missing something, I just see the 'interference' lines getting replaced with small strips of the screen that have lower motion clarity than the surrounding areas via some blending or subframe overlap method. And that lower motion clarity strip would be a reasonably apparent 'artifact' in itself. Slightly more tolerable than the interference lines perhaps... but I don't see who would ever pick it over the existing implementation that has no visual artifacts at all.
OK, so I've got a lot to say here, so I'll chunk it up a bit: firstly, BFI and shader subframes.
Let's not merge these two techniques/features in the future.
From a user experience perspective they are two separate features in the menu currently.
From a technical perspective they should also be separate. The original implementation of BFI is ideal: on the first subframe we execute the shader, and on the subsequent subframes we simply clear the screen (you could argue we need an option to clear to a non-zero black, but I digress).
Clearing the screen is a highly optimised operation and will be much, much faster than anything a shader does, as it has hardware support (most GPUs will not write to the surface but instead set metadata, signalling it has been cleared to a value held in a register).
This is really important for a number of reasons, not least that it's more power efficient. That means the battery on your phone doesn't run out as fast, and less heat needs to be dissipated. Less heat dissipated means screens on a mobile device can run brighter, and CPUs and GPUs can run faster and more efficiently. The latter is critical for low-end devices such as the Raspberry Pi; the Pi 5 is a key milestone for RetroArch shaders, HDR and BFI, but this applies to the laptops, phones, consoles, etc. that RetroArch is used on.
Efficiency is king, and for those same reasons the new lit-BFI feature should really copy the first frame rather than execute the entire chain of passes again. That is of less concern at the moment, though, given fewer low-end devices support refresh rates above 120Hz (maybe the Pi 5 is the exception), but future-proofing is good.
Personally, I use RetroArch on my mobile more than any other device, so this kind of stuff is important to me. I'm looking into adding support to RetroArch on Android for the 120Hz mode my phone does provide, and I'd like to use BFI with it.
So shader subframes are a different feature to me, as they have a different implementation, as has been done, so we're good. Let's just kill the idea of potentially merging them and BFI.
I'll cover the meat of the above later on when I get a moment.
Is that Android device using an OLED screen? Otherwise you are fairly likely (but not guaranteed, according to some reports; it all comes down to how the screen's algorithm works) to run into the 120Hz voltage-retention issue.
If you are talking about a 120Hz LCD, there is a known solution to combat that voltage retention as well. I haven't officially implemented it, because it DOES cause visual artifacting of its own at a level I find too annoying for it to be a 'true' solution, but that can be subjective.
What you have to do is, at some user-defined rate that kicks in before the image retention becomes noticeable (which for my screen was around every 20s), hold either an 'on' or an 'off' frame for a double beat (which causes a quite noticeable intentional flicker). So the injected stutter pattern would go like on-off-on-off-off-on-off-on-off.
I played around with it, but I also just felt it was close to a 'trap' setting, like rolling scan would be without better solutions to its issues. Anyone who truly cares about BFI would, imho, be better served getting a screen where that issue just doesn't exist than dealing with that hacky half-solution. Which should be possible in the mobile space too, thanks to OLED.
If you want to implement it yourself, though, feel free. Only as an -option- at 120Hz of course, not forced, since not all screens are affected. Technically 240Hz doing on-off-off-off is (somewhat less) susceptible too, but that can be handled just by using 240Hz with on-on-off-off instead (which it will default to now with the dark-frames settings). All odd multiples, like 180Hz and 300Hz, are completely immune at any BFI setting.
I just see the 'interference' lines getting replaced with small strips of the screen that have lower motion clarity than the surrounding areas via some blending or subframe overlap method.
Yes, this is my hypothesis, as well. Unless all pixels are full-black for at least some amount of time, the ones that just get turned down will have the same reduced brightness but no blur benefit, and if you blend the edges beyond that full-black area, those blended areas will be that much darker over time. So yeah, I'm not really sure how it's supposed to be superior, unless the reduced-brightness-with-no-blur-benefit strips are just considered the cost of doing business.
OK, so I've been doing some more experimenting, and I think I have some kind of a solution, but I've still got to fully crack the problem: I can get it to work in specific scenarios but need to do more tests for a general solution. I'm a bit hampered by my display not liking 120Hz at all. Anyway, bear with me.
Sorry, I've been away and haven't had time to experiment further over the weekend, but my experiment was essentially about how much of a strip you need to hide the 'interference', and in my limited case of using the Megatron I only needed a single pixel either side of the dividing line, so not much.
We can easily combine this with aligning the split to the dark lines between scanlines: we always know it's upscaling, and we always know where the points between scanlines are. We don't know, however, whether the shader will have a dip in luminance between scanlines. But a drop in motion clarity across two pixels might not be that bad.
One big issue I'm finding is that shader subframes for some reason look to be causing a split, in that the top half of the screen lags behind the bottom half. This is really noticeable in my favourite motion-blur scenario: the first level of Dynamite Headdy on the Mega Drive. Can someone else see if they can reproduce this effect? It's really strange and might be a bug on my end.
The problem with hiding it behind shader scanlines is, one, you need to do it for every screen resolution, frequency and shader combo, and two, not everybody uses or prefers CRT-style shaders anyway. And even for those that do (which includes me), I'm actually somewhat reluctant to use it on the OLED screen, as a precaution against uneven wear.
As for the top and bottom split you refer to: that's why I originally said my rolling-scan BFI implementation looked kind of like vsync off, even though I certainly didn't have it off (nor would I recommend that at all for subframe use). I tested displaying -only- a given subframe, and it showed the correct portion of the screen. So at least for my implementation, at up to 360Hz, it was correctly starting from the top and ending at the bottom for any given real frame. And if it weren't displaying the correct number of subframes for each 'real' frame, the emulation speed would be way off, like when you set subframes for a Hz that doesn't match your actual display Hz.
If you just mean subframes being on in general, though, outside of trying to implement rolling-scan BFI: no, my normal full-frame BFI implementation works with zero visual artifacts through subframes, as far as I can see. And without any subframe-aware shader active, active subframes just function essentially as a higher swap interval with more overhead (not in Vulkan, though; Vulkan already used a very close equivalent of subframes to emulate swap interval, as apparently Vulkan doesn't support a real swap interval), but with no artifacts either, for me.
Make certain that you have vsync on in RA (and at least not force-disabled in your OS settings), that RA is configured to the refresh rate your screen is actually running at, and that the subframe setting you chose is correct for that Hz.
Hi. OK, so after a trip away and a bout of COVID, I managed to get time to take a look at this again.
So I think the whole rolling-scanline thing, implemented GPU-side, is a no-go. The reason: there is an optical illusion where we get a distinct tear through the middle of the screen when scrolling. This isn't the 'interference' around the split line, although it could be related.
A good example of what I'm talking about is the first stage of Dynamite Headdy on the Mega Drive: look at the trees/bushes in the foreground (but it happens in all scrolling games, or with any movement if it's fast enough). If you either implement a shader that simply displays the top half of the screen on one subframe and the bottom half on the other, OR just turn on this PR (and turn off shaders), then you can see the obvious tear, as if the top half is ahead of the bottom half.
If I go into PIX for Windows and capture multiple frames, I can see that there is no tear and the frames are rendered how I would imagine them to be, but I definitely perceive a tear.
I think this is an optical illusion and there isn't any getting around it, as I can't see it in the characters (Dynamite Headdy himself, or the big red robot) that aren't moving relative to the screen and so don't have motion blur.
I think what is happening is that the eye is seeing the blur split in two, but I'm not sure. You can mitigate the issue with a big, thick transition bar of about 20 pixels, as the blur covers the tears, but you can still kind of see it transitioning (like a gradient).
I currently think the only way to implement a rolling scanline is to do it display-side, much more like how a CRT does it, with a continuous scan down the screen within a frame. I don't think any attainable refresh rate is going to do it (1000Hz etc.). Possibly 240x60Hz (14,400Hz) might, i.e. a single scanline per subframe at 60Hz.
It'd be good to see if anybody else can repro this, or has seen it themselves, and confirm this isn't some quirk of my setup. You can just use this PR, or write a very simple shader (I can post one here if need be).
Overall, I think the whole shader-subframe feature is currently a feature without any utility. If you can't do scanline simulation of any sort without major artifacts, then what else would it be used for?
The only other thing in the vicinity is interlacing, but that doesn't require subframe functionality, as the console outputs it via whole frames.
It might be an idea to at the very least hide the option behind developer mode or something, as it's just another option for end users to get confused about, unless there is some utility to it.