applet.video.ws2812_output: Remove deprecated stuff
Hi!
This is me trying to get my feet wet with this project.
- Added another pixel format, nothing exciting (because I found an example of hardware that uses that specific pixel format)
- Ported to V2 API (see #826)
- Ported to not use the old multiplexer interface (I think that's part of porting to V2, but spi_controller_deprecated seemed to get away without doing it.
- Removed use of deprecated cancel on EOF functionality
Dubious stuff:
- The buffer size is now a build time argument, but it looks like it should be a setup time argument. If I made it a setup time argument, I think I would need to access a protected attribute. (That's what DeprecatedDemultiplexerInterface does)
- Is the size that the applet accepts from the socket supposed to coincide with the USB buffer size (if I understand things correctly)? I didn't change this logic - but I don't quite understand why it's the way it is.
Hi! I'm very excited to have another (potential) contributor on board, as the project desperately needs more. I am especially happy to see you porting applets to the V2 API.
- Ported to not use the old multiplexer interface (I think that's part of porting to V2, but spi_controller_deprecated seemed to get away without doing it.
It is, yeah. spi_controller_deprecated is a hack that exists to enable more gradual porting of existing applets (porting and testing every single applet that uses SPI is a bit of a gargantuan task, especially given that the new SPI applet currently doesn't support anything but Mode 3, and this is very much a problem that needs fixing). It shouldn't be taken as anything but an ugly temporary measure taken out of necessity.
(That's what DeprecatedDemultiplexerInterface does)
Same here; anything with Deprecated in it is code I'm trying to remove.
- The buffer size is now a build time argument, but it looks like it should be a setup time argument. If I made it a setup time argument, I think I would need to access a protected attribute.
I think it should be a setup time argument, yeah. I'm not following; which attribute?
I should also explain how we're using privacy in this codebase. Java-style privacy (per class) is what people usually do in Python, but it's impractical: it sometimes requires you to make implementation details public solely to satisfy the compiler. Rust-style privacy (per module) is what we're using here. That is, if your module has class A: and class B:, then a function in B can touch A()._attr. Of course, if A is intended for consumption outside of the module (as marked by its presence in __all__) then using such an attribute outside of the module shouldn't be a requirement for being able to utilise the class.
- Is the size that the applet accepts from the socket supposed to coincide with the USB buffer size (if I understand things correctly)? I didn't change this logic - but I don't quite understand why it's the way it is.
No, I don't think so. I should mention that this applet is quite old and wasn't written by me, so I'm in the same position as you are when looking at the code. From first principles:
- The USB buffer size (host-side plus device-side) reflects latency: assuming the bandwidth to be constant and other delays to be negligible (both non-trivial assumptions), it determines maximum delay between the write being executed and being read from the FPGA buffer.
- The endpoint receive buffer size is... actually, I have no idea what's going on here. I think the little loop replicates the functionality of
endpointitself? I'm pretty sure the code should look more like:
await self.ws2812_iface.write(await endpoint.read(frame_size))
and everything else is stuff that shouldn't be there at all as the queue_size argument in the constructor takes care of it already.
Do you have the ability to check if the code works well with a large panel? If not, I wonder if @marcan would be able to re-test the updated applet.
The buffer size is now a build time argument, but it looks like it should be a setup time argument. If I made it a setup time argument, I think I would need to access a protected attribute.
I think it should be a setup time argument, yeah. I'm not following; which attribute?
On main, buffer_size ends up getting passed to device.demultiplexer.claim_interface, which gets passed to DeprecatedDemultiplexerInterface.__init__, which on line 242 sets HardwareOutPipe._out_buffer_size. I didn't want to do this, so I passed it to assembly.add_out_pipe instead, which then has to happen at build time.
I'm pretty sure the code should look more like: [...]
Yup, that seems to work well
Do you have the ability to check if the code works well with a large panel?
Yes, I have two 20x20 light curtains with me. I can't chain them (I think it's the kind where all the DINs are connected in parallel) but I can hook them up in parallel (and test that function too). I can get to some performance testing in a few days.
On main, buffer_size ends up getting passed to
device.demultiplexer.claim_interface, which gets passed toDeprecatedDemultiplexerInterface.__init__, which on line 242 setsHardwareOutPipe._out_buffer_size. I didn't want to do this, so I passed it toassembly.add_out_pipeinstead, which then has to happen at build time.
Oh, so there are two different buffers in play: the one in the FPGA and the one on the host. They are not equivalent at all. The on-FPGA buffer is used to compensate for packet-to-packet latency spikes and must be sized carefully because it can never be more than 14K, and making it significantly bigger than 512 causes various issues. The host buffer can be as large as you want; its use is more about compensating for the input latency spikes.
So I think your change needs to be considered carefully: it is possible that one or the other is more appropriate depending on the task; I haven't thought a lot about the task. Is the applet handling real-time data or is it more queueing the video upfront? How bad are potential underflows? These are all important questions to ask.
The applet seems to be optimized for streaming out data (almost) as fast as the protocol allows. (800 kbps + 300 us/frame) Before e8b21b7, it was broken and it was only able to output full buffers (i. e. args.buffer frames) at a time. (I thought this was the intended behaviour.) Now another use case is doing the timing in the client program, which sounds reasonable for low latency, but I wouldn't really trust it for smooth animations. I think a framerate limiter on the FPGA would be an easy and good addition, but I just didn't want to do that yet. I think I will now.
The consequence of an underflow between frames is inconsistent frame timing, but if it happens within a frame, the frame gets split into two parts, and the second part starts getting applied over the first half of the string. Worst case scenario, flickering lights.
I think the original implementation used the USB buffer, and as does my revision. It certainly gets the same bitstream ID even with two very different values of args.buffer
(I probably have some LEDs around I could wire up in a pinch, but probably not any more useful than @dratini0's light curtains and I haven't used this code in a long while, so unsubscribing from this one)
The consequence of an underflow between frames is inconsistent frame timing, but if it happens within a frame, the frame gets split into two parts, and the second part starts getting applied over the first half of the string. Worst case scenario, flickering lights.
There are two general approaches that can be used here:
- Gateware approach where an additional FIFO is being filled, with no data being let out of it unless there's one frame-full of data in it. This is somewhat costly but 100% reliable. It is the method of choice for hard real-time applications.
- Software approach where you only ever read a frame-sized chunk of data, then write and flush it. You could still rarely get torn writes, but it should be reliable enough for almost any application. That's probably the way to go here. (This is also what's implemented already.)
Please also add documentation: every V2 applet must have, at a minimum, the CLI documentation. Just the CLI documentation (see other applets for how to wire it up) is enough for this applet, especially considering that the *Interface class doesn't seem to be very user friendly and lacks any docstrings.
- I have added the documentation
- I have done some performance testing, and I can't seem to break it (with 2 * 400 LEDs), but the flush seems to be necessary. Otherwise, the buffer doesn't fully drain by itself. So, if I stop transmitting data, the last few frames just sit there until I start transmitting again. I would have expected the it to try transmitting until it's empty. I re-added the
_wait=Falseanyway. - I still want to clean up the Interface API, I think I will turn the pixel formats into a constant dict of
namedtuples, so a consumer can actually access it. - Do you also want me to write some real tests?
- So, if I stop transmitting data, the last few frames just sit there until I start transmitting again. I would have expected the it to try transmitting until it's empty. I re-added the
_wait=Falseanyway.
Well, I want to get rid of _wait so I'd rather not add more instances of it. We need to figure out an alternative solution.
I still want to clean up the Interface API, I think I will turn the pixel formats into a constant dict of namedtuples, so a consumer can actually access it.
Please don't proliferate namedtuples, that is an ancient and awful method of constructing data structures. Use dataclasses.
Do you also want me to write some real tests?
Sure
Well, I want to get rid of
_waitso I'd rather not add more instances of it. We need to figure out an alternative solution.
I will test tonight what happens if this applet relinquishes all control of the out pipe's buffer (i. e. no flush, no specifying buffer size). That would have the beneficial side-effect of making args.buffer a runtime parameter, as well as making it possible to make args.count a setup time variable.
Please don't proliferate
namedtuples, that is an ancient and awful method of constructing data structures. Usedataclasses.
Sorry, I keep forgetting frozen dataclasses exist.
Do you also want me to write some real tests?
Sure
Will do.
Ok, not great. Things I have tried so far:
flush(_wait=True): Writes to the buffer, and then waits for that buffer to empty. The buffer is effectively turned off.- Removing
flush()altogether:_out_packet_size * _packets_per_xferends up>= _out_buffer_size, so_out_threshold == _out_buffer_size. (That's 800 LEDs buffered 16 times, so I don't expect many people to have more. It ends up happily sitting on that data until the buffer is full, at which point it will output exactly one frame. And then it's back to sitting on data. I get the usual latency/buffer stability tradeoff, but this looks more like a delay line to me than a buffer. - Removing the buffer limit and the flush call. The result is an unbounded buffer. It exerts no backpressure, so it quickly climbed to a few gigabytes for me. Not useful in this case.
Specific details about this applet aside, I think that we should consider changing the scheduling logic in HardwareOutPipe to submit transfers if there are none pending.
For an example of where this behavior change would be useful, consider an applet that forwards commands to the Glasgow, where the command stream may be very fast or very slow. Also, the applet doesn't know the throughput in advance. If the command stream were slow, there'd be significantly more latency without explicitly flushing, and if the command stream were fast, explicitly flushing after every command would reduce the throughput. As an alternative, the applet could implement logic for flushing, but I think that eagerly submitting transfers when there are no pending transfers would be a better solution.
Specific details about this applet aside, I think that we should consider changing the scheduling logic in
HardwareOutPipeto submit transfers if there are none pending.
Yes, I agree. I will find some time to think about it in the next few days.
Ok, I think I am just about ready to give up on writing tests. Apparently the tests run at 1MHz, and I'm not sure I can override that. This poses problems with a serial protocol running at a fixed 800 kHz.
One more thing I'm wondering about is all of my classes being prefixed by the taxon, e. g. VideoWS2812OutputInterface. Some applets do that, some don't. Should I remove that?
Finally, what do you want to do about merging this? Wait for you to resolve the scheduling situation, I remove flush, retest, and then it can be considered for merge again?
Ok, I think I am just about ready to give up on writing tests. Apparently the tests run at 1MHz, and I'm not sure I can override that. This poses problems with a serial protocol running at a fixed 800 kHz.
Apologies! The test fixture is quite immature and writing tests isn't anywhere nearly as pleasant as I'd like it to be.
There is IIRC currently no override and we could just add one. In the past, I would simply lower the simulation step, which makes all tests slower, which is why I stopped at 1 MHz (or lowered it at all).
One more thing I'm wondering about is all of my classes being prefixed by the taxon, e. g.
VideoWS2812OutputInterface. Some applets do that, some don't. Should I remove that?
There is no fully consistent rule for this.
- Public classes &c are generally prefixed in this manner unless it results in a nonsense like
VideoVideoSomething. - Private classes &c are generally named whatever seems the most appropriate.
I'd say: use your judgement, try to think of the downstream uses and whether they'd be confusing. We have a mix of "this class is supposed to be imported on its own" and "this class is namespaced by its module name" which is unfortunate but I don't know what to do with in the short term.
Finally, what do you want to do about merging this? Wait for you to resolve the scheduling situation, I remove flush, retest, and then it can be considered for merge again?
That sounds reasonable. I'm currently dealing with an extremely disruptive personal event, so my availability will be inconsistent for at least a month. It may end up with me spending either more time on Glasgow or less time on Glasgow depending entirely on my health. I was planning for it to be "more" but it is not be entirely up to me.
There is IIRC currently no override and we could just add one. In the past, I would simply lower the simulation step, which makes all tests slower, which is why I stopped at 1 MHz (or lowered it at all).
Hmm, if I'm touching anything outside this applet, I think I will want to open another PR for that.
I'd say: use your judgement, try to think of the downstream uses and whether they'd be confusing. We have a mix of "this class is supposed to be imported on its own" and "this class is namespaced by its module name" which is unfortunate but I don't know what to do with in the short term.
I think I will just stop trying to overthink it and leave it as-is.
That sounds reasonable. I'm currently dealing with an extremely disruptive personal event, so my availability will be inconsistent for at least a month. It may end up with me spending either more time on Glasgow or less time on Glasgow depending entirely on my health. I was planning for it to be "more" but it is not be entirely up to me.
I hope you get better soon!
Hmm, if I'm touching anything outside this applet, I think I will want to open another PR for that.
Yes, this is preferred.