libusb-win32 icon indicating copy to clipboard operation
libusb-win32 copied to clipboard

USB 2.0 : receiving "512 then ZLP" blocks for ever when trying to read N x 512 (with N>1)

Open sonatique opened this issue 4 months ago • 19 comments

Hello @dontech ,

EDIT: I don't know whether you read this message already, but some information about libusb0.sys were incorrectly written. Now it's corrected.

I am facing an issue that seems surreal, using latest libusb , libusbK and libusb0.sys .

By carefully adding logs in all these 3 places, and simplifying my scenario, I am now pretty convinced that the issue is in libusb0.sys.

In short: when trying to receive 512 bytes + ZLP in a buffer of 1024, the callback transfer is never called and everything freezes. Everything works for 512 bytes + 4 bytes.

Here is my scenario:

I am first submitting a read transfer for 1024 bytes. libusb0.sys (compiled in Debug) outputs this is DebugView: libusb0-sys:[transfer] [bulk-read #206] EP86h length=1024, packetSize=512, maxTransferSize=4194304 -> OK.

Then my device do something (I double-checked this using a USB sniffer): it first sends a 512 bytes IN transfer, then a IN ZLP to signal that there are no more data.

I see something regarding reception that first looks good: libusb0-sys:[transfer_complete] sequence 206: 512 bytes transmitted, maximum_packet_size=512, totalLength=1024

But then: nothing. The completion routine in libusb0.sys is never called again and the libusb transfer callback is never called. I am expecting the ZLP to signal the end and the user callback to be called. And really: I am 100% sure the ZLP went over the wire.

I had a look a the code. I am seeing:

if(NT_SUCCESS(irp->IoStatus.Status)
		&& USBD_SUCCESS(c->urb->UrbHeader.Status)
		&& (c->urb->UrbHeader.Function == URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER)
		&& !(transmitted % c->maximum_packet_size)
		&& c->totalLength)

in transfer_complete implementation. The body of the "if" is basically to resubmit in Irp for the expected remainder (512 bytes in this case). Transmitted == 512 in my case, maximum_packet_size == 512 and totalLength == 512 as well because the value has been update a few lines above with 1024 - 512.

From what I understand, the condition is written such that the Irp resubmission will occur when totalLength (i.e. the number of bytes still to be received ideally) is > 0 and when the received byte count (transmitted) is a multiple of the max packet size. a) > 0 and not a multiple of max packet size : this was including a short packet reception, no need to resubmit since no more data b) == max packet size: we shall resubmit in case there are more data c) == 0 : this is maybe a problem because the condition will be true as well since (0 % 512) == 0, as if "transmitted" was a multiple of max packet size. This means that the code will wrongly resubmit a transfer for >0 bytes, when the bus just told us there are no more data.

At first I thought this was the reason for the issue: first, transfer_complete would do the job correctly, then would be called again with transmitted == 0, would submit another Irp and would wait for something to happen in vain (since the device doesn't send anything anymore).

BUT: I realized that transfer_complete is not called a second time. And therefore I think you knew transfer_complete would never be called for transmitted == 0, such that you knew that writing !(transmitted % c->maximum_packet_size) would be a correct not-short packet detection condition given transmitted is constrained to >0 and <= totalLength.

But then I am puzzled: if transfer_complete is never called for ZLP: how could the code unwind when it receives a ZLP for signaling the end then calls the user callback with just 512 bytes of received data?

I think there is an issue, but I so far I could not figure it out. Does the code just not support getting a number of bytes equal to maximum_packet_size then a ZLP?

Note that everything works as expected when my device sends 512 bytes then 4 bytes for instance: transfer_complete is called once for 516 avoid resubmitting an Irp and everything terminate as expected.

Could you help me figuring out what is happening? Thanks a lot!

sonatique avatar Aug 15 '25 16:08 sonatique

@dontech : I think that since one single transfer_complete can receive more than max packet size bytes at once, there is probably no reliable way to understand whether to resubmit or not. I think there shall be no resubmission unless we received exactly maxTransferSize bytes and totalLength > maxTransferSize.

I need to think a little bit more about this, then I'll probably propose a PR.

I made some experiments with submitting read with a buffer of length 16384 and noticed that in my case I always receive a first tranfer_complete with transmitted == 8192 (when my device sends more than 8K), I guess it depends on various internal details and timing, but say it happens. So, let's consider a few cases:

With current code:

  1. Device sends a total of 10'000 bytes: We first receive a transfer of 8192. 8192 is a multiple of 512 so we'll ask for the remaining 8192 bytes by resubmitting. Next transfer will get the 1808 bytes. This is not a multiple of 512, so there will be not resubmit. Result: all 10'000 bytes are received at once from the higher level code

  2. Devices sends a total of 16384: We first receive a transfer of 8192. 8192 is a multiple of 512 so we'll ask for the remaining 8192 bytes by resubmitting. Next transfer will get 8192 bytes. This is a multiple of 512 but we reached totalLength, so there will be not resubmit. Result: all 16'384 bytes are received at once from the higher level code:

In both cases all data are received at once which is a good optimization and I guess the reason for this code to be like this.

  1. Devices sends a total of 8192 + 1024 = 9216: We first receive a transfer of 8192. 8192 is a multiple of 512 so we'll ask for the remaining 8192 bytes by resubmitting. Next transfer will get the 1024 bytes. This is still a multiple of 512, and we did not reach 16384 (we're at 9216) so we'll re-submit again. Result: there will never be another transfer and we're stuck. In practice I guess we're only stuck until the next data either creates a non multiple of 512 transfer or complete the buffer up to totalLength, so that's probably why this issue didn't get noticed earlier. But for devices that sends some data, then stops or pause for a long time, I think it is a real issue.

What I will later propose in a PR is that we never resubmit unless we receive maxTransferSize and totalLength > maxTransferSize. Which probably means we'll resubmit very infrequently. For the above cases with my changes we'll get:

  1. One higher level reception of 8192. Call we'll have to resubmit a read to get the remaining 1808. This is probably less optimal but can be efficiently workarounded by submitting more than one transfer at once, such that there will be virtually no pause before getting the 1808 bytes. And anyway there is no guarantee that a single read call will get all the data (as long as "all the data" makes sense) so user code most probably already have at least reads in a while loop.

  2. Same: higher level will have to get the data in multiple read.

  3. We'll get 2 transfers again: one with 8192 one with 1024. We won't get stuck.

If we are in the case totalLength > maxTransferSize (and say totalLength < 2*maxTransferSize for the example) and we receive maxTransferSize: we'll resubmit once only and get more thant maxTransferSize at once.

What do you think? Does all this make sense to you as well?

Thank you very much in advance for considering this issue.

sonatique avatar Aug 15 '25 18:08 sonatique

Hello again @dontech, Here are some additional thoughts about what I wrote above. I think that the way it is coded now (the fact that the code internally resubmit as many transfer as possible) is not only for performance reason. It is to implement the concept of "USB Transfer" which is very important and user code shall be able to rely upon it.

The problem is that current implementation doesn't work when transfer size is a multiple of max packet size: it will either get stuck for ever if this transfer was the last, or will aggregate some data from the next transfer into the previous. This is the bug I identified.

The fix I initially intended to submit as a PR is not good neither: it breaks the concept of transfer, pushing the responsibility of handling it to the user code, which is not correct conceptually but more importantly which cannot be implemented for the same reason it is broken at the driver level.

So the fix will be complex to implement, I am afraid. It seems that we are bound to how the Windows API is implemented. To start with I have no idea where the 8192 "block size" I observe is coming from and I have no idea how we can request to get the information about ZLP.

sonatique avatar Aug 18 '25 08:08 sonatique

Hi Sonatique!

I will try to find some time to look at this. There are 2 key problems here:

  1. We need to fix the hang with "512 bytes + ZLP in a buffer of 1024". This is uacceptable behavior and I will fix this ASAP. Must be a logical problem somewhere in the library or the driver. This should never happen.

  2. The 8192 limit is artificial and i think its a limit in the driver. I am fairly sure the Windows USB subsystem supports higher sizes.

I will try to process this as soon as possible, as your input seems very clear. Just need to find my super-speed dev board. Now where did i put that.....

Thanks,

/pedro

dontech avatar Aug 18 '25 12:08 dontech

@dontech : thank you very much for your reply and for considering my issue as important.

On my side I am still struggling to find a correct fix implementation: my knowledge of the API you use in this code is quite limited, but even after reading a lot of web pages, I still don't understand how the code could figure out whether or not a ZLP was received.

I mean: at some point I briefly considered that I may get the 8192 value in advance or at worst guess it from within transfer_complete, and the use it to say "OK if I got 8192 byte when I provided a buffer of 16384. It is caused by some segmentation due to whatever. I will resubmit for getting the up-to 8192 remaining bytes". But no, it won't work when the device exactly send 8192 + ZLP.

So: yes, thank you for considering my issue again. The more I work on it the more serious it appears to me...

sonatique avatar Aug 18 '25 13:08 sonatique

OK i have reproduced the problem.

Here is the breakdown:

  1. There is no way to detect ZLP in usbd.sys API.

  2. The only way to "detect" that a transfer could be the result of ZLP is to extend the check to see if transfer is smaller than maxTransferSize, as this is the official way the driver below signals ZLP.

  3. Only catch here is that if transfer is cut short because of anything else, the transfer is stopped here. That is not fatal, as the application above is told the transferred size anyway.

I will submit a change for this, as the current behavior is unacceptable.

dontech avatar Aug 21 '25 23:08 dontech

maxTransferSize is calculated according to these rules: https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/usb-bandwidth-allocation

dontech avatar Aug 21 '25 23:08 dontech

ZLP fixed with: https://github.com/mcuee/libusb-win32/commit/aa869d72dc60171a802c36c34458583ee9c752cc

Please review and approve, and i will spin 1.4.0.1

dontech avatar Aug 21 '25 23:08 dontech

The other problem with 8192 "block size" is not reproducible here.

Also according to your post:

libusb0-sys:[transfer] [bulk-read #206] EP86h length=1024, packetSize=512, maxTransferSize=4194304

maxTransferSize = 4MB, which is much higher than 8192. Are you sure the 8192 limit is not in your device?

If not could you send Debug view logs and USB trace logs?

Thanks,

/pedro

dontech avatar Aug 21 '25 23:08 dontech

ZLP fixed with: aa869d7

Please review and approve, and i will spin 1.4.0.1

Thanks a lot!

This look like the correct fix when assuming that there is no other "split" mechanism than max packet size and max transfer size, i.e. if the 8192 comes from my devices, which is possible, I need to double-check that.

sonatique avatar Aug 22 '25 08:08 sonatique

The other problem with 8192 "block size" is not reproducible here.

Yes so: sorry: Indeed the 8192 limit comes from my device (because of hoe the FX3 is configured). I didn't realize this because I somehow got the information it was not the case from a trustable source, but this was wrong.

I can see using an USB sniffer that in the case of the 9216 transfer I described above, my device sends: 8192 bytes (with the corresponding number of 512 byte packets) ZLP 1024 (2x 512 packets) ZLP

With your fix aa869d7 everything runs as expected, no blocking, the host application first Read call receives 8192, the next one receives 1024, perfect.

My devices is therefore not capable of doing logical transfer larger than 8192, I need mechanism outside of USB to achieve this, and this is not the business of libusb0.sys.

So great: the fixed 1.4.0.1 will work like a charm, thanks a lot!

I am still wondering why this bug have stayed unnoticed for so long, I am puzzled because once you see it, it seems quite serious.

sonatique avatar Aug 22 '25 09:08 sonatique

Hi S,

I am still wondering why this bug have stayed unnoticed for so long, I am puzzled because once you see it, it seems quite serious.

Hehe, i guess this follows what i call the "lifecycle of a bug" as i call it. First it's "why does this not work", then its "oh, this does not work", and then at last "how did this ever work?".....

I tested the previous release vigorously, and use this project at 100s of clients but still did not notice this. Amazing..... :-)

But i guess the real reason that most use cases are pre-deterministic. The optimization in the driver that it can chain read/writes on a kernel level is also a bit extreme, but gives a bit extra in some cases.

Thanks,

/pedro

dontech avatar Aug 22 '25 17:08 dontech

I will spin a new release this weekend.

dontech avatar Aug 22 '25 17:08 dontech

Hehe, i guess this follows what i call the "lifecycle of a bug" as i call it. First it's "why does this not work", then its "oh, this does not work", and then at last "how did this ever work?".....

Yes ;-) also known as "The Six Stages of Debugging": https://mwcremer.blogspot.com/2007/06/six-stages-of-debugging.html?m=1 ;-)

The optimization in the driver that it can chain read/writes on a kernel level is also a bit extreme, but gives a bit extra in some cases.

I had a second thought and I am no longer sure that these are just pretty optimizations. I think with this you implemented the very concept of USB transfer, which is quite important and this is how the host program is expecting things to work, as far as I know.

Thank you for your reactivity and efficiency!

sonatique avatar Aug 22 '25 22:08 sonatique

I will spin a new release this weekend.

Cool! Quick question, I saw https://github.com/mcuee/libusb-win32/issues/78 : are you going to release 1.4.0.1 with your "old" toolchain such that the binary will support the same OS and CPU as before?

sonatique avatar Aug 22 '25 22:08 sonatique

Cool! Quick question, I saw #78 : are you going to release 1.4.0.1 with your "old" toolchain such that the binary will support the same OS and CPU as before?

Yes, and that will probably be the last version to support x86. We cannot realistically maintain the package using a secret download package. Only few have that downloaded, before Microsoft decided to pull all EWDK downloads.

dontech avatar Aug 25 '25 08:08 dontech

This is pending, as i do not have access to my signing token right now. Soon!

Let me know if you need an unsigned build.

dontech avatar Aug 26 '25 09:08 dontech

Let me know if you need an unsigned build.

I would be delighted, yes please!

sonatique avatar Aug 26 '25 12:08 sonatique

OK, will create this for you, and then i will re-post the release signed when i get back to my signing token.

dontech avatar Aug 27 '25 13:08 dontech

Here ya go:

https://github.com/mcuee/libusb-win32/releases/tag/snapshot_1.4.0.1

dontech avatar Aug 28 '25 11:08 dontech