Byte transmission problem when using a big buffer
I am transmitting a file over a serial line. On the receiving side, an Elixir task reads from the serial port and writes what it receives to a file:
```elixir
alias NervesUartEvaluation.Serial

def read(file) do
  serial = Serial.setup_serial(:readDeviceName)
  read_loop(serial, file)
end

def read_loop(serial, file) do
  case Nerves.UART.read(serial, 5) do
    # The 5 ms timeout expired with no data: poll again
    {:ok, ""} ->
      read_loop(serial, file)

    {:ok, content} ->
      IO.puts(byte_size(content))
      IO.binwrite(file, content)
      read_loop(serial, file)

    # IO.puts/1 cannot print an {:error, reason} tuple, so inspect it instead
    fail ->
      IO.inspect(fail)
  end
end
```
On the sending side, another task reads the file and writes it to a second serial port:
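```elixir
@buff_size 4098

alias NervesUartEvaluation.Serial
alias Nerves.UART

def write(file) do
  serial = Serial.setup_serial(:writeDeviceName)
  write_loop(serial, file)
end

def write_loop(serial, file) do
  case IO.binread(file, @buff_size) do
    :eof ->
      # flush/1 discards (rather than sends) pending bytes; nothing is pending
      # here only because drain/1 ran after every write
      UART.flush(serial)
      IO.puts("done")

    {:error, reason} ->
      # reason is usually an atom, so interpolate it instead of using <>
      IO.puts("Error: #{inspect(reason)}")

    content ->
      IO.puts(byte_size(content))
      UART.write(serial, content)
      # Wait until the OS has transmitted everything written so far
      UART.drain(serial)
      write_loop(serial, file)
  end
end
```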
I created the input file with dd:
```shell
dd if=/dev/urandom of=./input.bin bs=1024 count=683
```
When `@buff_size` is set to 4098, the received file differs from the input:
```
cmp input.bin output.bin
input.bin output.bin differ: byte 4096, line 19
```
whereas values of 4000, 2048, or 1000 work. The same byte is always the one transmitted incorrectly.
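To check the result from IEx instead of with cmp, a small helper can report the first mismatch using cmp's 1-based byte numbering. `CompareFiles` and `first_difference/2` are hypothetical names for this sketch; it relies only on Erlang's standard `:binary.longest_common_prefix/1`.

```elixir
defmodule CompareFiles do
  @moduledoc """
  Hypothetical helper: finds the first differing byte of two binaries,
  numbered from 1 like cmp's "differ: byte N" output.
  """

  # Identical binaries
  def first_difference(a, a), do: :same

  # Position of the first mismatch. If one binary is a prefix of the other,
  # this is one past the end of the shorter one (cmp reports EOF instead).
  def first_difference(a, b) do
    prefix = :binary.longest_common_prefix([a, b])
    {:differ, prefix + 1}
  end
end
```

Usage: `CompareFiles.first_difference(File.read!("input.bin"), File.read!("output.bin"))`.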
Is there a bug somewhere in the library or am I doing something wrong?
The read and write paths in the C code are coupled, so even though you have separate processes in Elixir, a big write call will delay reading bytes from the OS's internal buffers. Based on your experiment, I would guess that the OS's internal receive buffer is 4096 bytes and that, when you write more than that in one call, the OS drops the extra bytes. If the C code were actively removing bytes from the serial port while the big write was happening, this wouldn't happen.
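The theory above can be made concrete with a toy model: a receive buffer of fixed capacity that is only emptied between write calls and silently drops whatever arrives once it is full. This is not nerves_uart or kernel code, and the 4096-byte capacity is the assumption from the reply, not a measured value. In this model the first mismatch lands at byte 4097 in cmp's numbering rather than the reported 4096, so treat it as an illustration of the mechanism, not an exact reproduction.

```elixir
defmodule RxOverflowModel do
  @moduledoc """
  Toy model (hypothetical): bytes are written in chunks into a receive buffer
  of `capacity` bytes that the reader only drains between writes.
  """

  # Returns {received_binary, dropped_byte_count}.
  def transmit(data, chunk_size, capacity) do
    data
    |> chunks(chunk_size)
    |> Enum.reduce({<<>>, 0}, fn chunk, {received, dropped} ->
      # The buffer was emptied after the previous write, so each chunk finds
      # `capacity` free bytes; anything beyond that is lost.
      kept = min(byte_size(chunk), capacity)
      {received <> binary_part(chunk, 0, kept), dropped + byte_size(chunk) - kept}
    end)
  end

  # Splits a binary into pieces of at most `size` bytes, like repeated
  # IO.binread(file, size) calls on the input file.
  defp chunks(<<>>, _size), do: []
  defp chunks(data, size) when byte_size(data) <= size, do: [data]

  defp chunks(data, size) do
    <<head::binary-size(size), rest::binary>> = data
    [head | chunks(rest, size)]
  end
end
```

With a 683 KiB input (699,392 bytes), 4000-byte chunks lose nothing, while 4098-byte chunks lose 2 bytes from each of the 170 full chunks, 340 bytes in total.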
Interestingly enough, I had a note about this coupling in the C implementation, but I had thought it would only affect performance, and since serial ports generally operate at very slow speeds, I didn't worry about it. This is an interesting consequence of running two UARTs and looping them back on each other that I hadn't considered. I hadn't run into this use case in my own work.
This issue could certainly be fixed. It's a little tricky, though, and I don't have time at the moment to do it. I'm really glad that you pointed this out, since I bet others may run into it and it feels more legit now to spend time decoupling the read and write paths in the C code.
Thank you very much for your detailed answer.
Is there one buffer per physical device, or one per device in /dev? I am writing to one device in /dev and reading from another. Physically, two ports of the machine are connected to each other.
Does that mean that if I run the process on two machines I will not have the problem?
Hmm. Now I'm less sure. I was thinking that you were reading and writing to one device and the receive wire was connected to the transmit wire. If you have two nerves_uart GenServers running for two different devices then that messes up my theory that the problem was in the nerves_uart C port implementation.
Are you running on Linux?
Also, have you tried removing the call to UART.drain and just letting UART.write block when it has to? I have a vague recollection of a serial driver where that call had a side effect of coupling the rx and tx paths, but that was a long time ago and certainly serial device-specific.
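For reference, here is a sketch of the send loop the suggestion above describes, with the `UART.drain` call removed and each write's return value checked. `BlockingSender` and its `write_fun` argument are hypothetical; injecting the writer lets the loop run without hardware. In the real program it would wrap `Nerves.UART.write/2`, assumed here to return `:ok` or `{:error, reason}`.

```elixir
defmodule BlockingSender do
  @moduledoc """
  Sketch: sends a file in chunks with no drain between writes.
  `write_fun` is a hypothetical injection point, e.g.
  fn chunk -> Nerves.UART.write(serial, chunk) end
  """

  def send_file(file, chunk_size, write_fun) do
    case IO.binread(file, chunk_size) do
      :eof ->
        :ok

      {:error, reason} ->
        {:error, reason}

      chunk ->
        # Rely on the write call itself to block; stop at the first failure
        # instead of ignoring the return value.
        case write_fun.(chunk) do
          :ok -> send_file(file, chunk_size, write_fun)
          {:error, reason} -> {:error, {:write_failed, reason}}
        end
    end
  end
end
```

Because drain is no longer called, whatever runs after the loop returns (such as the original `UART.flush(serial)`) may run while bytes are still queued for transmission.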
As for two machines, I would absolutely hope that you wouldn't see this problem on two machines. If you did, then that pretty squarely points to the transmit side having a limit of sending 4K at a time. I don't see how that could be nerves_uart, but I guess that it would be something to investigate.
I am running on Linux. Removing the call to drain causes the data to be corrupted.