circuits_uart Byte transmission problem when using big buffer

I am transmitting a file over a serial line. I receive the data on one port and write it to a file like this, in an Elixir task:

  alias NervesUartEvaluation.Serial

  def read(file) do
    serial = Serial.setup_serial(:readDeviceName)
    read_loop(serial, file)
  end

  def read_loop(serial, file) do
    case Nerves.UART.read(serial, 5) do
      {:ok, ""} -> read_loop(serial, file)
      {:ok, content} ->
        IO.puts(byte_size(content))
        IO.binwrite(file, content)
        read_loop(serial, file)
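      # IO.puts cannot print a tuple, so inspect the error reason instead
      {:error, reason} -> IO.puts("Read error: #{inspect(reason)}")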
    end
  end

and write the file to the other serial port with another task:

  @buff_size 4098

  alias NervesUartEvaluation.Serial
  alias Nerves.UART

  def write(file) do
    serial = Serial.setup_serial(:writeDeviceName)
    write_loop(serial, file)
  end

  def write_loop(serial, file) do
    case IO.binread(file, @buff_size) do
      :eof -> UART.flush(serial); IO.puts "done"
      # reason is an atom, so it cannot be concatenated with <>
      {:error, reason} -> IO.puts("Error: #{inspect(reason)}")
      content ->
        IO.puts(byte_size(content))
        UART.write(serial, content)
        UART.drain(serial)
        write_loop(serial, file)
    end
  end

I have created the file with dd: dd if=/dev/urandom of=./input.bin bs=1024 count=683. When setting buff_size to 4098 there are errors when transmitting the file:

cmp input.bin output.bin 
input.bin output.bin differ: byte 4096, line 19

whereas a value of 4000, 2048, or 1000 works. It is always the same byte that is transmitted incorrectly.
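
The offset that cmp reports can also be reproduced from Elixir, which may be handy when repeating the test with different buffer sizes. This is only a minimal sketch: `DiffSketch` and `first_diff/2` are hypothetical names, not part of nerves_uart.

```elixir
defmodule DiffSketch do
  # Hypothetical helper: returns the 1-based offset at which two binaries
  # first diverge (the numbering cmp uses), or :equal if they are identical.
  # If one binary is a prefix of the other, the offset just past the shorter
  # one is returned.
  def first_diff(a, b) when is_binary(a) and is_binary(b), do: do_diff(a, b, 1)

  # Same leading byte on both sides: keep walking.
  defp do_diff(<<x, rest_a::binary>>, <<y, rest_b::binary>>, n) when x == y do
    do_diff(rest_a, rest_b, n + 1)
  end

  # Both binaries exhausted at the same time: no difference found.
  defp do_diff(<<>>, <<>>, _n), do: :equal

  # Bytes differ, or one side ended early.
  defp do_diff(_a, _b, n), do: n
end
```

It could be called as `DiffSketch.first_diff(File.read!("input.bin"), File.read!("output.bin"))`.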

Is there a bug somewhere in the library or am I doing something wrong?

Nov 13 '17 15:11 pallix

The read and write paths in the C code are coupled, so even though you have separate processes in Elixir a big write call will delay reading bytes from the OS's internal buffers. Based on your experiment, I would assume that the OS's internal buffer is 4096 bytes and when you write more than that, the OS drops the additional bytes. If the C code were actively removing bytes from the serial port while the big write was happening, this wouldn't happen.

Interestingly enough, I had a note about this coupling in the C implementation, but I had thought that it only would affect performance and since serial ports generally only operate at very slow speeds, I didn't worry about it. This is an interesting consequence of running two UARTs and looping them back on each other that I hadn't considered. I hadn't run into this use case in my own work.

This issue could certainly be fixed. It's a little tricky, though, and I don't have time at the moment to do it. I'm really glad that you pointed this out, since I bet others may run into it and it feels more legit now to spend time decoupling the read and write paths in the C code.
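
Until then, one possible workaround, going by the observation that buffers of 4000 bytes or fewer transmit correctly, is to split each block into smaller pieces before passing it to UART.write. The following is a minimal sketch that assumes the 4096-byte limit guessed at above. `ChunkSketch` is a hypothetical helper, not part of nerves_uart, and it has not been tested against the failing setup.

```elixir
defmodule ChunkSketch do
  # Hypothetical helper: splits a binary into pieces of at most `max` bytes,
  # preserving order. The last piece may be shorter.
  def chunks(bin, max) when is_binary(bin) and is_integer(max) and max > 0 do
    do_chunks(bin, max, [])
  end

  # Nothing left to split.
  defp do_chunks(<<>>, _max, acc), do: Enum.reverse(acc)

  # Remainder already fits in one piece.
  defp do_chunks(bin, max, acc) when byte_size(bin) <= max do
    Enum.reverse([bin | acc])
  end

  # Take `max` bytes off the front and continue with the rest.
  defp do_chunks(bin, max, acc) do
    <<head::binary-size(max), rest::binary>> = bin
    do_chunks(rest, max, [head | acc])
  end
end
```

In the write loop, each piece would then be written separately, for example `Enum.each(ChunkSketch.chunks(content, 2048), &UART.write(serial, &1))`, where 2048 is just one of the sizes reported to work.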

Nov 14 '17 13:11 fhunleth

Thank you very much for your detailed answer.

Is there one buffer per physical device, or one per device in /dev? I am writing to one device in /dev and reading from another. The physical link connects two ports on the same machine.

Does that mean that if I run the process on two machines I will not have the problem?

Nov 14 '17 14:11 pallix

Hmm. Now I'm less sure. I was thinking that you were reading and writing to one device and the receive wire was connected to the transmit wire. If you have two nerves_uart GenServers running for two different devices then that messes up my theory that the problem was in the nerves_uart C port implementation.

Are you running on Linux?

Also, have you tried removing the call to UART.drain and just letting UART.write block when it has to? I have a vague recollection of a serial driver where that call had a side effect of coupling the rx and tx paths, but that was a long time ago and certainly serial device-specific.
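
For reference, the write loop without the drain call might look roughly like the sketch below. `WriteLoopSketch` is a hypothetical module, and `write_fun` stands in for `&Nerves.UART.write(serial, &1)` so that the loop can be exercised without a serial port.

```elixir
defmodule WriteLoopSketch do
  # Reads `buff_size` bytes at a time from `file` and hands each block to
  # `write_fun` (a stand-in for the UART write). There is no drain call:
  # the loop relies on the write itself blocking when it has to.
  def write_loop(file, buff_size, write_fun) do
    case IO.binread(file, buff_size) do
      :eof ->
        :ok

      {:error, reason} ->
        {:error, reason}

      content ->
        case write_fun.(content) do
          :ok -> write_loop(file, buff_size, write_fun)
          {:error, _reason} = error -> error
        end
    end
  end
end
```

With a real port this would be called as `WriteLoopSketch.write_loop(file, 4098, &Nerves.UART.write(serial, &1))`.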

As for two machines, I would absolutely hope that you wouldn't see this problem on two machines. If you did, then that pretty squarely points to the transmit side having a limit of sending 4K at a time. I don't see how that could be nerves_uart, but I guess that it would be something to investigate.

Nov 14 '17 14:11 fhunleth

I am running on Linux. Removing the call to drain causes the data to be corrupted.

Nov 14 '17 14:11 pallix