dw1000
dw1000 copied to clipboard
TX data message failed: -22
Hi,
I have cross-compiled your driver for a Yocto-linux embedded device (with no errors). However, when trying to send frames, I get an error (-22). Do you have an idea from where this could come ?
Bellow more details on the procedure and error :
- I load the cross-compiled DW1000 module
- Enable wpan0 interface :
ip link set wpan0 down ip link set wpan0 address 76:dd:3c:55:1c:43:9a:aa iwpan phy0 set channel 4 2 iwpan dev wpan0 set pan_id 0xCAFE iwpan dev wpan0 set short_addr 0x0001 ip link set wpan0 up iwpan dev
- Send frames (ping) :
wpan-ping -a 11:22:33:44:55:66 -i wpan0
- No frames are sent or received by my other device, and here is the result of dmesg :
[ 38.952695] dw1000_mod: loading out-of-tree module taints kernel. [ 38.958584] dw1000 spi0.0: device does not support GPIO RESET control [ 38.958643] dw1000 spi0.0: device does not support GPIO POWER control [ 38.977691] Initialization [ 38.977917] dw1000 spi0.0: found model 1.3.0 [ 38.981413] dw1000 spi0.0: set xtal trim 0x0f [ 38.984084] dw1000 spi0.0: set LDOTUNE 08:88:88:88:88 [ 38.986927] dw1000 spi0.0: has no permanent EUI-64 address [ 38.988047] dw1000 spi0.0: set antenna delays 0x0000,0x0000 [ 38.992475] dw1000 spi0.0: I read dev_id : 0xdeca [ 51.479224] dw1000 spi0.0: set channel 2 [ 51.501790] dw1000 spi0.0: promiscuous mode disabled [ 51.501989] dw1000 spi0.0: set PAN ID cafe [ 51.502180] dw1000 spi0.0: set EUI64 76dd3c551c439aaa [ 51.502366] dw1000 spi0.0: set short address 0001 [ 51.503086] dw1000 spi0.0: started [ 58.799961] random: crng init done [ 67.537920] dw1000 spi0.0: TX data message failed: -22 [ 68.046570] dw1000 spi0.0: TX data message failed: -22 [ 68.556604] dw1000 spi0.0: TX data message failed: -22 [ 69.066571] dw1000 spi0.0: TX data message failed: -22 [ 69.576597] dw1000 spi0.0: TX data message failed: -22 ...
Thank you and have a good day
On 13/11/2018 13:39, jonathanmuller wrote:
No frames are sent or received by my other device, and here is the result of dmesg :
[ 38.952695] dw1000_mod: loading out-of-tree module taints kernel. [ 38.958584] dw1000 spi0.0: device does not support GPIO RESET control [ 38.958643] dw1000 spi0.0: device does not support GPIO POWER control [ 38.977691] Initialization [ 38.977917] dw1000 spi0.0: found model 1.3.0 [ 38.981413] dw1000 spi0.0: set xtal trim 0x0f [ 38.984084] dw1000 spi0.0: set LDOTUNE 08:88:88:88:88 [ 38.986927] dw1000 spi0.0: has no permanent EUI-64 address [ 38.988047] dw1000 spi0.0: set antenna delays 0x0000,0x0000 [ 38.992475] dw1000 spi0.0: I read dev_id : 0xdeca [ 51.479224] dw1000 spi0.0: set channel 2 [ 51.501790] dw1000 spi0.0: promiscuous mode disabled [ 51.501989] dw1000 spi0.0: set PAN ID cafe [ 51.502180] dw1000 spi0.0: set EUI64 76dd3c551c439aaa [ 51.502366] dw1000 spi0.0: set short address 0001 [ 51.503086] dw1000 spi0.0: started [ 58.799961] random: crng init done [ 67.537920] dw1000 spi0.0: TX data message failed: -22 [ 68.046570] dw1000 spi0.0: TX data message failed: -22 [ 68.556604] dw1000 spi0.0: TX data message failed: -22 [ 69.066571] dw1000 spi0.0: TX data message failed: -22 [ 69.576597] dw1000 spi0.0: TX data message failed: -22 ...
I haven't seen that before. The only possibility I can think of is that wpan-ping is sending a zero-length packet and the SPI controller is complaining about being asked to transmit a zero-length data fragment.
Does the error go away if you specify a packet length (using the "-s" option for wpan-ping)?
Michael
I used different length with wpan-ping, all with the same result. Also tried raw socket with no more success.
I verified if spi was working at all, and basic commands (reading/setting pan_id for example) works fine. This is really strange.
I'll keep you updated if I find anything related to this.
Which kernel module is providing the SPI controller driver in your setup?
I'm not exactly sure which exact information you are expecting, I hope it is one of those :
The kernel used is the Linux 4.11 Fearless Coyote, driver spi-imx.
The driver name is "spi_imx" (spi-imx.c) It's compatible fields are
{ .compatible = "fsl,imx1-cspi", .data = &imx1_cspi_devtype_data, }, { .compatible = "fsl,imx21-cspi", .data = &imx21_cspi_devtype_data, }, { .compatible = "fsl,imx27-cspi", .data = &imx27_cspi_devtype_data, }, { .compatible = "fsl,imx31-cspi", .data = &imx31_cspi_devtype_data, }, { .compatible = "fsl,imx35-cspi", .data = &imx35_cspi_devtype_data, }, { .compatible = "fsl,imx51-ecspi", .data = &imx51_ecspi_devtype_data, }, { .compatible = "fsl,imx53-ecspi", .data = &imx53_ecspi_devtype_data, },
File avaible here : https://github.com/torvalds/linux/blob/v4.11/drivers/spi/spi-imx.c
It should take only an hour or so to narrow down the root cause using ftrace (or by adding printk() statements to the spi-imx.c driver), given access to the hardware. There aren't that many code paths in spi-imx.c that can generate EINVAL.
The moment it fails is in "dw1000_tx()" when calling "spi_async()" Exact line is : "
if ((rc = spi_async(dw->spi, &tx->data)) != 0) goto err_spi;
Right before the call to spi_async() :
- tx->data.frame_length = 0
- tx->data.actual_length = 0
- tx->data.status = 0
- tx->tx_buffer.data.len = 14
- tx->len = 16
Right after the call to spi_async():
- tx->data.frame_length = 23
- tx->data.actual_length = 0
- tx->data.status = -115 // EINPROGRESS 115 Operation now in progress
- tx->tx_buffer.data.len = 14
- tx->len = 16
Right after spin_unlock_irqrestore():
- tx->data.frame_length = 23
- tx->data.actual_length = 0
- tx->data.status = -22 // EINVAL 22 Invalid argument
- tx->tx_buffer.data.len = 14
- tx->len = 16
I also verified and the buffer correspond to what I'm sending from my application (wpan-ping / raw socket) with the right len and content.
But dw1000_tx_data_complete() is called before spin_unlock_irqrestore(), and the check catches the same error -22 :
if (unlikely(tx->data.status != 0)) {
dev_err(dw->dev, "Write to SPI failed. Error: %s\n", strerror(errno));
goto err_status;
}
I also verified and both dw->spi and dw->dev are not null. What surprises me is that they are the same (same pointer value), is it expected ?
So, in "spi_async(dw->spi, &tx->data))", either "dw->spi" is incorrect, or "&tx->data" is.
Still investigating, I'm open for suggestions.
On 20/11/2018 09:53, jonathanmuller wrote:
But dw1000_tx_data_complete() is called before spin_unlock_irqrestore(), and the check catches the same error -22 :
|if (unlikely(tx->data.status != 0)) { dev_err(dw->dev, "Write to SPI failed. Error: %s\n", strerror(errno)); goto err_status; } |
I also verified and both dw->spi and dw->dev are not null. What surprises me is that they are the same (same pointer value), is it expected ?
Yes, that's expected. dw->dev is set to &dw->spi->dev. Since "dev" is the first field within struct spi_device, they will end up with the same pointer value.
Still investigating, I'm open for suggestions.
I would suggest either:
-
use ftrace to follow the thread of execution within spi-imx.c, or
-
build spi-imx.ko as an out-of-tree kernel module, and add debug prints to figure out where it fails and generates the EINVAL.
Michael
I followed the path up the the EINVAL and it originates from "spi_map_buf()" in spi.c
The responsible code is :
#ifdef CONFIG_HAS_DMA
static int spi_map_buf(struct spi_controller *ctlr, struct device *dev,
struct sg_table *sgt, void *buf, size_t len,
enum dma_data_direction dir)
{
const bool vmalloced_buf = is_vmalloc_addr(buf);
printk("Tried to alloc gave %d\n", vmalloced_buf);
unsigned int max_seg_size = dma_get_max_seg_size(dev);
#ifdef CONFIG_HIGHMEM
const bool kmap_buf = ((unsigned long)buf >= PKMAP_BASE &&
(unsigned long)buf < (PKMAP_BASE +
(LAST_PKMAP * PAGE_SIZE)));
#else
printk("Set kmap_buf to false\n");
const bool kmap_buf = false;
#endif
int desc_len;
int sgs;
struct page *vm_page;
struct scatterlist *sg;
void *sg_buf;
size_t min;
int i, ret;
if (vmalloced_buf || kmap_buf) {
desc_len = min_t(int, max_seg_size, PAGE_SIZE);
sgs = DIV_ROUND_UP(len + offset_in_page(buf), desc_len);
} else if (virt_addr_valid(buf)) {
desc_len = min_t(int, max_seg_size, ctlr->max_dma_len);
sgs = DIV_ROUND_UP(len, desc_len);
} else {
return -EINVAL;
}
vmalloced_buf is 0 kmap_buf is 0 virt_addr_valid(buf) is 0
When checking the kmap_buf with the following code (which only expand the test :
const bool kmap_buf = ((unsigned long)buf >= PKMAP_BASE && (unsigned long)buf < (PKMAP_BASE + (LAST_PKMAP * PAGE_SIZE)))
)
printk("kmap_buf was set to sth : %d because %lu >= %lu\n",
kmap_buf, (unsigned long)buf, PKMAP_BASE && (unsigned long)buf < (PKMAP_BASE +(LAST_PKMAP * PAGE_SIZE)) );
printk("Individual : %lu %lu %lu %lu %lu\n", PKMAP_BASE, (unsigned long)buf, PKMAP_BASE, LAST_PKMAP, PAGE_SIZE);
printk("Step 1 %lu\n", LAST_PKMAP);
printk("Step 2 %lu\n", (LAST_PKMAP * PAGE_SIZE) );
printk("Step 3 %lu\n", (PKMAP_BASE +(LAST_PKMAP * PAGE_SIZE)) );
printk("Step 4 %lu\n", (unsigned long)buf < (PKMAP_BASE +(LAST_PKMAP * PAGE_SIZE)) );
printk("Step 5 %lu\n", PKMAP_BASE && (unsigned long)buf < (PKMAP_BASE +(LAST_PKMAP * PAGE_SIZE)) );
I get :
[ 42.564013] kmap_buf was set to sth : 0 because 3990826368 >= 0
[ 42.564018] Individual : 3219128320 3990826368 3219128320 512 4096
[ 42.564021] Step 1 512
[ 42.564028] Step 2 2097152
[ 42.564031] Step 3 3221225472
[ 42.564034] Step 4 0
[ 42.564037] Step 5 0
This is the reason why kmap_buf=0, but I suspect it should not
On 20/11/2018 15:30, jonathanmuller wrote:
vmalloced_buf is 0 kmap_buf is 0 virt_addr_valid(buf) is 0
The TX data messages comprises multiple SPI transfer sets, each of which has a header transfer and a data transfer. Which of these transfers is triggering the failure?
Something like this (completely untested) in dw1000_tx():
/* Initiate data SPI message */
{
struct spi_transfer *xfer;
list_for_each_entry(xfer, &tx->data.transfers,
transfer_list) {
printk("tx_buf %p rx_buf %p len %x\n",
xfer->tx_buf, xfer->rx_buf, xfer->len);
}
}
if ((rc = spi_async(dw->spi, &tx->data)) != 0)
goto err_spi;
One of the tx_buf or rx_buf addresses should match the "buf" value shown in your existing debug code.
Michael
When called from dw1000_tx() : [ 71.957979] tx_buf ecf2a638 rx_buf (null) len 1 [ 71.957988] tx_buf ecc88c27 rx_buf (null) len f [ 71.957996] tx_buf ecf2a6b4 rx_buf (null) len 1 [ 71.958004] tx_buf bf00424b rx_buf (null) len 1 [ 71.958012] tx_buf ecf2a730 rx_buf (null) len 1 [ 71.958021] tx_buf ecf2a5c4 rx_buf (null) len 1 [ 71.958029] tx_buf ecf2a7ac rx_buf (null) len 1 [ 71.958037] tx_buf bf00424c rx_buf (null) len 1 [ 71.958044] tx_buf ecf2a828 rx_buf (null) len 1
When called from __spi_map_msg() (which is the function calling spi_map_buf() ) : [ 71.958157] tx_buf ecf2a638 rx_buf ecae95c0 len 1 [ 71.958167] tx_buf ecc88c27 rx_buf ecae95c0 len f [ 71.958175] tx_buf ecf2a6b4 rx_buf ecae95c0 len 1 [ 71.958183] tx_buf bf00424b rx_buf ecae95c0 len 1 [ 71.958192] tx_buf ecf2a730 rx_buf ecae95c0 len 1 [ 71.958199] tx_buf ecf2a5c4 rx_buf ecae95c0 len 1 [ 71.958208] tx_buf ecf2a7ac rx_buf ecae95c0 len 1 [ 71.958215] tx_buf bf00424c rx_buf ecae95c0 len 1 [ 71.958224] tx_buf ecf2a828 rx_buf ecae95c0 len 1 [ 71.958232] tx_buf ecae91c0 rx_buf ecf2a5c5 len 1
So ecf2a5c5 is probably the address of the null pointer
The when printing in spi_map_buf() to see which message fails : tx_buf ecf2a638 rx_buf ecae95c0 len 1 : Tx works Rx works (virt_addr_valid(buf)=1) tx_buf ecc88c27 rx_buf ecae95c0 len f : Tx works Rx works (virt_addr_valid(buf)=1) tx_buf ecf2a6b4 rx_buf ecae95c0 len 1 : Tx works Rx works (virt_addr_valid(buf)=1) tx_buf bf00424b rx_buf ecae95c0 len 1 : Tx FAILS (virt_addr_valid(buf)=0)
So it fails because 4th message address is not a valid virtual address
Could you try the patch at #15 to see if this fixes the problem?
Thanks,
Michael
Patch #15 fixes the issue, thank you ! Frames are sent/received by the kernel without problems.
I will set the same configuration on an other device and verify they are effectively sent/received and then close the issue.
Thank you. Could you also check that the "sensors" command is able to read a power supply voltage and operating temperature from the dw1000 chip? (The same bug was present on the hwmon code path.)
Sure ! If I don't make mistakes, iwpan only have "list" avaible and not "ops" option. Do you have some code to test that ? (else I'm just going to fill dw1000_hwmon_read() function with the right arguments)
Just install the "lm_sensors" package and run "/usr/bin/sensors": that should be all you need to test it.
[ 414.665010] dw1000 spi0.0: temperature 0x8d (0x7d @ 23degC) is 37035mdegC [ 437.639843] dw1000 spi0.0: voltage 0xae (0xaa @ 3.3V) is 3323mV Seems to work fine.
However I was not able to wind all the parameters required to establish the connection with an other device (I have direct SPI access to the other) :
- Channel number : 2 -> DW1000_CHANNEL_DEFAULT
- Pulse repetition frequency : 2 -> DW1000_PRF_64M
- Preamble length : ??
- Preamble acquisition chunk size : ??
- TX preamble code : 0 -> DW1000_TXPSR_DEFAULT
- RX preamble code : ?? (I suppose 0)
- standard/non-standart SFD : ?? (I suppose standart)
- Data rate : 2 -> DW1000_RATE_6800K
- PHY header mode : ??
- SFD timeout : ??