SmartThingsEdgeDrivers icon indicating copy to clipboard operation
SmartThingsEdgeDrivers copied to clipboard

Lunchbox (Aqara FP2 and Philips Hue): handle wantread from luasocket/luasec

Open ldeora opened this issue 1 year ago • 11 comments

The wantread message indicates that the socket is not ready to read data yet and the operation should be retried later. This happens when you're trying to read data from a socket, but the data isn't available at that moment.

This is a proposal for a fix for the Aqara FP2 issue #1648 .

Add a new function in rest.lua to check if the socket is ready for reading

local function wait_for_read(socket, timeout)
  -- Use socket.select to check if the socket is ready for reading
  local ready_sockets, _, _ = socket.select({socket}, nil, timeout)
  return #ready_sockets > 0  -- Returns true if the socket is ready to read
end

And use it here:

https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/blob/7821a5d168d89fb449391a584a3566c605389265/drivers/SmartThings/philips-hue/src/lunchbox/rest.lua#L106-L108 https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/blob/7821a5d168d89fb449391a584a3566c605389265/drivers/Aqara/aqara-presence-sensor/src/lunchbox/rest.lua#L104-L106

    if wait_for_read(sock, 1) then
      next_recv, next_err, partial = sock:receive(pat)
    else
      -- No data is ready, sleep for a bit
      socket.sleep(0.1)
    end

Additionally handle wantread here by waiting and trying again:

https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/blob/7821a5d168d89fb449391a584a3566c605389265/drivers/SmartThings/philips-hue/src/lunchbox/rest.lua#L109-L120 https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/blob/7821a5d168d89fb449391a584a3566c605389265/drivers/Aqara/aqara-presence-sensor/src/lunchbox/rest.lua#L107-L118

if next_err ~= nil then
  if string.lower(next_err) == "closed" then
    if partial ~= nil and #partial >= 1 then
      full_response:append_body(partial)
      next_chunk_bytes = 0
    else
      return nil, next_err
    end
  elseif string.lower(next_err) == "wantread" then
    -- Wait and try again if socket is not ready yet
    socket.sleep(0.1)
  else
    return nil, ("unexpected error reading chunked transfer: " .. next_err)
  end
end

Sorry for the formatting...

ldeora avatar Sep 30 '24 00:09 ldeora

Add a new function in rest.lua to check if the socket is ready for reading

This should already be happening in cosock/socket/internals.lua. An additional check here is probably not going to change much. wantread should be triggering the coroutine.yield here.

I believe the actual issue here is that this timeout is being reached and not being retried which is the only time we should see the allow-listed methods would return a timeout/want* error. The call to select will only ensure that something is ready to read on the socket and not that all of the data is available so it may end up just delaying the error report.

I think I see the bug in fill_body so long as this response is chunk encoded. It seems we drop the partial response from HttpMessage.fill_chunked_body

essentially this return is being dropped when called here but should probably be appended to self.body before returning nil, err to allow for retrying this operation

FreeMasen avatar Sep 30 '24 16:09 FreeMasen

Do you have an invitation link to the driver channel?

(Not at home right now and setting up my own channel would be a hassle.)

ldeora avatar Sep 30 '24 18:09 ldeora

Hey @ldeora Thanks for your engagement, I was able to get a driver sharing link for the PR that @FreeMasen mentioned. Please check out this comment https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/pull/1656#issuecomment-2384601737

The driver is available on this channel, https://bestow-regional.api.smartthings.com/invite/eGM6B8mq4eMA

lelandblue avatar Oct 01 '24 01:10 lelandblue

Good news: error is gone. Bad news: device still not added.

Am I the only one with this device who is testing this driver? From what I've seen, those who tested the driver, did it before the last commit that updated the lunchbox.

Seojune no longer responds to my email inquiries, although he has done so previously. I mean I can live without this driver and if it's broken, it's Aqara's problem...

2024-10-01T07:53:16.628461921Z TRACE Aqara Presence Sensor  Setup driver aqara-fp2 with lifecycle handlers:
DeviceLifecycleDispatcher: aqara-fp2
  default_handlers:
    removed:
    init:
    driverSwitched:
    infoChanged:
    added:
  child_dispatchers:

2024-10-01T07:53:16.634018296Z TRACE Aqara Presence Sensor  Setup driver aqara-fp2 with Capability handlers:
CapabilityCommandDispatcher: aqara-fp2
  default_handlers:
    refresh:
      refresh
    multipleZonePresence:
      createZone
      updateZoneName
      deleteZone
  child_dispatchers:

2024-10-01T07:53:16.673406671Z INFO Aqara Presence Sensor  Created dispatcher [SecretDataDispatcher]aqara-fp2 that had no handlers
2024-10-01T07:53:16.680166171Z TRACE Aqara Presence Sensor  Setup driver aqara-fp2 with Secret Data handlers:
SecretDataDispatcher: aqara-fp2
  default_handlers:
  child_dispatchers:

2024-10-01T07:53:16.688160505Z TRACE Aqara Presence Sensor  Received event with handler driver_lifecycle
2024-10-01T07:53:16.693856213Z INFO Aqara Presence Sensor  received driver startupState: {"hub_zigbee_id":"KG2XAAILh00=","hub_node_id":1,"hub_ipv4":"192.168.1.116"}
2024-10-01T07:53:16.754670546Z TRACE Aqara Presence Sensor  Received event with handler discovery
2024-10-01T07:53:16.765714171Z INFO Aqara Presence Sensor  discovery_mdns.find_device_ips
2024-10-01T07:53:17.474704588Z INFO Aqara Presence Sensor  found_name = _Aqara-FP2._tcp
2024-10-01T07:53:17.482858755Z INFO Aqara Presence Sensor  ip = 192.168.1.121
2024-10-01T07:53:17.494244838Z INFO Aqara Presence Sensor  found_name = _Aqara-FP2._tcp
2024-10-01T07:53:17.501554505Z INFO Aqara Presence Sensor  found_name = _Aqara-FP2._tcp
2024-10-01T07:53:17.506068671Z INFO Aqara Presence Sensor  ip = 192.168.1.121
2024-10-01T07:53:17.510552588Z TRACE Aqara Presence Sensor  unknown dni= 54:EF:44:5C:4C:98, ip= 192.168.1.121
2024-10-01T07:53:17.515177046Z TRACE Aqara Presence Sensor  try_add_device : dni= 54:EF:44:5C:4C:98, ip= 192.168.1.121
2024-10-01T07:53:17.519963588Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Creating TCP socket for REST Connection
2024-10-01T07:53:17.524763088Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Setting TCP socket timeout for REST Connection
2024-10-01T07:53:17.529265213Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Connecting TCP socket for REST Connection
2024-10-01T07:53:17.576603380Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Set Keepalive for TCP socket for REST Connection
2024-10-01T07:53:17.748174755Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Creating SSL wrapper for for REST Connection
2024-10-01T07:53:17.765533505Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Performing SSL handshake for for REST Connection
2024-10-01T07:53:20.441035088Z INFO Aqara Presence Sensor  54:EF:44:5C:4C:98 Successfully created TCP connection
2024-10-01T07:56:16.739520401Z TRACE Aqara Presence Sensor  Received event with handler discovery
2024-10-01T07:56:16.761897984Z TRACE Aqara Presence Sensor  Received event with handler driver_lifecycle

Screenshot_20241001_094534_Samsung capture

Screenshot_20241001_095623_SmartThings

ldeora avatar Oct 01 '24 08:10 ldeora

I even tried to trick someone else to delete and add the device but he didn't fell for it...

https://community.smartthings.com/t/aqara-fp2-tested-with-smartthings/270688/79

He's using the driver channel from the PR and who knows when he added the device the last time. Going to ask him later. Edit: 23/09/2024... That doesn't make it easier. Maybe there's an expiration date in my firmware version.

If anyone can confirm that adding the device works, that would be great. My device has been successfully added a long time before the lunchbox was updated. After this 14e259b70acca0b15a8e09fe05eae441c8c9b16c commit, I was able to add it several times without problems. After the whole PR was merged into main I tried it again but now it fails. The only change was this: 640f0667a413440b9674c6adc6228d57c2f236fc .

But I repeat myself.

ldeora avatar Oct 01 '24 09:10 ldeora

@FreeMasen

Now we need a log.info() in here to check if the the request works. Would have been nice to have it in the original PR.

https://github.com/FreeMasen/SmartThingsEdgeDriversFork/blob/d72deca83b1be2926ec3079aa843d4a2b11a8aec/drivers/Aqara/aqara-presence-sensor/src/fp2/api.lua#L73-L76

Remember, this is where it failed:

2024-09-27T22:47:45.440063966Z ERROR Aqara Presence Sensor  get_credential : ip = 192.168.1.121, failed to get token, error = wantread
2024-09-27T22:47:45.445295882Z WARN Aqara Presence Sensor  credential is nil
2024-09-27T22:47:45.450356841Z ERROR Aqara Presence Sensor  failed to get credential. dni= 54:EF:44:5C:4C:98, ip= 192.168.1.121

ldeora avatar Oct 01 '24 10:10 ldeora

@seojune79 Is there a reason why the FP2 can't be added anymore? Maybe a hidden expiration date for /authcode or the whole REST in firmware version 1.2.7_0011.0080?

ldeora avatar Oct 01 '24 18:10 ldeora

If something goes wrong while removing FP2 from the ST app, the device won't be onboarded. If this happens, removing and onboarding the FP2 in the Aqara home app and then trying to onboard in the ST app should resolve the issue.

ldeora avatar Oct 01 '24 23:10 ldeora

Not fixed. Another user has the same issue. Read the comments here (conversation between Andreas_Roedl and ewimer2):

https://community.smartthings.com/t/aqara-fp2-tested-with-smartthings/270688/136

User has hub firmware 54.x, FP2 firmware 1.2.8x.

Keep in mind that the rollout of the FP2 firmware isn't completed yet and these are the first users who are trying to connect the FP2 to ST.

ldeora avatar Oct 17 '24 11:10 ldeora

@FreeMasen User tried your modified driver but it didn't solve the issue.

ldeora avatar Oct 17 '24 11:10 ldeora