opentrons icon indicating copy to clipboard operation
opentrons copied to clipboard

fix(update-server): Store names safely and reject names that would be invalid

Open SyntaxColoring opened this issue 3 years ago • 1 comments

Overview

This PR fixes some bugs related to renaming robots.

Closes #10687. This was a bug with the error-handling of certain invalid names.

Closes #10197. This was a bug where the robot was improperly storing names with certain special characters.

My intent and expectation with this PR is: every string that you send via POST /server/name {"name": "..."} will either be handled correctly as the robot's new name, or rejected with an HTTP 400 without changing the name.

Changelog

  • Properly store names that contain special characters. This involves fleshing out our code to overwrite the /etc/machine-info file, so that we escape and quote things according to spec.
  • For names that we cannot properly store, or that we think would be considered "invalid" by systemd or Avahi: reject them with an HTTP 400 error. This involves a tweak to the POST /server/name HTTP endpoint.

Review requests

HTTP API changes

Do we agree with the changes to the POST /server/name HTTP endpoint? Arguably, they change the HTTP API, but I think in a way that's acceptable and non-breaking.

Error handling

I'm a little uncertain about this PR's approach to error handling.

I went with a "look before you leap" approach instead of "asking for forgiveness instead of permission." Before accepting a name, we check whether it would be valid to pass along to systemd and Avahi.

I avoided "asking for forgiveness instead of permission" because handling errors correctly seemed messy. If a name were accepted by Avahi but rejected by systemd (or vice-versa), we'd have to implement some kind of transactional rollback logic to restore the old name to both, to avoid a torn state.

But "looking before you leap" also has a downside: we have to reimplement systemd's and Avahi's validation rules, which are not always well-documented, and which can theoretically change over time.

Given this tradeoff, do we think the current "look before you leap" approach is good enough?

Risk assessment

Medium overall.

The riskiest changes are to how we're overwriting /etc/machine-info, which is always going to be dodgy. I've mitigated this risk with unit tests and a manual test plan, below.

Other changes in this PR should be lower-risk:

  • The Opentrons App currently only allows alphanumeric characters. So, most of these changes cannot be hit in practice unless someone manually sends HTTP requests to a robot.
  • We're strictly adding validation. So even if there's something wrong with that validation, it probably won't be any worse than what we had before, which was nothing.
    • If a user manually overwrote /etc/machine-info with a weird name, this added validation can theoretically cause a new error to be raised upon update-server startup. This could soft-brick the robot. I don't think this is likely, but it's something to note.

Manual testing

Setup

You'll need a real OT-2 to test this.

  1. Do a top-level make push to push everything to the OT-2. edge has recent naming-related fixes to other subprojects, so you need to push everything, not just update-server.
  2. SSH into the robot and tail the logs with journalctl --follow --no-pager or journalctl --follow --no-pager --unit opentrons-update-server.

Tests

  • Make sure setting the robot's name through the Opentrons App still works.
  • Try manually changing the robot's name with HTTP POST /server/name {"name": "new name"}.
    • Try using crazy strings. Empty, very long, special characters, Unicode, whitespace at the beginning and end, etc.
    • The endpoint should either return 200 and set the name to what you specified, or return 400 and leave the name unchanged.
    • The endpoint should never return 500.
    • Exception: While a rename is in progress, HTTP GETs to retrieve the robot's name may fail with a 500 error related to hostnamectl failing because systemd-hostnamed exited early. This is a known problem that's unrelated to these changes, and we think it's harmless. See #11175.
  • Make sure that the name is kept in sync between all the places it shows up:
    • HTTP: GET /server/name, GET /server/update/health, and GET /health.
    • mDNS+DNS-SD: Try running dns-sd -B.
    • You can also try cd discovery-client && yarn run discovery, which will synthesize all of these.

SyntaxColoring avatar Jul 19 '22 22:07 SyntaxColoring

Codecov Report

Merging #11177 (2ec5afb) into edge (0115066) will increase coverage by 0.01%. The diff coverage is 82.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##             edge   #11177      +/-   ##
==========================================
+ Coverage   73.78%   73.79%   +0.01%     
==========================================
  Files        2089     2089              
  Lines       57737    57760      +23     
  Branches     5859     5859              
==========================================
+ Hits        42599    42623      +24     
+ Misses      13859    13858       -1     
  Partials     1279     1279              
Flag Coverage Δ
update-server 73.71% <82.14%> (+0.59%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...otupdate/common/name_management/pretty_hostname.py 47.91% <63.15%> (+13.77%) :arrow_up:
...te-server/otupdate/common/name_management/avahi.py 52.38% <83.33%> (+2.89%) :arrow_up:
...update/common/name_management/name_synchronizer.py 97.77% <93.75%> (+0.27%) :arrow_up:
...server/otupdate/common/name_management/__init__.py 100.00% <100.00%> (ø)

codecov[bot] avatar Jul 19 '22 22:07 codecov[bot]

This is low-priority and I haven't had a chance to work on it, so I'm closing it to get it out of the way. Other than some merge conflicts and the functionality thing that I mentioned in my last comment, this is still in good shape to pick up later.

SyntaxColoring avatar Aug 25 '22 14:08 SyntaxColoring