opentrons
opentrons copied to clipboard
fix(update-server): Store names safely and reject names that would be invalid
Overview
This PR fixes some bugs related to renaming robots.
Closes #10687. This was a bug with the error-handling of certain invalid names.
Closes #10197. This was a bug where the robot was improperly storing names with certain special characters.
My intent and expectation with this PR is: every string that you send via POST /server/name {"name": "..."} will either be handled correctly as the robot's new name, or rejected with an HTTP 400 without changing the name.
Changelog
- Properly store names that contain special characters. This involves fleshing out our code to overwrite the
/etc/machine-infofile, so that we escape and quote things according to spec. - For names that we cannot properly store, or that we think would be considered "invalid" by systemd or Avahi: reject them with an HTTP 400 error. This involves a tweak to the
POST /server/nameHTTP endpoint.
Review requests
HTTP API changes
Do we agree with the changes to the POST /server/name HTTP endpoint? Arguably, they change the HTTP API, but I think in a way that's acceptable and non-breaking.
Error handling
I'm a little uncertain about this PR's approach to error handling.
I went with a "look before you leap" approach instead of "asking for forgiveness instead of permission." Before accepting a name, we check whether it would be valid to pass along to systemd and Avahi.
I avoided "asking for forgiveness instead of permission" because handling errors correctly seemed messy. If a name were accepted by Avahi but rejected by systemd (or vice-versa), we'd have to implement some kind of transactional rollback logic to restore the old name to both, to avoid a torn state.
But "looking before you leap" also has a downside: we have to reimplement systemd's and Avahi's validation rules, which are not always well-documented, and which can theoretically change over time.
Given this tradeoff, do we think the current "look before you leap" approach is good enough?
Risk assessment
Medium overall.
The riskiest changes are to how we're overwriting /etc/machine-info, which is always going to be dodgy. I've mitigated this risk with unit tests and a manual test plan, below.
Other changes in this PR should be lower-risk:
- The Opentrons App currently only allows alphanumeric characters. So, most of these changes cannot be hit in practice unless someone manually sends HTTP requests to a robot.
- We're strictly adding validation. So even if there's something wrong with that validation, it probably won't be any worse than what we had before, which was nothing.
- If a user manually overwrote
/etc/machine-infowith a weird name, this added validation can theoretically cause a new error to be raised uponupdate-serverstartup. This could soft-brick the robot. I don't think this is likely, but it's something to note.
- If a user manually overwrote
Manual testing
Setup
You'll need a real OT-2 to test this.
- Do a top-level
make pushto push everything to the OT-2.edgehas recent naming-related fixes to other subprojects, so you need to push everything, not justupdate-server. - SSH into the robot and tail the logs with
journalctl --follow --no-pagerorjournalctl --follow --no-pager --unit opentrons-update-server.
Tests
- Make sure setting the robot's name through the Opentrons App still works.
- Try manually changing the robot's name with HTTP
POST /server/name {"name": "new name"}.- Try using crazy strings. Empty, very long, special characters, Unicode, whitespace at the beginning and end, etc.
- The endpoint should either return 200 and set the name to what you specified, or return 400 and leave the name unchanged.
- The endpoint should never return 500.
- Exception: While a rename is in progress, HTTP
GETs to retrieve the robot's name may fail with a 500 error related tohostnamectlfailing becausesystemd-hostnamedexited early. This is a known problem that's unrelated to these changes, and we think it's harmless. See #11175.
- Make sure that the name is kept in sync between all the places it shows up:
- HTTP:
GET /server/name,GET /server/update/health, andGET /health. - mDNS+DNS-SD: Try running
dns-sd -B. - You can also try
cd discovery-client && yarn run discovery, which will synthesize all of these.
- HTTP:
Codecov Report
Merging #11177 (2ec5afb) into edge (0115066) will increase coverage by
0.01%. The diff coverage is82.14%.
@@ Coverage Diff @@
## edge #11177 +/- ##
==========================================
+ Coverage 73.78% 73.79% +0.01%
==========================================
Files 2089 2089
Lines 57737 57760 +23
Branches 5859 5859
==========================================
+ Hits 42599 42623 +24
+ Misses 13859 13858 -1
Partials 1279 1279
| Flag | Coverage Δ | |
|---|---|---|
| update-server | 73.71% <82.14%> (+0.59%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...otupdate/common/name_management/pretty_hostname.py | 47.91% <63.15%> (+13.77%) |
:arrow_up: |
| ...te-server/otupdate/common/name_management/avahi.py | 52.38% <83.33%> (+2.89%) |
:arrow_up: |
| ...update/common/name_management/name_synchronizer.py | 97.77% <93.75%> (+0.27%) |
:arrow_up: |
| ...server/otupdate/common/name_management/__init__.py | 100.00% <100.00%> (ø) |
This is low-priority and I haven't had a chance to work on it, so I'm closing it to get it out of the way. Other than some merge conflicts and the functionality thing that I mentioned in my last comment, this is still in good shape to pick up later.