incus icon indicating copy to clipboard operation
incus copied to clipboard

Consistent and fine-grained status/error codes for operations

Open gwenya opened this issue 8 months ago • 2 comments

Is there an existing issue for this?

  • [x] There is no existing issue for this feature

What are you currently unable to do

It is difficult to detect what kind of error occurred when using the API, especially with operations.

For one, waiting on an operation seems to return different information when the operation fails than retrieving the information after it has failed. Since incus --debug query doesn't give me the full api responses, I took them from the incus logs:

# incus query /1.0/operations/<id>/wait
... level=debug msg="Error Response\n\t{\n\t\t\"type\": \"error\",\n\t\t\"status\": \"\",\n\t\t\"status_code\": 0,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 500,\n\t\t\"error\": \"Failed creating instance record: Add instance info to the database: Failed to create \\\"instances\\\" entry: UNIQUE constraint failed: instances.project_id, instances.name\",\n\t\t\"metadata\": null\n\t}" http_code=500

# incus query  /1.0/operations/<id>
... level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {\n\t\t\t\"failure\": [\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"0e643d62-b9da-4c78-8208-07d4e7089940\",\n\t\t\t\t\t\"class\": \"task\",\n\t\t\t\t\t\"description\": \"Creating instance\",\n\t\t\t\t\t\"created_at\": \"2025-04-05T00:06:56.221330995+02:00\",\n\t\t\t\t\t\"updated_at\": \"2025-04-05T00:07:01.041872217+02:00\",\n\t\t\t\t\t\"status\": \"Failure\",\n\t\t\t\t\t\"status_code\": 400,\n\t\t\t\t\t\"resources\": {\n\t\t\t\t\t\t\"instances\": [\n\t\t\t\t\t\t\t\"/1.0/instances/test\"\n\t\t\t\t\t\t]\n\t\t\t\t\t},\n\t\t\t\t\t\"metadata\": {\n\t\t\t\t\t\t\"download_progress\": \"rootfs: 100% (32.66MB/s)\"\n\t\t\t\t\t},\n\t\t\t\t\t\"may_cancel\": false,\n\t\t\t\t\t\"err\": \"Failed creating instance record: Add instance info to the database: Failed to create \\\"instances\\\" entry: UNIQUE constraint failed: instances.project_id, instances.name\",\n\t\t\t\t\t\"location\": \"none\"\n\t\t\t\t}\n\t\t\t]\n\t\t}\n\t}" http_code=200

Also, error codes are barely documented, and don't seem to be very consistent. For example creating a network with a name that is already taken fails with error code 409, but creating an instance with a name that is already taken fails with error code 500 and includes the unique constraint violation from the database in the message.

What do you think would need to be added

I think errors should be returned consistently both when getting an operation and when waiting on it.

Additionally I would like to start an effort to collect the various errors that the API currently returns, assign consistent and fine-grained error codes to them (e.g. always 409 when something can't be created because the name is already taken, rather than just 400 or 500), then document those and refactor the code base to use them. Ideally it would be possible to use the API and detect what error happened without having to look for particular strings in the error messages.

gwenya avatar Apr 04 '25 22:04 gwenya

The difference in output for /wait was a pretty simple bug in that logic which was likely never noticed as we don't really use /wait internally.

The more consistent error codes and better error handling, especially for conflicts, that'd be really nice for someone to try to tackle :)

The error one currently gets when an instance already exists is rather horrendous. It'd be good to be able to standard on something like "$Object NAME already exists in project PROJECT", so "Instance foo already exists in project bar" in this case, and be able to detect such "clean" errors so we can have the API handler skip all the error wrapping and just return that string directly to the user.

stgraber avatar Apr 05 '25 05:04 stgraber

Thanks for the fix!

I'm trying to get the already exists error for instances to be a 409 by using db.ErrAlreadyDefined but for some reason it gets mapped into a 400 instead of 409. Once I get that to work I'm thinking of making a api.AlreadyExistsError type that can encapsulate the type of object, the name and the project so that we can turn it into a nice error message in SmartError.

gwenya avatar Apr 06 '25 09:04 gwenya