xen-orchestra
Avoid "INTERNAL_ERROR" for live migration when default SR unreachable from host
I got an INTERNAL_ERROR when trying to live migrate a VM from one host to another within the same pool.
I'm not 100% sure whose responsibility it is, XO or XAPI, but you will probably be able to tell.
The error happens after I validate this form without selecting a main SR (btw why is it called "main SR" rather than "destination SR"? - real question):
In XAPI logs, the error is:
Aug 6 13:06:56 r620-q6 xapi: [ warn||72154 HTTPS 172.16.210.221->|Async.VM.pool_migrate R:5ac50c773ad0|xapi_vm_helpers] Host r620-q7 cannot see SR 6440c3cb-b41e-f328-e349-0bec9c9c1330 (Local storage)
Aug 6 13:06:56 r620-q6 xapi: [error||72154 ||backtrace] Async.VM.pool_migrate R:5ac50c773ad0 failed with exception Server_error(VM_REQUIRES_SR, [ OpaqueRef:0ab045a0-0cba-46c8-b7aa-1f223385e04e; OpaqueRef:e595af5a-9c8c-43b8-99a0-6dea8a0dc180 ])
A bit of context:
- I'm attempting to migrate from host r620-q6 to host r620-q7
- The VM is running and currently resides on the local SR on r620-q6 (6440c3cb-b41e-f328-e349-0bec9c9c1330, the one that is mentioned in the error message as being unreachable from r620-q7 - which is true since it's local)
- This local SR the VM is on is the pool's default SR (I probably never changed it since I installed the pool in the first place).
Here are the pool SRs:
I suspect XO doesn't specify any target SR and the default SR gets selected, or XO picks the default SR and gives that as an argument to XAPI. And since the default SR is where the VM's VDI already is, there's probably no way a migrate can work with that SR as target.
If I select a destination SR, then everything works as expected.
Since this INTERNAL_ERROR is meaningless for users, I'd like us to improve the situation.
- We can try to improve the error message from XAPI, if you confirm my analysis.
- Improve the form: as a user, when I validate a form with the default values and XO doesn't tell me something is missing, I expect it to work. My suggested solution is to perform a check before validating the form. If no SR was selected and the default SR is not reachable from the destination host, display an error and tell the user something like this: "The default SR (UUID and name of SR) is not reachable from the selected destination host. Please select a target SR and consider using a shared SR as the default SR to prevent this error in the future." (The last part would be displayed only if the default SR is unreachable because it's local. Maybe there can be other reasons, like a badly configured or malfunctioning shared SR?)
- Help users select a better default SR: it might be useful to warn when the default SR is not a shared SR, in the health page, when the pool is larger than 1 host.
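To make the last two suggestions concrete, here is a rough sketch of what the checks could look like. It is not based on XO's actual code: the `Sr` and `MigrationForm` shapes, the `reachableFromHosts` field and both function names are hypothetical, made up for illustration only.

```typescript
// Hypothetical shapes for illustration; not XO's or XAPI's real object model.
interface Sr {
  uuid: string;
  name: string;
  shared: boolean;
  // Identifiers of the hosts that can see this SR (assumed to be precomputed).
  reachableFromHosts: string[];
}

interface MigrationForm {
  destinationHostId: string;
  // Left undefined when the user validates the form with default values.
  selectedSr?: Sr;
}

// Returns an error message if the form should be rejected, undefined otherwise.
function checkMigrationTarget(
  form: MigrationForm,
  defaultSr: Sr
): string | undefined {
  // An explicitly selected SR means the default SR is not involved.
  if (form.selectedSr !== undefined) {
    return undefined;
  }
  // The default SR is fine as long as the destination host can see it.
  if (defaultSr.reachableFromHosts.indexOf(form.destinationHostId) !== -1) {
    return undefined;
  }
  let message =
    `The default SR (${defaultSr.uuid}, ${defaultSr.name}) is not reachable ` +
    "from the selected destination host. Please select a target SR";
  // Only suggest a shared default SR when the cause is a local default SR.
  if (!defaultSr.shared) {
    message +=
      " and consider using a shared SR as the default SR to prevent this error in the future";
  }
  return message + ".";
}

// Health page check: warn when a multi-host pool has a non-shared default SR.
function shouldWarnLocalDefaultSr(defaultSr: Sr, hostCount: number): boolean {
  return !defaultSr.shared && hostCount > 1;
}
```

With the setup described above (default SR is the local SR of r620-q6, destination is r620-q7, no SR selected), `checkMigrationTarget` would return the error message instead of letting the request reach XAPI.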
Could you add a comment that summarizes what the chosen fix/improvement was in the end? I think I know but it would be better if the issue contains this information.
This is a mistake, the issue shouldn't have been closed.
What we did so far: add a "local default SRs" table to Dashboard > Health: #6033
What we'll do: add more checks during the migration to make sure that the user understands that the VM will be migrated to an unreachable default SR. But this will probably have to wait for XO 6.