stash
stash copied to clipboard
[Bug Report] Stash crashes if sent graphQL after upgrade + pre migration
Summary
Stash crashes if sent a graphQL command before the DB is upgraded to the latest schema (e.g. after updating to a new version)
To Reproduce
- Install stash v0.22.0
- Upgrade to stash v0.23.0
- Start stash (without doing anything else)
- Send a graphQL command to stash (e.g. to run a scan)
- stash sends back an OK response, then crashes
Context / Troubleshooting performed
I have a v. simple script that auto-runs a scan for newly downloaded content when stash starts by:
- Running
stash-win.exe
with several parameters, in a minimised window - Waiting 15 seconds
- Sending a graphQL instruction via curl for stash to run a scan
After upgrading to stash v0.23.0, I found stash was crashing a few seconds after starting. I did some local troubleshooting, and discovered that the upgrade to stash v0.23.0 also required a DB schema (which is fine!) however, presumably because the stash console window was being minimised(?), this was not being prompted and so was not being done before the time-delayed graphQL instruction was sent a few seconds later, causing stash to crash (rather than ignore the command or send an error response).
Expected behaviour
If a DB upgrade is required and/or there are certain requests that stash cannot execute at a certain time due (e.g. because a DB upgrade is required first), when receiving an instruction stash should either:
- Ignore any graphQL requests
- Return an error, ideally advising that a DB upgrade is required because of a new schema
Stash Version
v0.22.0 --> v0.23.0
Desktop information
- OS: Windows 11
- Browser: Edge
- Browser version: Version 114.0.1823.51
I have a fix for this locally, as part of a larger refactor. It's based off of #4152, so I'll send a PR through once that merges.
It simply returns a "database not initialized" error if any request involving the database is made before it is properly initialized, i.e. before the initial setup or if a migration is required.
If a migration is required, all pages in the UI will redirect to the migration page to prompt for user confirmation. There is also a warning logged on startup (database schema version * does not match required schema version *
). If you have stash configured to not automatically open the UI in a browser on startup, then given it is being launched minimized, I can see how both of these would be missed.
What you could do is use the systemStatus
GraphQL query to retrieve the current status:
query {
systemStatus {
databaseSchema
appSchema
status
}
}
and only start the automatic scan if status
is not OK
(ie it is NEEDS_MIGRATION
). Alternatively you could just make sure to start stash normally (without the script) for the first time after updating, to ensure that any migrations are done before attempting to run any external database-accessing GraphQL queries.
Thanks for the reply and info @DingDongSoLong4!
Responding to your points
If you have stash configured to not automatically open the UI in a browser on startup, then given it is being launched minimized, I can see how both of these would be missed.
Yeah, this is the exact situation that applied unfortunately!
What you could do is use the
systemStatus
GraphQL query to retrieve the current status and only start the automatic scan ifstatus
is notOK
(ie it is notNEEDS_MIGRATION
).
Ah, awesome, thanks for sharing that script! I am very new to GraphQL, so appreciate it! Ideally I just wanted to keep my stash script as simple as possible but doing this check first isn't too much extra to add (plus lets me catch any errors and throw up a dialog box, etc)
Alternatively you could just make sure to start stash normally (without the script) for the first time after updating
Yeah, it was a learning experience for sure and now that I am more explicitly aware of the dependency between stash-win.exe
upgrades <--> stash schema
upgrades <--> graphQL limitations, I will definitely try and make a mental note about this in the future!
Additional reflection on overall situation
Firstly, thanks again for your help and multiple suggestions! Overall these should significantly reduce the chance of this happening again!
As you said above, the combination of having open browser = false
and running stash-win.exe
minimised didn't help my situation at all. That said, I think my debugging was somewhat hindered by stash returning a normal response to the scan request, i.e. the output from my stash starter script was...
> run-stash
Starting stash-win.exe...
Requesting scan...
{"data":{"metadataScan":"1"}}
... which I interpretated as ...
- " OK so there is nothing wrong with stash / the GraphQL command...
- Maybe somehow the
GOTO:EOF
at the end of my script is causing it to kill stash, which is weird as I explicitly useGOTO:EOF
rather thanEXIT
as it's safer... - And I run stash via a
START
command (eg.START "Stash" /MIN "%stashdir%\stash-win.exe" --config...
) so it runs asynchronously... - But I don't see what else it could be!? " 😕
I then spent ages looking at the wrong problem, swapping GOTO:EOF
for EXIT /B 0
, etc and changing various switches like /I
in the START
command, etc... Like you say, it was only when running into dead ends and thinking "I'm just going to literally double click stash-win.exe
" and seeing the DB upgrade message that it finally clicked what the actual issue was.
I'm not saying that stash needs to consider everything and protect against every single scenario, but in my defence, stash returning {"data":{"metadataScan":"1"}}
rather than say {"error":{"message":"DB upgrade required"}}
did throw me off the scent completely TBH...
- If this is the kind of change that is part of the PR you mentioned then that's awesome!
- If not, maybe I can raise a feature request for this? (i.e. request that if the DB is not initialised, stash should reply to various GraphQL commands (scan, generate, etc) with an error instead of a normal response)
I'm not saying that stash needs to consider everything and protect against every single scenario, but in my defence, stash returning
{"data":{"metadataScan":"1"}}
rather than say{"error":{"message":"DB upgrade required"}}
did throw me off the scent completely TBH...
This is perfectly reasonable, and I definitely agree. Having a database that needs to be migrated is an entirely normal scenario - it should be taken into account wherever possible, and should definitely not cause a crash.
Actually now that I think about it, my change wouldn't solve the issue with metadataScan
specifically. When you run the metadataScan
mutation, it queues a job in the background - the job itself is only started when it gets to the top of the job queue. This means that a job creation mutation (such as metadataScan
) will basically never return an error.
The way I'd probably fix it is to allow for jobs to verify some kind of precondition before they can be added to the queue, in this case checking if the database has been initialized. This will let the action of adding a job to the queue potentially fail, which would then be able to return an error directly from metadataScan
.
My fix was for normal data graphql requests (something like findScene
or sceneUpdate
), where if the database was uninitialized it would result in a nil pointer deference panic.
Also sidenote: if you haven't already, I would highly recommend configuring a log file (Settings > System > Logging). The log file would have allowed you to see the database schema version does not match
warning, despite the crash.
This is perfectly reasonable [...] Having a database that needs to be migrated is an entirely normal scenario [...] and should definitely not cause a crash
Cool, yeah so I think there are two potential code changes that could be implemented, if the dev team agrees...
Part 1 (Required?
) - Stash protected so that it does not crash when sent GraphQL command(s)
- As you say, having stash crash in this situation is not ideal
- Especially when you consider that while it was trivial for me to restart, if people are running stash remotely or in more complicated setups, this could be more of a pain
- Crucially though - if stash had not crashed, I would not have had my issue as...
- If stash didn't crash - I would've returned to the browser, refreshed the page, and then instead of a getting the "server offline"
localhost refused to connect
page, I would've been taken to the DB upgrade page, and so would've got the prompt to update the DB right away.
Part 2 (Optional?
) - Above, plus adding pre-checking steps to stash job manager so it verifies the job is possible before returning response
The way I'd probably fix it is to allow for jobs to verify some kind of precondition before they can be added to the queue, in this case checking if the database has been initialized. This will let the action of adding a job to the queue potentially fail, which would then be able to return an error directly from metadataScan.
- As you described above, adding some checking on the current state before a job is accepted to the queue and then to return an error if the queue is unavailable, or the pre-conditions aren't met, etc.
Next steps?
- Thanks once again for your input and assistance on this!
- The above sounds sensible and reasonable to me, however I realise I don't have an idea of the amount of work involved to implement either of the above
- In terms of getting someone to look at this more formally and decide on whether to accept it into the pipeline... is this something that you can do?... or is there someone that I can tag to have a look at the above?
Well Part 1 is part of the fix I have already made locally - it will cause the scan to log a "database not initialized" error message instead of causing a crash.
Part 2 is what I haven't implemented yet - I'll see what I can do when I have time. I don't imagine it would be amazingly complex to do, although it probably won't be trivial either.
In terms of getting someone to look at this more formally and decide on whether to accept it into the pipeline... is this something that you can do?... or is there someone that I can tag to have a look at the above?
In general @WithoutPants is the one who handles most of these things, although I'd say both changes are relatively uncontroversial.
I'd classify Stash crashing on a metadataScan
when the database needs migration as a definite bug. However I would probably classify wanting metadataScan
to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour (minus the crashing), where metadataScan
never errors directly, is intentional given then current job system.
Part 1 is part of the fix I have already made locally - it will cause the scan to log a "database not initialized" error message instead of causing a crash.
Ah, amazing!
Part 2 is what I haven't implemented yet - I'll see what I can do when I have time. I don't imagine it would be amazingly complex to do, although it probably won't be trivial either.
Appreciate that. Like I said above, whilst I still think part 2 is valid/beneficial, it is more of a "optional" or "nice to have" so please just work on it whenever you have time, no rush!
However I would probably classify wanting
metadataScan
to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour, wheremetadataScan
never errors directly, is intentional given then current job system.
- Yeah, that makes sense. In my case it would be helpful to have it work in a pseudo 'synchronous' mode, as my script doesn't "hang around" after the request is made or check the job status later on, etc - so at "request time" would definitely be the best place for scripts like mine to handle any errors.
- That said, I appreciate that if that's not the current system design then that is a bigger change, and it may be handled better already by users with more complex scripts that do periodic checks on the job queue, etc.
- Happy to go whichever way you and other devs want to on this one.
Just to double-check -- Although in this instance it was a metadataScan
request that caused the crash. I'm assuming that the code is being updated to handle "problematic jobs / requests" in general, right?... i.e. not just metadataScan
specifically, but also say if I sent a metadataGenerate
or metadataClean
message?
In general @WithoutPants is the one who handles most of these things, although I'd say both changes are relatively uncontroversial. I'd classify Stash crashing on a
metadataScan
when the database needs migration as a definite bug. However I would probably classify wantingmetadataScan
to return an error instead of returning successfully and having the job fail as a feature request - the current behaviour (minus the crashing), wheremetadataScan
never errors directly, is intentional given then current job system.
I think metadataScan
returning an error if the preconditions aren't met - without adding it to the queue - is pretty straightforward expected behaviour imo. The implementation just adds the job to the manager currently, but there's nothing to suggest that this is the way it should be. metadataScan
should be just as capable of erroring out as any other mutation.
Edit: in fact, metadataScan
does error out currently if ffmpeg or ffprobe are missing.
Alright that was easier than I thought it would be - I've added a commit to my local branch that ensures the database is initialized if necessary before adding jobs. metadataScan
(also metadataGenerate
, and a bunch of others) now error with database not initialized
if the database is not ready.
Awesome, thanks for this!
😁
NB: Github may block GIFs auto-playing if not enabled in your accessibility settings