yaws
yaws copied to clipboard
application:set_env/3 in yaws:get_app_dir/0 is bad
Just upgraded Yaws to 2.0.7 on a live node by changing the code path without bring down the node and then I am immediately seeing a lot of these:
Dynamic compile error:
/opt/myapp/lib/myapp-2.0.3/priv/www/layout.yaws:4: can't find include file "/opt/myapp/lib/yaws-2.0.6/include/yaws_api.hrl"
After some poking around I discovered it was the fault of yaws:get_app_dir/0
, specifically this line
application:set_env(yaws, app_dir, AppDir)
I grepped the yaws source and see yaws:get_app_dir/0
only called rarely. So the above line does little and can cause harm.
Another more threatening issue is delivering those errors to the page, which is a security nightmare. I think a much better and safer option is to let it crash in this case. Thoughts?
I'm still trying to figure out exactly how you're trying to upgrade, since what you're describing doesn't seem safe. There are modules loaded and running, and then you're just changing the load path? What's forcing anything to load from the new path? Clearly there is state still pointing to old paths, as indicated by the error message.
I use a combination of code:replace_path/2
, code:modified_modules/0
and code:load_file/1
. Is it unsafe? May be for example, when state record changes for gen processes. But it's been working 99% of the time so I am not too bothered. I intend to try release upgrade again when I have time.
Meanwhile yaws is caching a stale app_dir that doesn't get updated when I restart the Yaws application.
> application:get_env(yaws, app_dir).
{ok,"/opt/myapp/lib/yaws-2.0.6"}
Is it better to replace get_app_dir/0 with get_app_dir() -> code:lib_dir(yaws).
?
If you really restarted the application, then application
would no longer have app_dir
cached in the environment, and the next call to get_app_dir/0
would cache a new value.
I've never before heard of anyone using the various code
calls you mention to do a restart, so that's a concern. One general issue I can think of, not specific to Yaws, is that it doesn't allow for code change callbacks, supervised restarts, state changes, etc. To get what you're trying to do to work 100% of the time, we'd have to find all caches, state, etc. in Yaws and provide some custom way to flush them, allow for upgrades, etc., which I don't think makes much sense given OTP already provides what's needed for this. I'm not saying Yaws is 100% OTP-compliant in this regard though; I think the typical approach with Yaws is similar to how other web servers are updated: do a regular restart, or do a rolling restart over a load balanced cluster of replicas to avoid any downtime.
Otherwise, perhaps since you're already doing a custom upgrade, you just need to add a call to application:unset_env(yaws, app_dir)
after running your list of code
calls?
I think if you restart the Yaws application without bring down the VM, the envvars are all kept intact unless the application is unloaded with application:unload/1
. 100% agree OTP release upgrade is the right way to go though custom upgrade is probably more common than people think. For example, recently Richard Carlsson gave a talk on 10 years custom upgrade with minimum downtime.
I had a workaround in place but I didn't find application:unset_env/2
in a hurry. I'll update my workaround to use it. Thanks.
The reason I open this report is to provoke some rethink on this to see if it can be cleaned up or improved upon.
What do you think of the security issue I raised in the second comment. I think a lot of Yaws early decisions were prioritised for developers which have become vulnerabilities. At one point (2017) I had Yaws happily put my DB password on the page. The viewers are 99% of the time not the developers so putting errors on page seems to make little sense besides handing critical info to attackers.
Yes, I agree the security issue is a concern. I might be wrong but I'm guessing finding and plugging all occurrences of such behavior will not be a simple task. As usual, the best way to get an issue resolved is to submit a small test case or multiple test cases that show the problem(s), as it can often be difficult for us to reproduce problems from textual descriptions.
It is hard to find all of them in any software projects. The following two can be a good start and should help Yaws' security a bit. Put the following code in a .yaws file and view the page in a browser:
<erl>
%% case 1
out(_Arg) -> error("this-string-should-not-appear-on-page").
</erl>
<erl>
%% case 2
out(_Arg) -> ok;
</erl>
Case 1 crashes and case 2 deliberately introduces a syntax error. The error texts should never appear on page, at the least not when debug is off.
Personally I think case 2 should be treated at least as severe as the crash case, i.e. if a code block don't even compile, let the whole page crash. The current behaviour of partial success is painfully bad.
Looking at this issue again. Note that the "Customizations" section of the Yaws tex documentation explains:
When actually deploying an application at a live site, some of the standard YAWS behaviors are not acceptable. Many sites want to customize the web server behavior when a client requests a page that doesn’t exist on the web server. The standard YAWS behavior is to reply with status code 404 and a message explaining that the page doesn’t exist.
Similarly, when YAWS code crashes, the reason for the crash is displayed in the Web browser. This is very convenient while developing a site but not acceptable in production.
With Yaws server configuration, it's already possible to deal with some of the problems you mention by providing a custom error handling module, specifically for the errormod_crash
setting. But it doesn't solve everything, such as your second example, and also, development settings rather than production settings are on by default.
To address this issue, I plan to
- make the default
errormod
good for production instead of development - add a development default
errormod
that's easy to enable - extend the
errormod
concept to also deal with compilation errors of.yaws
files
That’s a good plan. Secure by default is the only way actually. Secure is often backwards incompatible with insecure.