codeworld
Local instance loads remote Web resources
I can't call this a bug report, because it's obviously intentional, but the CodeWorld home page of a locally installed instance loads many remote resources, including:
- Google APIs (with Google authentication disabled)
- JQuery from jquery.com
- Something from jsdelivr.net
- Something from wurfl.io
- Something from polyfill.io
- (Presumably) icons from materialdesignicons.com
... and stuff from various CDNs probably associated with the above.
I would not expect a local instance to load any code from, or communicate in any way with, anything but its own server, and definitely not without explicit permission from an administrator.
This is something I'd definitely love to fix if someone wants to work on it. There are two different parts:
For jquery.com, jsdelivr.net, materialdesignicons.com, and ajax.googleapis.com, the right solution is to copy the relevant version into third_party, along with any license files, then symlink from web/js to the script in third_party. (#1374 may change some details of this, but that change is not imminent.) Note that the googleapis.com reference isn't actually related to Google-based login; it's just the canonical location of a CSS file that's part of jquery-ui.
Something different will need to be done for polyfill.io and wurfl.io. These are services that serve different content based on the HTTP client headers, so we're actually relying on server-side logic from these web services. Specifically: wurfl.io is used to detect mobile browsers in order to work around a bug in CodeMirror, and polyfill.io is used to backfill some libraries that are used by the CodeWorld code and aren't implemented in older browsers. If there's a solution to these problems that doesn't use a web service, great. I don't know so much about the JavaScript ecosystem. @Powell-v2 is helping out and is better informed than I am, so perhaps he has some ideas.
WURFL could be replaced with a simplified check on the client. One plausible solution is to detect whether a device has a touchscreen while also looking at the screen width. This should sift out most phones and tablets. The question here is: how accurate does this detection have to be? What is this CodeMirror bug actually about? Maybe it's been resolved already? I can see that the lib was added back in 2017.
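A minimal sketch of the touchscreen-plus-width check suggested above. The function name and the 768px cutoff are assumptions for illustration, not anything taken from CodeWorld; the function takes plain values so it can be tried outside a browser.

```javascript
// Hypothetical helper, not part of CodeWorld.
// The 768px default cutoff is an assumed value and would need tuning.
function looksLikeMobile(env, maxWidth = 768) {
  // A device counts as "mobile" only if it reports touch input
  // AND its screen is no wider than the cutoff.
  const hasTouch = env.maxTouchPoints > 0;
  const isNarrow = env.screenWidth <= maxWidth;
  return hasTouch && isNarrow;
}

// In a browser, the inputs would come from standard properties:
//   looksLikeMobile({
//     maxTouchPoints: navigator.maxTouchPoints,
//     screenWidth: window.screen.width,
//   });
```

Any fixed cutoff is a trade-off: a larger value catches more tablets but also more touchscreen laptops, so this is a heuristic rather than a reliable classifier.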
polyfill.io could be replaced with a static file. Or, perhaps, we can remove it altogether? Browsers have caught up over the past several years, and most of the required features are supported natively. There is one notable exception, IE, but CodeWorld doesn't work there these days anyway due to the ES6 syntax.
The CodeMirror bug was that enabling highlighting for the active line caused mobile browsers (both Android and iOS) to keep inserting blank lines every time a character was typed. You're right that it may be fixed. Someone should check that, and this would be resolved! I wouldn't be happy with guessing whether the browser is mobile based only on screen size or touch screens. This is likely to misidentify a 2-in-1 Chromebook as a mobile device.
I don't know how to gain confidence that polyfill.io isn't needed. I added it, actually, after I ran into a problem in a classroom where the school used Macbooks, which unfortunately need users to manually update Safari, so some of the students were unable to use the site because I was using String.startsWith, or something like that. Apple computers are still common in schools, and some of them are still likely to be out of date. If there's a well-maintained static collection of polyfills we could vendor into third_party, though, that seems like a reasonable option.
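For illustration, here is roughly what one entry in a static, vendored polyfill file could look like, using String.startsWith as the example mentioned above. This is a simplified sketch: the helper names are hypothetical, and a spec-compliant polyfill handles more edge cases (for example, rejecting a regular expression argument). It is written in ES5 style, since the browsers that need it would not understand newer syntax.

```javascript
// Simplified stand-in for String.prototype.startsWith (hypothetical name).
function startsWithFallback(search, position) {
  var text = String(this);
  var needle = String(search);
  // Treat a missing or negative position as 0.
  var start = position > 0 ? Math.floor(position) : 0;
  return text.slice(start, start + needle.length) === needle;
}

// Installs impl as proto[name] only when the browser lacks a native version.
// Returns true if the fallback was installed, false otherwise.
function installIfMissing(proto, name, impl) {
  if (typeof proto[name] === 'function') {
    return false;
  }
  Object.defineProperty(proto, name, {
    value: impl,
    writable: true,
    configurable: true,
    enumerable: false
  });
  return true;
}

// On a modern engine this is a no-op because startsWith already exists.
installIfMissing(String.prototype, 'startsWith', startsWithFallback);
```

Because the check happens on the client, the same static file can be served to every browser, which is what removes the dependency on a service that inspects request headers.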
Importing JQuery and the like sounds within my capabilities. I could look at that and maybe send you a pull request. I can't offer ongoing maintenance, though.
On polyfill, can I just rip it out if I know that my clients aren't running old versions of Safari? Could that be a command line option or something? I see there's no config file and I suspect you'd like to keep it that way.
You can definitely remove polyfill.io locally, and I suspect it would work for 95% of users. I don't see an easy way to do it as a command line flag, but you're welcome to come up with something.
You can also remove WURFL, if you know your users are not using mobile devices. Just set this line to true:
https://github.com/google/codeworld/blob/7874f9865441cd0714de902522723a52623a2199/web/js/codeworld.js#L281
> The CodeMirror bug was that enabling highlighting for the active line caused mobile browsers (both Android and iOS) to keep inserting blank lines every time a character was typed.
I can confirm that this issue doesn't happen on an iPhone running iOS v13 when I set the styleActiveLine property to true. I can also check on iPad, if needed.
> I wouldn't be happy with guessing whether the browser is mobile based only on screen size or touch screens. This is likely to misidentify a 2-in-1 Chromebook as a mobile device.
Yes, you are right: the line between desktop and mobile has become blurry, and some devices could fall through the cracks. Another option is to sniff the UA string (option 1, option 2), but that could also yield false results. Do you know what's actually causing this bug? Can I read about it somewhere?
> If there's a well-maintained static collection of polyfills we could vendor into third_party, though, that seems like a reasonable option.
Many polyfills are documented on MDN, and I just found an npm package that ties them all together. We can probably compile a list ourselves, if we know what exactly needs to be included. Not sure if that's worth the effort, though. When I get to bundling the code, it will be transpiled anyway.
@Powell-v2 https://github.com/codemirror/CodeMirror/issues/3654 talks about them. I didn't read it in detail.
https://github.com/google/codeworld/issues/131 is the bug on our side.
I've started working on removing references to static remote resources (meaning not polyfill.io). I found a few more here and there. I have some questions/comments:
1. The git commit hook does some linting on JavaScript code (good). The third-party code, including JQuery, doesn't pass and doesn't come close to passing (bad if not surprising). Commits therefore fail. Should I just exclude the third party code from the checks?
2. The approach I've taken so far is to create a directory called "mirrored" under third_party, link it under web, and mirror everything in with the source server name at the top level and the rest of the source URL path replicated underneath. So a reference to "https://code.jquery.com/ui/1.12.0/jquery-ui.min.js" becomes "mirrored/code.jquery.com/ui/1.12.0/jquery-ui.min.js". The files are just pulled down from the Web servers, not built from Git submodules or whatever, even when that would be possible to do.
   Is that agreeable? Should I go further and move other stuff under third_party into the same structure? I would not want to move locally modified stuff like codemirror-buttons, but there may be other things that could fit.
3. I've tracked down the licenses for everything and mirrored them into the same structure. In most cases, the source code itself identifies the appropriate license. For the couple of cases where it doesn't, I've identified the licenses in a README file.
4. The documentation embeds some stuff from code.world in iframes. I haven't looked at what's embedded, but I assume it's example programs. I am for the moment leaving that alone. Is that the right thing to do?
5. There are feedback links that encourage users to report issues in the main CodeWorld repository, meaning that bug reports could come in from private instances that are old, broken, and/or misconfigured. I have again left those alone for the moment, but will change them if you prefer.
6. blocks.html pulls in both https://code.jquery.com/jquery-2.2.4.min.js and, a bit further down, https://code.jquery.com/jquery-1.12.4.min.js . As best I can figure out from the JQuery documentation, the second should be detecting that the first has already defined "$" and should either do nothing, or leave "$" alone but redefine "jQuery" (which doesn't seem to be used anywhere else in CodeWorld). If that's not what it does, I don't know what it does do.
   So blocks.html and scripts loaded from it should be using JQuery 2.2.4, but everything else should be using 1.12.4. Is that true, and, if so, is it intentional?
   JQuery is presently at something like 3.5, by the way.
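The URL-to-path mapping used by the "mirrored" layout described above can be sketched as a small function. The function name is hypothetical, and this only illustrates the proposed naming scheme, not any script that exists in the repository.

```javascript
// Hypothetical helper illustrating the proposed layout:
// <root>/<source host>/<original URL path>.
function mirroredPath(remoteUrl, root = 'mirrored') {
  const url = new URL(remoteUrl);
  // url.pathname already starts with "/", so no extra separator is needed.
  return `${root}/${url.host}${url.pathname}`;
}

// The example from the text:
//   mirroredPath('https://code.jquery.com/ui/1.12.0/jquery-ui.min.js')
//   gives 'mirrored/code.jquery.com/ui/1.12.0/jquery-ui.min.js'
```

Note that this sketch drops any query string, so two remote URLs that differ only in their query would collide on the same local path.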
Answers for you:
1. Yes, all of third_party should be excluded from that git commit hook. Sorry that this wasn't already done for you. It should have been.
2. I don't think this will work. I believe it's a requirement from the google GitHub org that all vendored code should live in a directory immediately under third_party named after the project, with a license file that applies to it in the same folder. If you create a third_party/mirrored directory, the org's compliance tools will expect all of the code in there to be covered by the same license. (This is from memory. I no longer work for Google, so I cannot check.)
3. See 2.
4. I'm not sure I understand this question. There are iframes used in the site; the ones I control should be loading web/doc.html and web/run.html. There might be something else as part of blocks; I'd have to investigate. There's also an iframe created by Google authentication, but it shouldn't be created unless you enable that.
5. Yes, the feedback links are a separate issue, which will be more complex. Thanks for identifying that. It requires a GitHub API key for an account with write permissions on the repo, so it won't be possible for local installations to make that work. I should probably add some configuration to disable those.
6. I don't know why blocks loads both versions, but I would guess that it's just a mistake. If removing the second script tag doesn't break anything with a bit of testing, feel free to do so. Only a sanity check is needed; if it's a subtle issue, I'm happy to fix forward. Blocks isn't really used at the moment anyway.
   As far as versions, yes, it's likely that several of the dependencies are very old. I can try a newer jQuery and perhaps update it, but it's probably easier to wait until after you're done?
1. No problem; it's trivial to fix it.
2. If there's an automated compliance tool, is there documentation on what it expects to find? Sounds like no. Congratulations on your escape from Google, whenever it may have taken place. I assume they own the name...
3. See 2. :-)
4. There are a bunch of these. I don't know what they do; I found them with the awesome power of grep.
   web/help/GuideUnit1.md: <div align="center"><iframe src="https://code.world/run.html?mode=codeworld&dhash=DrYNeySBqPKubPkzT48dTEA" width=250 height=250 style="border: none;"></iframe></div>
5. OK
6. I'll give it a try. I'm hampered in knowing what breaks because I don't know what worked before.
... and if the dependencies work, it doesn't seem necessary to rush to update them. I'm a security weenie, and I hate to see old versions of anything, but honestly how much access will that code have to anything anybody actually wants to attack? I guess the only reason it could become more urgent even when I'm done is that if, say, JQuery has to pull down a version or do a necro-patch because of some horrible bug, then you won't automatically get the patch.
Another question: should I also exempt the third party code from the pretty printing? It seems valuable to have the file served from the CodeWorld server be bit-for-bit identical to the one that would have been served from the canonical source. And it lets the script integrity checks work without having to worry about changing the hashes if there's ever a reason to switch back, or to switch to some other source.
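To make the integrity point concrete: a Subresource Integrity value is a hash of the exact bytes of the file, so any reformatting changes it. Below is a minimal Node.js sketch, assuming sha384 as the algorithm; the function name is hypothetical and nothing here is taken from CodeWorld's build.

```javascript
const crypto = require('crypto');

// Computes a Subresource Integrity value of the form "<algorithm>-<base64 digest>"
// from the raw contents of a file (a Buffer or string).
function sriHash(contents, algorithm = 'sha384') {
  const digest = crypto.createHash(algorithm).update(contents).digest('base64');
  return `${algorithm}-${digest}`;
}

// Even a one-character formatting change produces a different value.
const original = sriHash('var x = 1;\n');
const reformatted = sriHash('var x = 1; \n');
console.log(original === reformatted); // false
```

A browser compares this value against the integrity attribute on the script tag and refuses to run the file on a mismatch, which is why a pretty-printed copy would need new hashes.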
Yes, I agree that all formatting and linting tools should completely ignore third_party.
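One way a hook could skip vendored code is to filter the list of staged files before handing it to the linter or formatter. This is only a sketch: the function name is hypothetical, and the actual commit hook may select files quite differently.

```javascript
// Hypothetical filter: keeps JavaScript files, drops anything under a
// directory named "third_party" anywhere in the path.
function filesToCheck(stagedPaths, ignoredDir = 'third_party') {
  return stagedPaths.filter((path) => {
    const segments = path.split('/');
    return path.endsWith('.js') && !segments.includes(ignoredDir);
  });
}

// Example:
//   filesToCheck(['web/js/codeworld.js', 'third_party/jquery/jquery.min.js'])
//   keeps only 'web/js/codeworld.js'
```

A filter like this looks only at the path as given, so files reached through a symlink under web/ would need to be handled separately.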
More answers:
2: No, there's no externally available documentation on what it expects. There is a script that Google employees can run to check it, and if it's wrong it will occasionally get escalated to someone internally. The internal contact person at Google doesn't actually maintain this project, so I don't want them to be bothered. But this is best effort. If they have to email me and ask me to fix something, no big deal.
Google doesn't own the name, and I could fork it and host elsewhere. So far, I haven't had a reason worth doing that.
4: I see now. Yes, those are demo programs written in CodeWorld itself. Usually they are animations or something that explain complex ideas. Unfortunately, a local instance will not be able to run those programs, so this has to refer to the hosted instance. If this is also objectionable, we could look into replacing them with something like animated GIFs?
6: Best effort is fine for blocks. I don't want to kill it yet or intentionally break it, but I'm not going to prevent development on the main code base just because blocks might break in a subtle way.
A modest proposal on 2 and 3: I don't actually mind having a shot at making the third_party stuff conform to Google's standards by emulating existing code in this and other projects... but a perhaps-easier-to-maintain and perhaps-easier-to-extend alternative might be to have install.sh (or even build.sh) download the code at install time, so it's not distributed as part of CodeWorld at all.
Oh, and on the embedded demos, I don't personally see that as a really big deal.
Those references to code.world are in the documentation, so the system isn't profoundly degraded if they're unavailable. You can still feasibly run an instance on an isolated network. There's only one point of attack for injecting malicious code, and code.world is going to be a lot less attractive as an injection point for somebody who'd want to do that than jquery.com. The privacy exposure is also narrowed.
In a perfect world, it'd be nice to host every scrap of everything locally, but I don't think those particular embeds are worth reducing things to animated GIFs. I also can't see my way clear to suggest you do any significant hackery to include pre-embedded programs in the distribution, if there's no other purpose for it than supporting a few demos in the documentation. If you had some other reason for wanting to include a bunch of canned examples, then sure, but not just for this.