html5ever icon indicating copy to clipboard operation
html5ever copied to clipboard

Adoption agency limit

Open DemiMarie opened this issue 8 years ago • 21 comments

This hardens the code against denial of service attacks by only going back a certain number of elements (set at parser construction time) in the list of active formatting elements and the stack of open elements. With this change, I have not been able to find any ways to cause html5ever to hang, even on pathological input.

This is a breaking change, since it adds an element to TreeBuilderOpts and causes non-conforming parsing of very deeply nested input. It is intended that the parsing of valid input should not be affected.

DemiMarie avatar Aug 06 '17 00:08 DemiMarie

My understanding is that it is impossible to solve #289 without imposing some arbitrary limit – otherwise, it is possible to produce DOMs that are quadratic size.

DemiMarie avatar Aug 07 '17 23:08 DemiMarie

:umbrella: The latest upstream changes (presumably #274) made this pull request unmergeable. Please resolve the merge conflicts.

bors-servo avatar Aug 09 '17 13:08 bors-servo

@DemiMarie #274 moved implentations from html5ever/src/tree_builder/actions.rs into html5ever/src/tree_builder/mod.rs, which broke this change. You will need to rebase.

Ygg01 avatar Aug 09 '17 14:08 Ygg01

@Ygg01 Rebased and squashed.

I went back to a limit of 10, because I doubt that most web pages will hit that limit, and because the limit*8 is roughly the “amplification factor” that a malicious document can achieve. This does cause some tests to fail though.

Part of this can be solved by better algorithms, but part of it is inherent to (and IMO a bug in) the HTML parsing algorithm, which allows for quadratically-sized DOMs to be created from linearly-sized input.

DemiMarie avatar Aug 10 '17 22:08 DemiMarie

Hm. It seems your code is causing some errors in tree building. Or perhaps testing requires larger limit? I'll inspect once I'm home.

Ygg01 avatar Aug 11 '17 15:08 Ygg01

Testing requires a larger limit. I chose the limit I did because the code is already slow and I didn't want for an attacker to be able to DoS a server.

I think that what is really needed is a better data structure. At some point, PRs should be issued against the spec that replace the algorithm with an equivalent but faster version.

On Aug 11, 2017 11:10 AM, "Ygg01" [email protected] wrote:

Hm. It seems your code is causing some errors in tree building. Or perhaps testing requires larger limit? I'll inspect once I'm home.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/servo/html5ever/pull/297#issuecomment-321839903, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWBye-juni9Meu0EsdzZaTst2O24VMks5sXG8AgaJpZM4Oumly .

DemiMarie avatar Aug 11 '17 15:08 DemiMarie

I think that what is really needed is a better data structure. At some point,

Are there any examples what said data structure is?

Ygg01 avatar Aug 11 '17 16:08 Ygg01

@Ygg01 Testing requires a larger limit. I chose the limit I did because the code is already slow and I didn't want for an attacker to be able to DoS a server.

I think that what is really needed is a better data structure. At some point, PRs should be issued against the spec that replace the algorithm with an equivalent but faster version. The algorithm in the spec should be one that people can actually use securely.

DemiMarie avatar Aug 11 '17 16:08 DemiMarie

  • Use some sort of hash table for the list of active formatting elements, to allow keeping track of the “max. 3 duplicates” rule.
  • Keep flags that can be checked, and which are updated whenever something is added or removed, to avoid traversing lists a bunch.

On Aug 11, 2017 12:26 PM, "Ygg01" [email protected] wrote:

I think that what is really needed is a better data structure. At some point,

Are there any examples what said data structure is?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/servo/html5ever/pull/297#issuecomment-321858143, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWB9wqi-xi3XP4qggVgGwT-9rySVTEks5sXIChgaJpZM4Oumly .

DemiMarie avatar Aug 12 '17 12:08 DemiMarie

Any updates on this PR? (having HTML parser, which does not blow on pathological pages would be really nice).

usamec avatar Oct 18 '17 08:10 usamec

@hsivonen, what does Gecko do here?

SimonSapin avatar Oct 18 '17 08:10 SimonSapin

Tried this PR with the latest version and it doesn't fix stack overflow issue during deallocation phase.

S2Ler avatar Aug 20 '18 09:08 S2Ler

I'm so bad at GitHub notifications. Sorry.

@SimonSapin, it's a bit unclear to me what "here" means. The adoption agency algorithm has its outer loop count limited to 8 in the spec, and that's what Gecko limits it to.

Additionally, there's a need to limit the general DOM depth. On that point, you shouldn't copy Gecko but Blink. Please take a look at the Gecko patch to match Blink. It hasn't landed due to the sad combination of Android runtime stack limits and Robohornet. I'm hoping to resolve the Android issue one way or another in 2018H2.

hsivonen avatar Aug 21 '18 12:08 hsivonen

@hsivonen Could you explain the gist behind the Blink's limit, in that patch, My C++ foo isn't great. From what I could tell. It creates a limited Vec or array that has up to 512 (or rather 511) elements and errors and returns last element of said array.

Ygg01 avatar Dec 17 '18 16:12 Ygg01

:umbrella: The latest upstream changes (presumably #357) made this pull request unmergeable. Please resolve the merge conflicts.

bors-servo avatar Dec 17 '18 23:12 bors-servo

Could you explain the gist behind the Blink's limit, in that patch, My C++ foo isn't great.

ReviewBoard went away, but here is rebased version of the Gecko patch.

In places where the code calls nodeFromStackWithBlinkCompat(), instead of taking the most-recently-pushed element from the tree builder stack, element at index 511 is taken if there are more elements on the stack. Assumes zero-based indexing where the "html" element is at index 0.

hsivonen avatar Dec 20 '18 09:12 hsivonen

@DemiMarie could you work in the changes hsivonen mentioned?

Ygg01 avatar Dec 24 '18 16:12 Ygg01

@hsivonen I am working on it.

DemiMarie avatar Dec 26 '18 13:12 DemiMarie

The fix has finally landed in Gecko and is available for testing in Firefox Nightly.

hsivonen avatar Jan 14 '19 10:01 hsivonen

@DemiMarie Are you stil working on this?

freddyb avatar Jul 29 '19 19:07 freddyb