ejabberd icon indicating copy to clipboard operation
ejabberd copied to clipboard

mod_mam_sql stores invalid xml

Open marcphilipp opened this issue 7 years ago • 8 comments

What version of ejabberd are you using?

17.09

What operating system (version) are you using?

CentOS

How did you install ejabberd (source, package, distribution)?

source

What did not work as expected? Are there error messages in the log? What was the unexpected behavior? What was the expected result?

When receiving messages with long bodies, mod_mam_sql inserts them into the database archive (MySQL in our case). Both the xml and the txt columns are cut off after 65535 characters. Thus, when loading them in a subsequent MAM query, the xml column contains invalid XML. The max_stanza_size option is set to 65535 on ejabberd_c2s but I'm not sure it's used for the websocket transport.

marcphilipp avatar Jan 08 '18 12:01 marcphilipp

https://stackoverflow.com/questions/6766781/maximum-length-for-mysql-type-text

zinid avatar Jan 08 '18 13:01 zinid

I know that it's the maximum length for a TEXT column. That doesn't solve the problem that such messages are stored in a corrupted way in the database. ejabberd_http_ws does not check max_stanza_size. Thus, I have no way of preventing them from being stored in the first place. Once they are stored the MAM response is wrong. It only sends the messages that were not corrupted, but the count of the fin element in the IQ result includes the corrupted messages.

So, this might actually be two issues:

  1. mod_mam_sql should not store messages that will corrupt the archive table.
  2. ejabberd_http_ws needs a way to limit the maximum stanza size

I'd appreciate any pointer how to workaround or fix the issue.

marcphilipp avatar Jan 08 '18 13:01 marcphilipp

Well, how do you suggest to resolve the first point? We have a ton of databases with their limitations, it's really a burden to check all these stupid restrictions. I think you just need to change the type of the field, or use "right" database: https://www.postgresql.org/docs/current/static/datatype-character.html which doesn't have these silly limitations. Regarding the second point, it's on my todo list to fix it. I will leave the issue opened until I fix it.

zinid avatar Jan 08 '18 13:01 zinid

how do you suggest to resolve the first point?

How about a config option for mod_mam? max_message_size or something along those lines?

I think you just need to change the type of the field, or use "right" database

I think allowing huge messages might cause a security vulnerability so I'd rather not go that route.

Regarding the second point, it's on my todo to fix it. I will leave the issue opened until I fix it.

Sounds good, thanks! 🙂

marcphilipp avatar Jan 08 '18 13:01 marcphilipp

How about a config option for mod_mam? max_message_size or something along those lines?

How about asking MySQL developers to return an error on field overflow instead of truncating? Just like this is done in Postgres (for integer overflow).

zinid avatar Jan 08 '18 14:01 zinid

Fair enough. I'll leave you to it. 😉

marcphilipp avatar Jan 08 '18 14:01 marcphilipp

ejabberd_http_ws not checking max_stanza_size sounds like a worth reporting bug on its own.

dos1 avatar Aug 19 '18 22:08 dos1

@prefiks: What do you think?

Neustradamus avatar Apr 13 '21 13:04 Neustradamus