
How to get the actual message XML packet in MAM with external code or a direct database query

Open · sandeepjangir opened this issue 2 years ago · 9 comments

MongooseIM version: 4.0.0
Installed from: source
Erlang/OTP version: built from source

We are using MAM with a Cassandra database and want to extract the message XML packet from Node.js code. The message column in the Cassandra table contains binary data.

How can we get the actual message packet with external code or by a direct database query?

sandeepjangir · Sep 12 '23 14:09

Hi,

Ideally you want to use

modules.mod_mam.db_message_format = "mam_message_xml"

This writes the message into the DB as plain XML.

You may also want to tweak the db_jid_format option.

https://esl.github.io/MongooseDocs/6.0.0/modules/mod_mam/#modulesmod_mamdb_message_format

However, changing it on the fly will not work: you need to start with an empty archive, because all messages must be in one format, and mixing two formats would cause errors.

If you already have binary data in the DB, the format is probably

modules.mod_mam.db_message_format = "mam_message_compressed_eterm"

which is the Erlang External Term Format (https://www.erlang.org/doc/apps/erts/erl_ext_dist.html). So there are two steps you need to take:

  • decode the Erlang External Term Format;
  • write an encoder from the Erlang exml stanza format to XML (we use #xmlel{} and #xmlcdata{} records). This is a bit tricky: basically, you need a JavaScript version of this Erlang function: https://github.com/esl/exml/blob/master/src/exml.erl#L84
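For the second step, here is a minimal Python sketch of such a serializer (Python for brevity; the logic ports directly to Node.js). It assumes your ETF decoder returns Erlang records as tuples tagged with the record name, so #xmlel{} arrives as ("xmlel", Name, Attrs, Children) and #xmlcdata{} as ("xmlcdata", Content) or ("xmlcdata", Content, Style); binaries arrive as bytes; and attributes arrive as a list of pairs (a dict is also accepted in case your exml version stores a map). `to_xml` and its helpers are hypothetical names. The output is equivalent XML, but it may not be byte-identical to what exml itself produces (for example, quote style, or CDATA sections being written as escaped text).

```python
from xml.sax.saxutils import escape


def _iodata_to_bytes(value) -> bytes:
    """Flatten Erlang iodata (bytes, byte-valued ints, str, nested lists)."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, int):
        return bytes([value])
    if isinstance(value, list):
        return b"".join(_iodata_to_bytes(v) for v in value)
    raise TypeError(f"unexpected iodata element: {value!r}")


def _text(value) -> str:
    # Flatten to bytes first so multi-byte UTF-8 characters decode correctly.
    return _iodata_to_bytes(value).decode("utf-8")


def _attr_value(value) -> str:
    # escape() handles &, < and >; also escape the double quote we wrap with.
    return escape(_text(value), {'"': "&quot;"})


def to_xml(node) -> str:
    """Serialize a decoded #xmlel{} / #xmlcdata{} tree back to XML text."""
    kind = node[0]
    if kind == "xmlcdata":
        # Content is always written escaped, even if Style was 'cdata'.
        return escape(_text(node[1]))
    if kind == "xmlel":
        _, name, attrs, children = node
        tag = _text(name)
        pairs = attrs.items() if isinstance(attrs, dict) else attrs
        attr_str = "".join(f' {_text(k)}="{_attr_value(v)}"' for k, v in pairs)
        if not children:
            return f"<{tag}{attr_str}/>"
        inner = "".join(to_xml(child) for child in children)
        return f"<{tag}{attr_str}>{inner}</{tag}>"
    raise ValueError(f"unsupported exml node: {kind!r}")
```

For example, ("xmlel", b"body", [], [("xmlcdata", b"hi")]) serializes to `<body>hi</body>`.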

You can use a library to decode the External Term Format:

https://www.npmjs.com/package/erlang_js or https://github.com/mweibel/node-etf/blob/master/README.md (there may be other libraries). I don't know whether these libraries can read the compressed format; if not, you would need to patch them. Erlang uses zlib to compress the terms.
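To show what handling the compressed format involves, here is a minimal Python sketch of step one that decodes only the subset of ETF tags an exml stanza is likely to need (small/large tuples, atoms, binaries, lists, strings, small integers, maps). It assumes mam_message_compressed_eterm stores the output of term_to_binary with the compressed option, which is what the format name suggests. Per the ETF specification, a compressed term is `131, 80, <4-byte uncompressed size>, <zlib data>`, and the inflated data is the term without the leading 131 byte. Erlang falls back to the uncompressed form when compression would not help, so both layouts are handled. In Node.js, the same steps map onto Buffer reads plus zlib.inflateSync. `decode_etf` is a hypothetical name; unsupported tags raise an error rather than being guessed.

```python
import struct
import zlib


def decode_etf(blob: bytes):
    """Decode an Erlang External Term Format binary, compressed or not."""
    if len(blob) < 2 or blob[0] != 131:
        raise ValueError("not ETF: missing version byte 131")
    if blob[1] == 80:
        # COMPRESSED: 80 | uncompressed size (uint32, big-endian) | zlib data.
        (size,) = struct.unpack(">I", blob[2:6])
        data = zlib.decompress(blob[6:])
        if len(data) != size:
            raise ValueError("decompressed size mismatch")
        term, _ = _decode(data, 0)  # inflated data has no 131 version byte
    else:
        term, _ = _decode(blob, 1)
    return term


def _u32(buf: bytes, pos: int) -> int:
    return struct.unpack(">I", buf[pos:pos + 4])[0]


def _decode(buf: bytes, pos: int):
    """Decode one term starting at buf[pos]; return (term, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag == 97:                                   # SMALL_INTEGER_EXT
        return buf[pos], pos + 1
    if tag == 98:                                   # INTEGER_EXT (int32)
        return struct.unpack(">i", buf[pos:pos + 4])[0], pos + 4
    if tag in (100, 118):                           # ATOM_EXT / ATOM_UTF8_EXT
        n = struct.unpack(">H", buf[pos:pos + 2])[0]
        enc = "latin-1" if tag == 100 else "utf-8"
        return buf[pos + 2:pos + 2 + n].decode(enc), pos + 2 + n
    if tag in (115, 119):                           # SMALL_ATOM(_UTF8)_EXT
        n = buf[pos]
        enc = "latin-1" if tag == 115 else "utf-8"
        return buf[pos + 1:pos + 1 + n].decode(enc), pos + 1 + n
    if tag in (104, 105):                           # SMALL/LARGE_TUPLE_EXT
        if tag == 104:
            arity, pos = buf[pos], pos + 1
        else:
            arity, pos = _u32(buf, pos), pos + 4
        items = []
        for _ in range(arity):
            item, pos = _decode(buf, pos)
            items.append(item)
        return tuple(items), pos
    if tag == 106:                                  # NIL_EXT: []
        return [], pos
    if tag == 107:                                  # STRING_EXT: list of bytes
        n = struct.unpack(">H", buf[pos:pos + 2])[0]
        return list(buf[pos + 2:pos + 2 + n]), pos + 2 + n
    if tag == 108:                                  # LIST_EXT
        n, pos = _u32(buf, pos), pos + 4
        items = []
        for _ in range(n):
            item, pos = _decode(buf, pos)
            items.append(item)
        tail, pos = _decode(buf, pos)
        if tail != []:
            raise ValueError("improper lists are not supported")
        return items, pos
    if tag == 109:                                  # BINARY_EXT -> bytes
        n, pos = _u32(buf, pos), pos + 4
        return bytes(buf[pos:pos + n]), pos + n
    if tag == 116:                                  # MAP_EXT -> dict
        n, pos = _u32(buf, pos), pos + 4
        result = {}
        for _ in range(n):
            key, pos = _decode(buf, pos)
            value, pos = _decode(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"unsupported ETF tag {tag} at offset {pos - 1}")
```

Because Erlang records are tuples whose first element is the record name, decoding a stored stanza this way should yield nested ("xmlel", Name, Attrs, Children) tuples, with atoms as str and binaries as bytes.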

arcusfelis · Sep 12 '23 14:09

Alternatively, you can use the GraphQL API to ask MongooseIM to return the messages in a reasonable format.

arcusfelis · Sep 12 '23 14:09

modules.mod_mam.db_message_format = "mam_message_xml"

This option is not available in MIM 4. Is there a solution specific to MIM 4?

sandeepjangir · Sep 12 '23 15:09

It is available in MIM 4.0.0, in the mod_mam_cassandra_arch module:

expand_simple_param(Params) ->
    lists:flatmap(fun(simple) -> simple_params();
                     ({simple, true}) -> simple_params();
                     (Param) -> [Param]
                  end, Params).

simple_params() ->
    [{db_message_format, mam_message_xml}].

So, pass {simple, true} to mod_mam_cassandra_arch.

How do you configure MAM? Do you have any messages in Cassandra already?

arcusfelis · Sep 12 '23 19:09

We are already using MAM with Cassandra. Here is the config:

[modules.mod_mam_meta]
  backend = "cassandra"
  archive_chat_markers = true
  pm.user_prefs_store = "mnesia"

sandeepjangir · Sep 12 '23 20:09

  1. I emptied the mam_message Cassandra table.
  2. I updated the MAM config in mongooseim.toml as below.
  3. I restarted the MIM server.
[modules.mod_mam_meta]
  backend = "cassandra"
  archive_chat_markers = true
  db_message_format = "mam_message_xml"
  pm.user_prefs_store = "rdbms"

After all this setup, I still see binary data in the table; see the attached screenshot.

[Screenshot 2023-09-13 at 2:38:19 PM]

sandeepjangir · Sep 13 '23 09:09

@sandeepjangir That binary data is the XML, just displayed as hex bytes. In the Erlang shell:

[16#3c, 16#6d, 16#65, 16#73].
"<mes"

To read the column as text in cqlsh, use:

select blobastext(message) from mam_message limit 1;
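If you only have the hex form that cqlsh prints for a blob (like the 0x3c6d6573... above), a small Python sketch can turn it back into text and guess which format wrote it. The sniffing rule (first byte 131 means Erlang External Term Format, a leading `<` means plain XML from mam_message_xml) is a heuristic of mine, not MongooseIM behaviour, and `decode_cqlsh_blob` is a hypothetical helper.

```python
def decode_cqlsh_blob(hex_blob: str):
    """Turn a blob printed by cqlsh (e.g. '0x3c6d6573...') into data and
    guess which db_message_format produced it."""
    digits = hex_blob[2:] if hex_blob.lower().startswith("0x") else hex_blob
    data = bytes.fromhex(digits)
    if data[:1] == b"\x83":
        # 131 is the Erlang External Term Format version byte.
        return "eterm", data
    if data.lstrip()[:1] == b"<":
        # Plain XML text, the same thing blobAsText(message) shows.
        return "xml", data.decode("utf-8")
    return "unknown", data
```

For instance, `decode_cqlsh_blob("0x3c6d6573")` gives `("xml", "<mes")`, matching the Erlang shell result above.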

arcusfelis · Sep 13 '23 10:09

Thanks for the details, @arcusfelis. I can see the raw XML now.

Can you also guide me on implementing a feature to fetch a message by its message ID (the id of the message XML packet)?

In the MAM message table, id is stored as a unique integer value that does not appear in the message packet.

In short, we need to modify a message packet based on its message ID.

Thanks in advance

sandeepjangir · Sep 13 '23 14:09

@sandeepjangir That is not possible. Generally, user-generated IDs are treated with a grain of salt, because they are too easy to spoof. You could add a message_id column to the schema with a DB index, but that would require patching the MAM Cassandra module.
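Short of patching the module, a hypothetical offline index can be built outside MongooseIM. This Python sketch assumes you have already fetched (id, message) rows as XML text, for example with a blobastext query like the one earlier in this thread; `client_message_id` and `index_by_client_id` are illustrative names, not part of MongooseIM. Because clients choose the `id` attribute freely, each client ID maps to a list of MAM IDs rather than a single one.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Tuple


def client_message_id(xml_text: str) -> Optional[str]:
    """Return the client-supplied id attribute of a stored <message/>, if any."""
    root = ET.fromstring(xml_text)
    # Strip a namespace such as '{jabber:client}' from the tag before comparing.
    if root.tag.rsplit("}", 1)[-1] != "message":
        raise ValueError(f"not a message stanza: {root.tag}")
    return root.get("id")


def index_by_client_id(rows: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map client id -> MAM ids. A list, because client-chosen ids are
    neither unique nor trustworthy."""
    index: Dict[str, List[int]] = {}
    for mam_id, xml_text in rows:
        cid = client_message_id(xml_text)
        if cid is not None:
            index.setdefault(cid, []).append(mam_id)
    return index
```

Any lookup through such an index should still check the archive owner, since another user can reuse the same client ID.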

The id column in the schema is the MAM ID, encoded as an integer. If you have the MAM ID, though, you can find the message in the DB.
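Clients see MAM IDs as strings (for example, the id of a MAM query result), so they need converting before querying the integer id column. This Python sketch rests on an assumption taken from my reading of MongooseIM's mod_mam_utils, not from this thread: the external ID is the integer written in base 32 with Erlang's uppercase digits, and the integer packs a microsecond timestamp shifted left by 8 bits plus an 8-bit node ID. Please verify this against your MongooseIM version. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: uppercase base-32 digits, as Erlang's integer_to_list(N, 32) emits.
_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def external_to_mam_id(ext_id: str) -> int:
    """External (base-32) MAM id -> integer stored in the Cassandra id column."""
    return int(ext_id, 32)  # int() accepts upper- or lowercase digits


def mam_id_to_external(mess_id: int) -> str:
    """Integer MAM id -> external base-32 form."""
    if mess_id < 0:
        raise ValueError("MAM ids are non-negative")
    digits = []
    while True:
        mess_id, rem = divmod(mess_id, 32)
        digits.append(_DIGITS[rem])
        if mess_id == 0:
            break
    return "".join(reversed(digits))


def decode_mam_id(mess_id: int):
    """Split an integer MAM id into (UTC timestamp, node id), assuming the
    (microseconds << 8) + node_id layout."""
    micros, node_id = mess_id >> 8, mess_id & 0xFF
    return _EPOCH + timedelta(microseconds=micros), node_id
```

If the assumed layout holds, decoding the timestamp is also a quick sanity check that a given integer really is a MAM ID from your archive.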

arcusfelis · Sep 14 '23 09:09