Discord cache files parser

Open felipecampanini opened this issue 3 years ago • 38 comments

Solves Issue #390

First version of the Discord parser

felipecampanini avatar Aug 06 '21 21:08 felipecampanini

Thanks @felipecampanini! I'll try to review next week.

lfcnassif avatar Aug 06 '21 21:08 lfcnassif

I can anticipate a lot of InputStreams are not being closed.

lfcnassif avatar Aug 06 '21 21:08 lfcnassif
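For readers following the review, the usual Java remedy for unclosed streams is try-with-resources. The sketch below is only an illustration and is not taken from the PR: `StreamClosingSketch`, `TrackingStream` and `countBytes` are hypothetical names, and `readAllBytes` assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration; none of these names come from the PR.
class StreamClosingSketch {

    /** In-memory stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the whole stream; try-with-resources closes it even if reading fails. */
    static int countBytes(InputStream source) {
        try (InputStream in = source) {
            return in.readAllBytes().length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same pattern applies to any `InputStream` a parser opens for a cache entry: declaring it in the `try (...)` header removes the need for an explicit `finally` block.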

I pushed the wrong branch by mistake, sorry. I had to force push to clean the commit history.

lfcnassif avatar Aug 11 '21 22:08 lfcnassif

@felipecampanini is the "index" file processed by DiscordParser also located in the "AppData/Roaming/discord/cache" folder? If not, is it located in a specific folder? This info would be very useful for locating samples to test this PR.

lfcnassif avatar Aug 16 '21 21:08 lfcnassif

@felipecampanini I did assorted fixes and some improvements. As I'm new to this file format, please check the changes to see if I have broken something.

The following things still should be addressed:

  • [ ] some exceptions are thrown with some of my test cases here, and I don't know whether they are expected. I can send you my samples for testing;
  • [ ] extract individual messages as subitems to populate the graph tab and the event timeline, as other chat parsers (whatsapp, skype...) do;
  • [x] I think the signature pattern matches all Chrome cache index files, but only discord data is parsed and the remaining data is ignored entirely, although it used to be indexed generically by the string extractor. Maybe a custom detector could be implemented to differentiate them, I just started to think about this...

Let me know if you (will) have time to work on the first two things listed above.

I saw that some attachments are downloaded on demand from internet servers if the chat html is opened outside the application. The internal viewer blocks this access, and I'm not sure it should be allowed (it could be something malicious). Does any dev have an opinion about this?

And I understood that attachments aren't searched for in the case. Could they exist in the cache folder, and could the link from the chat to them be built from the index/data files? If so, I think they should be searched for and embedded in the chat html, with proper checkboxes to select them in the case. That would be better than downloading from the internet, if possible...

Thanks.

lfcnassif avatar Aug 27 '21 22:08 lfcnassif

> I think signature pattern is matching all Chrome cache index files, but just discord data is being parsed and remaining data is being totally ignored, it used to be indexed generically using the string extractor. Maybe a custom detector could be implemented to differentiate them, I just started to think about this...

@felipecampanini I've just pushed another approach to handle the above in 78f5cefef3218cfcdc2a4f25ee33906599d3e6a7. Now I'll address other features until the remaining TODOs are resolved.

lfcnassif avatar Aug 30 '21 13:08 lfcnassif

@lfcnassif I have time to work on the first two items.

Attachments are not searched for in the case. I believe not all files are stored in the cache directory, so if internet access is blocked some items will be missing (I still need to check this with some tests).

felipecampanini avatar Sep 13 '21 14:09 felipecampanini

@lfcnassif As I had suspected, some attachments are not present in the cache folder.

I'm already changing the code to search for the available files directly from the "external files", which are the files in the cache folder starting with the characters "f_".

However, I couldn't come to a conclusion about how the procedure should be for files that are still available on the servers (files that can be obtained by download) but are not in the cache folder.

Could I get them while processing the case?

felipecampanini avatar Sep 16 '21 23:09 felipecampanini
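As a rough illustration of the "external files" selection described above, the sketch below lists the regular files in a cache folder whose names start with "f_". It is not the PR's code: `ExternalCacheFiles` and its methods are hypothetical, and the example names in the comments ("f_00002a", "data_1") are only illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; names are assumptions, not IPED API.
class ExternalCacheFiles {

    /** True for names such as "f_00002a"; names like "index" or "data_1" are rejected. */
    static boolean isExternalFile(String fileName) {
        return fileName.startsWith("f_");
    }

    /** Lists regular files directly inside cacheDir whose names start with "f_". */
    static List<Path> list(Path cacheDir) {
        try (Stream<Path> entries = Files.list(cacheDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isExternalFile(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside the application the files would come from the case's items rather than from the local filesystem, so only the name filter is likely to carry over directly.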

Well, I'm not sure if we should start doing this. If yes, maybe ask for explicit user permission, maybe through a configuration option and/or a console warning. WhatsAppParser could also benefit from this. What do other devs think? @tc-wleite @hauck-jvsh @fmpfeifer

lfcnassif avatar Sep 17 '21 00:09 lfcnassif

I have doubts about legal issues, but if it asks for explicit confirmation every time, I think it could be done. If you think this could be done, let me know, as I have made a java application that pulls out WhatsApp's attachments if they are still available on its server.

hauck-jvsh avatar Sep 17 '21 19:09 hauck-jvsh

Maybe a new --(get|download)(Internet|External|Cloud)(Data|Resources) command line option is enough, so users will need to explicitly enable it, and it won't break batch processing. I vote for --downloadInternetData

lfcnassif avatar Sep 17 '21 20:09 lfcnassif

> Well, I'm not sure if we should start doing this. If yes, maybe ask for an user explicit permission, maybe a configuration option or/and warn to console. WhatsAppParser could also benefit of this. What other devs think? @tc-wleite @hauck-jvsh @fmpfeifer

In general, I think it may raise legal issues, as those files are not actually part of the evidence being analysed, although there are references to them. On the other hand, they may provide very useful information, as long as the user knows what is happening.

> Maybe a new --(get|download)(Internet|External|Cloud)(Data|Resources) command line option is enough, so users will need to explicitly enable it, and it won't break batch processing. I vote for --downloadInternetData

One more vote to that option :-)

One suggestion, sorry if this was already discussed, that applies to Discord, WhatsApp or any parser that enriches its output with online data: somehow (visually) differentiate downloaded files from the ones already present in the processed evidences.

wladimirleite avatar Sep 17 '21 20:09 wladimirleite

> One suggestion, sorry if this was already discussed, that applies to Discord, WhatsApp or any parser that enriches its output with online data: somehow (visually) differentiate downloaded files from the ones already present in the processed evidences.

+1

lfcnassif avatar Sep 17 '21 20:09 lfcnassif

I think recent versions of Skype could also benefit from downloading data from Internet servers

lfcnassif avatar Sep 17 '21 20:09 lfcnassif

--downloadInternetData

or --getInternetData

lfcnassif avatar Sep 17 '21 20:09 lfcnassif

> Maybe a new --(get|download)(Internet|External|Cloud)(Data|Resources) command line option is enough, so users will need to explicitly enable it, and it won't break batch processing. I vote for --downloadInternetData

I also vote for this option!

felipecampanini avatar Sep 19 '21 18:09 felipecampanini

> I also vote for this option!

Ok, I'll expose such a parameter soon, but I'm out of office for the next 2 days. In the meantime, this could already be implemented in parsers using an internal boolean attribute to enable/disable downloading Internet data, and I can implement the logic to set that attribute later.

lfcnassif avatar Sep 20 '21 14:09 lfcnassif
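A minimal sketch of the "internal boolean attribute" idea follows, assuming the flag defaults to off and is switched on later by whatever code handles the command line option. `AttachmentResolver`, `setDownloadInternetData` and `resolve` are hypothetical names, and the downloader is passed in as a function so the sketch stays free of network code.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; not the parser's real API.
class AttachmentResolver {

    // Off by default, so batch processing never touches the network unless asked to.
    private static volatile boolean downloadInternetData = false;

    /** Intended to be called once by the code that parses the command line. */
    static void setDownloadInternetData(boolean enabled) {
        downloadInternetData = enabled;
    }

    /** Returns the attachment bytes only when downloading was explicitly enabled. */
    static Optional<byte[]> resolve(String url, Function<String, byte[]> downloader) {
        if (!downloadInternetData) {
            return Optional.empty(); // the downloader is never invoked
        }
        return Optional.ofNullable(downloader.apply(url));
    }
}
```

Keeping the check in one place means a parser cannot reach the network by accident while the flag is off.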

I think I will wait until #758 is finished, to avoid unnecessary conflicts. After that I can start integrating my code into IPED.

hauck-jvsh avatar Sep 20 '21 17:09 hauck-jvsh

Just one more thing: I think files recovered from the internet should carry a message in the chat saying that they were recovered from the internet. Maybe also add a metadata field so they can be filtered apart from the files present in the evidence. What do you think? @felipecampanini @lfcnassif @tc-wleite

hauck-jvsh avatar Sep 21 '21 12:09 hauck-jvsh
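One way to picture this suggestion is an origin value stored with each attachment and reused for both filtering and the chat label. The sketch below uses a plain `Map` instead of the application's metadata classes; the key `attachment:origin`, its values, and the label text are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; key and value names are assumptions.
class AttachmentOrigin {

    static final String ORIGIN_KEY = "attachment:origin";
    static final String FROM_EVIDENCE = "evidence";
    static final String FROM_INTERNET = "internet";

    /** Builds the metadata for one attachment, recording where its bytes came from. */
    static Map<String, String> metadataFor(String name, boolean downloaded) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name", name);
        metadata.put(ORIGIN_KEY, downloaded ? FROM_INTERNET : FROM_EVIDENCE);
        return metadata;
    }

    /** Text shown next to the attachment in the chat html. */
    static String chatLabel(Map<String, String> metadata) {
        String name = metadata.get("name");
        if (FROM_INTERNET.equals(metadata.get(ORIGIN_KEY))) {
            return name + " (recovered from the internet)";
        }
        return name;
    }
}
```

With a field like this, a filter on the origin value would separate downloaded files from those found in the evidence.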

> Just one more thing, I think that the files recovered from the internet should have a message saying that they were recovered from the internet in the chat. Maybe also a metadata to be possible to filter them from the files presented in the evidence. What do you think? @felipecampanini @lfcnassif @tc-wleite

I think so too.

felipecampanini avatar Sep 23 '21 20:09 felipecampanini

Sorry for the long delay here, I was working on hundreds of other tickets targeted for 4.0.0...

For whoever finishes this work (fix some exceptions while parsing, extract single messages to populate the graph and the timeline, download attachments available on servers): we now have the new --downloadInternetData cmd line option. For the attachment download part, the approach used to download WhatsApp attachments, implemented in #828, could be used as an example.

lfcnassif avatar Mar 18 '22 19:03 lfcnassif

Thanks for the tip @lfcnassif. I have some changes to commit, but they're not complete yet. I should complete them in the next few days; this was delayed a lot by other demands from my sector. I will try to send the code following the same approach as the WhatsApp attachments.

felipecampanini avatar Mar 18 '22 21:03 felipecampanini

Thanks @felipecampanini for the last commits! I'll try to review them in the next few days. Just to confirm, you don't have more commits to push, right?

lfcnassif avatar Apr 15 '22 14:04 lfcnassif

> Thanks @felipecampanini for last commits! I'll try to review in the next days. Just to confirm, you don't have more commits to push, right?

Correct, I don't have any more commits to send. I'm working on the function to download the files from the internet, but it isn't finished yet. I remain at your disposal for any improvement or correction that may be necessary.

felipecampanini avatar Apr 19 '22 19:04 felipecampanini

Thank you!

lfcnassif avatar Apr 19 '22 21:04 lfcnassif

Hi @felipecampanini,

I'm really sorry for the long delay in reviewing/testing this since your last commit. I've just merged master and fixed some merge conflicts. After testing with my local discord dataset, I identified some features that exist in all other chat parsers (WhatsAppParser, TelegramParser, SkypeParser, UFEDChatParser, which could be used as examples) but are not implemented here. Users could be unhappy, so I think the features below should be implemented to make the behavior consistent:

  • Individual extracted messages are not populating Message-From and Message-To, so the Graph is not being populated
  • ExtraProperties.PARTICIPANTS metadata of the generated Chat item should be populated with all chat parties
  • Context menu "go to parent chat position" option should work when clicking on an individual extracted message in "Instant Messages" category
  • When attachments are found in the case:
    • The onClick event should open the case item from the embedded viewer
    • A checkbox should be added in the chat html for each item found to check/uncheck the item in the case from the viewer
    • ExtraProperties.LINKED_ITEMS metadata of the generated Chat item should be populated, so attachments will be exported to reports together with their parent chat
    • ExtraProperties.SHARED_ITEMS metadata of the generated Chat item should be populated, so the P2PBookmarker class could be updated to create automatic bookmarks of sent/shared media
    • Hashes of those items should be checked using ChildPornHashLookup class and a proper message should be printed in the Chat Html and individual messages should be flagged properly

Unfortunately I won't have time to work on the above in the next 2 or 3 weeks, since I'm working hard to finish the 4.0.0 release, sorry about that...

lfcnassif avatar Jun 01 '22 15:06 lfcnassif
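The first two items in the list above come down to deriving the chat parties from the messages themselves. The sketch below shows one way to do that under simplifying assumptions: `ChatParties` and `Message` are hypothetical types (not the PR's JSON classes), every party is assumed to have written at least one message, and how the resulting values are written into Message-From, Message-To or ExtraProperties.PARTICIPANTS is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Message is a stand-in for the parser's own message type.
class ChatParties {

    static class Message {
        final String author;
        final String text;

        Message(String author, String text) {
            this.author = author;
            this.text = text;
        }
    }

    /** Distinct authors in order of first appearance; messages without an author are skipped. */
    static List<String> participants(List<Message> messages) {
        Set<String> seen = new LinkedHashSet<>();
        for (Message m : messages) {
            if (m != null && m.author != null) {
                seen.add(m.author);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Everyone in the chat except the author: a candidate for the message's "to" side. */
    static List<String> recipients(Message message, List<String> participants) {
        List<String> others = new ArrayList<>(participants);
        others.remove(message.author);
        return others;
    }
}
```

For a chat between "alice" and "bob", a message written by "alice" would get "bob" as its only recipient.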

I also got a bunch of exceptions printed in log while processing, I collected some of them:

java.io.EOFException: Length to read: 160 actual: 0
	at org.apache.commons.io.IOUtils.readFully(IOUtils.java:1826)
	at org.apache.commons.io.IOUtils.readFully(IOUtils.java:1846)
	at dpf.sp.gpinf.discord.cache.CacheEntry.<init>(CacheEntry.java:144)
	at dpf.sp.gpinf.discord.cache.Index.<init>(Index.java:200)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:89)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
com.fasterxml.jackson.core.JsonParseException: Unexpected character ((CTRL-CHAR, code 131)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
 at [Source: (BufferedInputStream); line: 1, column: 2]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:659)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2737)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:902)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:794)
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 3)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (BufferedInputStream); line: 1, column: 2]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._throwInvalidSpace(ParserMinimalBase.java:713)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3057)
	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:756)
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
dpf.sp.gpinf.discord.cache.CacheAddr$InputStreamNotAvailable: Cannot open InputStream for this CacheAddr.
	at dpf.sp.gpinf.discord.cache.CacheAddr.getInputStream(CacheAddr.java:127)
	at dpf.sp.gpinf.discord.cache.CacheEntry.getResponseRawDataStream(CacheEntry.java:95)
	at dpf.sp.gpinf.discord.cache.CacheEntry.getResponseDataStream(CacheEntry.java:180)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:97)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type `java.util.ArrayList<dpf.sp.gpinf.discord.json.DiscordRoot>` from Object value (token `JsonToken.START_OBJECT`)
 at [Source: (BufferedInputStream); line: 1, column: 1]
	at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
	at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1741)
	at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1515)
	at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1462)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.handleNonArray(CollectionDeserializer.java:392)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:252)
	at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
java.lang.NullPointerException
	at dpf.sp.gpinf.discord.DiscordParser.extractMessages(DiscordParser.java:183)
	at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:151)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
	at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

If you have time to take a look at them, I can provide my samples for testing, thank you.

lfcnassif avatar Jun 01 '22 17:06 lfcnassif
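Several of the traces above share a pattern: the bytes handed to the JSON mapper either do not start like JSON text at all (control characters 131 and 3) or start with an object where a list is expected. A cheap guard is to look at the first non-whitespace byte before parsing. The sketch below is an assumption about how such a check could look, not the fix adopted in the PR, and `JsonSniffer` is a hypothetical name.

```java
// Hypothetical pre-check; not part of the PR.
class JsonSniffer {

    enum Kind { ARRAY, OBJECT, NOT_JSON }

    /** Classifies a payload by its first non-whitespace byte. */
    static Kind sniff(byte[] head) {
        for (byte b : head) {
            switch (b) {
                case ' ':
                case '\t':
                case '\r':
                case '\n':
                    continue; // skip leading whitespace, move on to the next byte
                case '[':
                    return Kind.ARRAY;
                case '{':
                    return Kind.OBJECT;
                default:
                    return Kind.NOT_JSON; // e.g. control characters such as 131 or 3
            }
        }
        return Kind.NOT_JSON; // empty or whitespace-only payload
    }
}
```

A caller could skip `NOT_JSON` payloads and choose the target type for `ARRAY` versus `OBJECT`. Jackson also offers `DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY`, which may help when an object root is really a single element, but whether that holds for these responses is not established in this thread.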

Hi @felipecampanini, are you willing to take a look and try to fix the exceptions above? If yes, I can resolve the merge conflicts and, eventually, implement the other remaining features.

lfcnassif avatar Aug 11 '22 20:08 lfcnassif

> Hi @felipecampanini, are you willing to take a look and try to fix the exceptions above? If yes, I can resolve the merge conflicts and, eventually, implement the other remaining features.

Hi @lfcnassif, sorry for the late reply. I will implement the remaining features and fix the exceptions. I also have other corrections to send. You have already given me some test samples; if you have new ones, could you please send them to me? Thanks.

felipecampanini avatar Aug 17 '22 13:08 felipecampanini

> Hi @lfcnassif, sorry for the time without replying. I will implement the remaining features and fix the exceptions. I also have other corrections to send. You had already given me some test samples, if you have new samples, could you please send them to me? Thanks.

Thanks @felipecampanini for replying. I'll try to resolve the merge conflicts then. I didn't collect more samples, so I think you already have mine.

lfcnassif avatar Aug 17 '22 14:08 lfcnassif