IPED
Discord cache files parser
Solves Issue #390
First version of the Discord parser
Thanks @felipecampanini! I'll try to review next week.
I can anticipate a lot of InputStreams are not being closed.
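For reference, the usual Java idiom for this kind of leak is try-with-resources. The sketch below is only an illustration, not code from this PR: `CloseStreamsSketch`, `TrackedStream` and `countBytes` are hypothetical names, and the stream is an in-memory stand-in for whatever a cache entry would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CloseStreamsSketch {

    /** Hypothetical stand-in for a cache entry stream; records whether close() was called. */
    static final class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the whole stream and closes it, even if reading throws. */
    static int countBytes(InputStream in) throws IOException {
        // try-with-resources calls close() on normal exit and on exceptions.
        try (InputStream is = in) {
            int total = 0;
            byte[] buf = new byte[4096];
            int n;
            while ((n = is.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        TrackedStream stream = new TrackedStream(new byte[] {1, 2, 3});
        System.out.println(countBytes(stream) + " bytes read, closed=" + stream.closed);
    }
}
```

The same pattern applies to any method that opens a stream and may exit early through an exception.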
I pushed the wrong branch by mistake, sorry. I had to force push to clean the commit history.
@felipecampanini is the "index" file processed by DiscordParser also located in the "AppData/Roaming/discord/cache" folder? If not, is it located in a specific folder? This info would be very useful for us to locate samples to test this PR.
@felipecampanini I did assorted fixes and some improvements. As I'm new to this file format, please check the changes to see if I have broken something.
The following things should still be addressed:
- [ ] some exceptions are being thrown with some of my test cases here, and I don't know whether they are expected or not. I can send you my samples for testing;
- [ ] extract individual messages as subitems to populate the graph tab and the event timeline, as done by other chat parsers (WhatsApp, Skype...);
- [x] I think the signature pattern matches all Chrome cache index files, but only Discord data is parsed and the remaining data is ignored entirely; it used to be indexed generically by the string extractor. Maybe a custom detector could be implemented to differentiate them, I've just started thinking about this...
Let me know if you (will) have time to work on the first two things listed above.
I saw that some attachments are downloaded on demand from internet servers if the chat HTML is opened outside the application. The internal viewer blocks this access, and I'm not sure it should be allowed (it could be something malicious). Does any dev have an opinion about this?
And I understood attachments aren't searched for in the case. Could they exist in the cache folder, and could the link from the chat to them be built from the index/data files? If yes, I think they should be searched for and embedded in the chat HTML, with proper checkboxes so they can be selected in the case. This would be better than downloading from the internet, if possible...
Thanks.
@felipecampanini I've just pushed another approach to handle the signature/detector item above in 78f5cefef3218cfcdc2a4f25ee33906599d3e6a7. Now I'll address other features until the remaining TODOs are resolved.
@lfcnassif I have time to work on the first two items.
Attachments are not searched for in the case. I believe not all files are stored in the cache directory, so if internet access is blocked some items will be missing (I still need to confirm this with some tests).
@lfcnassif As I had suspected, some attachments are not present in the cache folder.
I'm already changing the code to search for the available files directly from the "external files", which are the files in the cache folder starting with the characters "f_".
However, I couldn't come to a conclusion about how to handle files that are still available on the servers (files that can be obtained by download) but are not in the cache folder.
Could I get them while processing the case?
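As an illustration of the "external files" lookup described above, here is a minimal sketch that lists the files in a cache folder whose names start with `f_`. It assumes a plain directory on disk; inside IPED the lookup would go through the case items instead, and `ExternalCacheFilesSketch` and `listExternalFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ExternalCacheFilesSketch {

    /** Returns the regular files in cacheDir whose names start with "f_", sorted by path. */
    static List<Path> listExternalFiles(Path cacheDir) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob "f_*" keeps only names beginning with the "f_" prefix.
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(cacheDir, "f_*")) {
            for (Path p : dir) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the cache folder path is passed as the first argument.
        Path cacheDir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : listExternalFiles(cacheDir)) {
            System.out.println(p.getFileName());
        }
    }
}
```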
Well, I'm not sure if we should start doing this. If yes, maybe ask for explicit user permission, maybe through a configuration option, and/or print a warning to the console. WhatsAppParser could also benefit from this. What do other devs think? @tc-wleite @hauck-jvsh @fmpfeifer
I have doubts about legal issues, but if it asks for explicit confirmation every time, I think it could be done. If you think this could be done, let me know, as I have made a Java application that pulls WhatsApp attachments if they are still available on its server.
Maybe a new --(get|download)(Internet|External|Cloud)(Data|Resources) command line option is enough, so users will need to explicitly enable it, and it won't break batch processing. I vote for --downloadInternetData
In general, I think it may raise legal issues, as those files are not actually part of the evidence being analysed; there are only references to them. On the other hand, they may provide very useful information, as long as the user knows what is happening.
One more vote to that option :-)
One suggestion, sorry if this was already discussed, that applies to Discord, WhatsApp or any parser that enriches its output with online data: somehow (visually) differentiate downloaded files from the ones already present in the processed evidence.
+1
I think recent versions of Skype could also benefit from downloading data from Internet servers
--downloadInternetData
or --getInternetData
Maybe a new --(get|download)(Internet|External|Cloud)(Data|Resources) command line option is enough, so users will need to explicitly enable it, and it won't break batch processing. I vote for --downloadInternetData
I also vote for this option!
OK, I'll expose such a parameter soon; I'm out of the office for the next 2 days. But this could already be implemented in parsers using an internal boolean attribute to enable/disable downloading Internet data, and I can implement the logic to set that parameter later.
I think I will wait until #758 is finished, to avoid unnecessary conflicts. After that I can start integrating my code into IPED.
Just one more thing: I think files recovered from the internet should carry a message in the chat saying that they were recovered from the internet. Maybe also a metadata field, so they can be filtered from the files present in the evidence. What do you think? @felipecampanini @lfcnassif @tc-wleite
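A minimal sketch of the opt-in behaviour discussed above, assuming the `--downloadInternetData` option name voted for in this thread. Everything else (`DownloadFlagSketch`, `AttachmentResolver`, the `Source` enum) is hypothetical, and no download is actually performed; the point is only that the internal boolean defaults to off and that downloaded items stay distinguishable from items found in the case.

```java
import java.util.Arrays;

final class DownloadFlagSketch {

    /** Option name voted for in the thread. */
    static final String OPTION = "--downloadInternetData";

    /** Where an attachment shown in the chat came from. */
    enum Source { CASE_ITEM, DOWNLOADED, UNAVAILABLE }

    /** Download is opt-in: enabled only when the option is passed explicitly. */
    static boolean isDownloadEnabled(String[] args) {
        return Arrays.asList(args).contains(OPTION);
    }

    /** Hypothetical parser-side helper holding the internal boolean attribute. */
    static final class AttachmentResolver {
        private final boolean downloadInternetData;

        AttachmentResolver(boolean downloadInternetData) {
            this.downloadInternetData = downloadInternetData;
        }

        /** Prefers items found in the case; falls back to downloading only if enabled. */
        Source resolve(boolean foundInCase) {
            if (foundInCase) {
                return Source.CASE_ITEM;
            }
            return downloadInternetData ? Source.DOWNLOADED : Source.UNAVAILABLE;
        }
    }

    public static void main(String[] args) {
        AttachmentResolver resolver = new AttachmentResolver(isDownloadEnabled(args));
        System.out.println("in case: " + resolver.resolve(true));
        System.out.println("not in case: " + resolver.resolve(false));
    }
}
```

Keeping the origin as an explicit value makes it straightforward to render a different label in the chat or to expose it as filterable metadata later.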
I think so too.
Sorry for the long delay here, I was working on hundreds of other tickets targeted for 4.0.0...
For whoever is going to finish this work (fix some exceptions while parsing, extract single messages to populate the graph and the timeline, download attachments available on servers): we now have the new --downloadInternetData command line option. For the attachment download part, the approach used to download WhatsApp attachments, implemented in #828, could be used as an example.
Thanks for the tip @lfcnassif. I have some changes to commit, but they're not complete yet. I should complete them in the next few days; it ended up being delayed a lot because of other demands from my department. I will try to send the code following the same approach as the WhatsApp attachments.
Thanks @felipecampanini for the last commits! I'll try to review in the next few days. Just to confirm, you don't have more commits to push, right?
Yes, I don't have any more commits to send. I'm working on the function to download the files from the internet, but I'm not finished yet. I remain at your disposal for any improvement or correction that may be necessary.
Thank you!
Hi @felipecampanini,
I'm really, really sorry for the long delay in reviewing/testing this since your last commit. I've just merged master and fixed some merge conflicts. After testing with my local Discord dataset, I identified some features that are not implemented here but exist in all other chat parsers (WhatsAppParser, TelegramParser, SkypeParser, UFEDChatParser; they could be used as examples). Users could be unhappy without them, so I think the features below should be implemented to make the behavior consistent:
- Individual extracted messages are not populating Message-From and Message-To, so the Graph is not being populated
- ExtraProperties.PARTICIPANTS metadata of the generated Chat item should be populated with all chat parties
- Context menu "go to parent chat position" option should work when clicking on an individual extracted message in "Instant Messages" category
- When attachments are found in the case:
  - The onClick event should open the case item from the embedded viewer
  - A checkbox should be added in the chat HTML for each item found, to check/uncheck the item in the case from the viewer
  - ExtraProperties.LINKED_ITEMS metadata of the generated Chat item should be populated, so attachments will be exported to reports together with their parent chat
  - ExtraProperties.SHARED_ITEMS metadata of the generated Chat item should be populated, so the P2PBookmarker class could be updated to create automatic bookmarks of sent/shared media
  - Hashes of those items should be checked using the ChildPornHashLookup class, a proper message should be printed in the chat HTML, and individual messages should be flagged properly
Unfortunately I won't have time to work on the above in the next 2-3 weeks, since I'm working hard to finish the 4.0.0 release, sorry about that...
I also got a bunch of exceptions printed in the log while processing. I collected some of them:
```
java.io.EOFException: Length to read: 160 actual: 0
    at org.apache.commons.io.IOUtils.readFully(IOUtils.java:1826)
    at org.apache.commons.io.IOUtils.readFully(IOUtils.java:1846)
    at dpf.sp.gpinf.discord.cache.CacheEntry.<init>(CacheEntry.java:144)
    at dpf.sp.gpinf.discord.cache.Index.<init>(Index.java:200)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:89)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

com.fasterxml.jackson.core.JsonParseException: Unexpected character ((CTRL-CHAR, code 131)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
 at [Source: (BufferedInputStream); line: 1, column: 2]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:659)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2737)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:902)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:794)
    at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 3)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (BufferedInputStream); line: 1, column: 2]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._throwInvalidSpace(ParserMinimalBase.java:713)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3057)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:756)
    at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

dpf.sp.gpinf.discord.cache.CacheAddr$InputStreamNotAvailable: Cannot open InputStream for this CacheAddr.
    at dpf.sp.gpinf.discord.cache.CacheAddr.getInputStream(CacheAddr.java:127)
    at dpf.sp.gpinf.discord.cache.CacheEntry.getResponseRawDataStream(CacheEntry.java:95)
    at dpf.sp.gpinf.discord.cache.CacheEntry.getResponseDataStream(CacheEntry.java:180)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:97)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type `java.util.ArrayList<dpf.sp.gpinf.discord.json.DiscordRoot>` from Object value (token `JsonToken.START_OBJECT`)
 at [Source: (BufferedInputStream); line: 1, column: 1]
    at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
    at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1741)
    at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1515)
    at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1462)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.handleNonArray(CollectionDeserializer.java:392)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:252)
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3674)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:101)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

java.lang.NullPointerException
    at dpf.sp.gpinf.discord.DiscordParser.extractMessages(DiscordParser.java:183)
    at dpf.sp.gpinf.discord.DiscordParser.parse(DiscordParser.java:151)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at dpf.sp.gpinf.indexer.parsers.IndexerDefaultParser.parse(IndexerDefaultParser.java:246)
    at dpf.sp.gpinf.indexer.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
```
If you have time to take a look at them, I can provide my samples for testing, thank you.
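Several of the JSON errors above are raised on the very first characters of the payload (a control character, or an object where a list was expected). One possible defensive step, sketched here with the standard library only, is to peek at the first non-whitespace byte before handing the stream to the JSON mapper. This is an assumption about how the problem could be approached, not the fix adopted in this PR; `PayloadSniffSketch`, `Kind` and `sniff` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class PayloadSniffSketch {

    enum Kind { JSON_ARRAY, JSON_OBJECT, NOT_JSON, EMPTY }

    /**
     * Classifies the payload by its first non-whitespace byte, looking at no
     * more than limit bytes, then rewinds the stream so it can still be parsed.
     */
    static Kind sniff(BufferedInputStream in, int limit) throws IOException {
        in.mark(limit);
        try {
            for (int i = 0; i < limit; i++) {
                int b = in.read();
                if (b == -1) {
                    return Kind.EMPTY;
                }
                if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                    continue; // JSON allows leading whitespace
                }
                if (b == '[') {
                    return Kind.JSON_ARRAY;
                }
                if (b == '{') {
                    return Kind.JSON_OBJECT;
                }
                return Kind.NOT_JSON; // e.g. a control character from binary data
            }
            return Kind.NOT_JSON; // only whitespace within the inspected window
        } finally {
            in.reset(); // valid because at most limit bytes were read
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] samples = {
            "[{\"id\":1}]".getBytes(StandardCharsets.UTF_8),
            "{\"id\":1}".getBytes(StandardCharsets.UTF_8),
            {(byte) 0x83, 0x03, 0x00},
            {}
        };
        for (byte[] sample : samples) {
            BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(sample));
            System.out.println(sniff(in, 16));
        }
    }
}
```

With such a check, entries classified as `NOT_JSON` or `EMPTY` could be skipped or logged at a lower level, and `JSON_OBJECT` payloads could be routed to a different target type than the list expected for `JSON_ARRAY`.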
Hi @felipecampanini, are you willing to take a look and try to fix the exceptions above? If yes, I can resolve the merge conflicts and, eventually, implement the other remaining features.
Hi @lfcnassif, sorry for taking so long to reply. I will implement the remaining features and fix the exceptions. I also have other corrections to send. You had already given me some test samples; if you have new samples, could you please send them to me? Thanks.
Thanks @felipecampanini for replying. So I'll try to resolve the merge conflicts. I didn't collect more samples, so I think you already have mine.