Facebook-Messenger-Bot
Discord data parsing error
Hi, I am having some difficulty creating the dictionary of my friends' messages and my own. There seems to be a problem with the regex used in this code:
response_sets = re.findall(r'[.+] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:[.+] ' + re.escape(personName) + r'\n(.+)\n{2})', data)
This is the regex that is used, but it returns a blank dictionary.
[08-Oct-20 02:40 PM] ShadowRanger5#3348 hello
[08-Oct-20 03:00 PM] sai#2795 Hi wassup
This is what my data looks like after I have formatted it, but using the above regex I can't seem to create a dictionary of my friends' messages and my replies.
It is possible that the regex was written with an older version of the Discord chat parser in mind. Several other things in this repository are also a little outdated (the seq2seq model and some of the word2vec code).
I would appreciate it if anybody could come up with a solution for this.
It should work fine, as seen here.
Note that the regex you pasted in the issue is rendered incorrectly, here's the actual regex just for context-
r'\[.+\] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:\[.+\] ' + re.escape(personName) + r'\n(.+)\n{2})'
It should be noted that the dataset training works on response sets. This means that the regex captures your (personName) responses against another person's statements. What this means in practice is that, in your given short snippet, sai#2795 is the responder, and hence the regex on that short snippet will only capture a response set when personName is sai#2795.
If the text history looked like-
[08-Oct-20 02:40 PM] sai#2795
hello
[08-Oct-20 03:00 PM] ShadowRanger5#3348
Hi wassup
you will now need to use ShadowRanger5#3348 as personName to get any response sets out of this, since in this snippet ShadowRanger5#3348 is the only person responding.
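To make the point about personName concrete, here is a minimal sketch that runs the regex quoted above on the short snippet. The helper name find_response_sets and the sample string are illustrative and not part of the repository. The sample assumes each message is followed by a blank line, because the pattern requires \n{2} after every message.

```python
import re

def find_response_sets(data, person_name):
    """Return (other person's message, person_name's reply) pairs."""
    pattern = (
        r'\[.+\] (?!' + re.escape(person_name) + r').+\n(.+)\n{2}'
        + r'(?:\[.+\] ' + re.escape(person_name) + r'\n(.+)\n{2})'
    )
    return re.findall(pattern, data)

# Assumed layout: header line, message line, blank line.
data = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\n"
    "hello\n"
    "\n"
    "[08-Oct-20 03:00 PM] sai#2795\n"
    "Hi wassup\n"
    "\n"
)

print(find_response_sets(data, "sai#2795"))
print(find_response_sets(data, "ShadowRanger5#3348"))
```

With personName set to sai#2795 the snippet yields one pair, ('hello', 'Hi wassup'). With ShadowRanger5#3348 it yields nothing, because nobody speaks before that person's only message.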
If you're still having trouble, it is highly likely the regex is not the issue. Please provide a small (and preferably censored) version of the chatlog you're trying to parse, along with your input for personName.
It is working, but it gives only 1 response pair even though I have approximately 70 DMs. Do I need to change the chat format in some way? Please tell me what I should do to retrieve the chats.
As long as they are in the format of a response (that is, another person's message followed by your message), it should be parsed correctly. Ensure you set personName correctly when prompted to.
https://imgur.com/a/aeTAZUW This is the kind of data I am working with.
Also, did you take into account that one person could have sent more than one message in one go?
Unfortunately, the training is only capable of working with atomic response sets - that is, one reply to one statement. But as long as there are multiple response sets in your chatlog - it should still work.
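One possible workaround, which is not something the repository does as far as this thread shows, would be to merge consecutive messages from the same author before the regex runs, so that each turn becomes a single statement. The sketch below assumes the layout of header line, message text, blank line. The names HEADER and merge_consecutive are hypothetical.

```python
import re

# Assumed header layout: "[timestamp] author" on its own line.
HEADER = re.compile(r'^\[(?P<stamp>[^\]]+)\] (?P<author>.+)$')

def merge_consecutive(data):
    """Collapse back-to-back messages from one author into a single message."""
    merged = []  # entries are [header_line, author, list_of_text_parts]
    for block in data.strip().split("\n\n"):
        lines = block.strip().split("\n")
        match = HEADER.match(lines[0])
        if not match or len(lines) < 2:
            continue  # skip anything that is not header + text
        text = " ".join(line.strip() for line in lines[1:])
        if merged and merged[-1][1] == match["author"]:
            merged[-1][2].append(text)  # same author as the previous block
        else:
            merged.append([lines[0], match["author"], [text]])
    # Re-emit in the same layout, keeping the first header of each run.
    return "".join(
        f"{header}\n{' '.join(parts)}\n\n" for header, _, parts in merged
    )

log = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\nhello\n\n"
    "[08-Oct-20 02:41 PM] ShadowRanger5#3348\nyou there?\n\n"
    "[08-Oct-20 03:00 PM] sai#2795\nHi wassup\n\n"
)
print(merge_consecutive(log))
```

Here the two messages from ShadowRanger5#3348 become the single statement "hello you there?", which the regex can then pair with the reply from sai#2795.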
Also, the screenshot does not show your inputs to the script - so I'm unsure where the 733 length of dictionary is coming from. Are you exporting messages from multiple sources?
As a side note, please do not post images of debug text data. I cannot really copy text from an image. You may try pasting your chatlog in a regex tester, such as the link I posted before, and checking the matches. (make sure to alter the person name if you need to).
I don't think it's a problem with the regex per se - perhaps there's something I'm missing. But this is the first time this issue has been encountered.
Actually, the 733 figure occurs because I am also using WhatsApp data, which has no problem: it has been extracted successfully and its length is 732, while only 1 pair has been extracted from the Discord chats. I am also confused as to what the problem is. It works fine on the regex tester but stops working when implemented.
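When a pattern matches in a tester but not in a script, one thing that may be worth ruling out is line endings. This is an assumption and is not confirmed anywhere in this thread. If the exported file reaches the regex with Windows-style \r\n endings, the \n{2} in the pattern no longer matches the blank line between messages. A minimal check, with hypothetical helper names:

```python
import re

def count_response_sets(data, person_name):
    """Count matches of the response-set regex quoted earlier in the thread."""
    pattern = (
        r'\[.+\] (?!' + re.escape(person_name) + r').+\n(.+)\n{2}'
        + r'(?:\[.+\] ' + re.escape(person_name) + r'\n(.+)\n{2})'
    )
    return len(re.findall(pattern, data))

def normalize_newlines(data):
    """Convert \\r\\n and bare \\r line endings to \\n."""
    return data.replace("\r\n", "\n").replace("\r", "\n")

# Illustrative sample with Windows-style line endings.
raw = (
    "[08-Oct-20 02:40 PM] ShadowRanger5#3348\r\nhello\r\n\r\n"
    "[08-Oct-20 03:00 PM] sai#2795\r\nHi wassup\r\n\r\n"
)

print(count_response_sets(raw, "sai#2795"))
print(count_response_sets(normalize_newlines(raw), "sai#2795"))
```

On this sample the raw text gives 0 matches and the normalized text gives 1. If the counts are the same before and after normalizing your own export, line endings are not the cause.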