
Discord data parsing error

Open namanko opened this issue 4 years ago • 6 comments

Hi, I am having some difficulty creating the dictionary of my friends' messages and my messages. There seems to be a problem with the regex used in this code:

response_sets = re.findall(r'[.+] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:[.+] ' + re.escape(personName) + r'\n(.+)\n{2})', data)

This is what has been used, but it returns a blank dictionary.

[08-Oct-20 02:40 PM] ShadowRanger5#3348 hello

[08-Oct-20 03:00 PM] sai#2795 Hi wassup

This is what my data looks like after I have formatted it, but using the above regex I can't seem to create a dictionary to extract my friends' and my conversations.

It is possible that this was written with older versions of the Discord chat parser in mind, like several other things in this repository that are a little outdated (the seq2seq model and some of the word2vec code).

I would appreciate it if anybody could come up with a solution for this.

namanko avatar Feb 11 '21 15:02 namanko

It should work fine, as seen here.

Note that the regex you pasted in the issue is rendered incorrectly; here's the actual regex, for context:

r'\[.+\] (?!' + re.escape(personName) + r').+\n(.+)\n{2}(?:\[.+\] ' + re.escape(personName) + r'\n(.+)\n{2})'

It should be noted that the dataset training works on response sets. This means that the regex captures your (personName) responses against another person's statements. What this means in practice is that, in your given short snippet - sai#2795 is the responder and hence the regex on that short snippet will only capture a response set when personName is sai#2795.

If the text history looked like-

[08-Oct-20 02:40 PM] sai#2795
hello

[08-Oct-20 03:00 PM] ShadowRanger5#3348
Hi wassup

you would need to use ShadowRanger5#3348 as personName to get any response sets out of this, since in this snippet ShadowRanger5#3348 is the only person responding.
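To make the point above concrete, here is a minimal sketch that runs the regex from this thread against that two-message snippet. The helper name `find_response_sets` and the `chatlog` variable are made up for illustration and are not taken from the repository; only the pattern itself comes from the comment above.

```python
import re


def find_response_sets(data, person_name):
    # Same pattern as quoted earlier in this thread: a message from anyone
    # other than person_name, followed directly by a message from person_name.
    pattern = (
        r'\[.+\] (?!' + re.escape(person_name) + r').+\n(.+)\n{2}'
        r'(?:\[.+\] ' + re.escape(person_name) + r'\n(.+)\n{2})'
    )
    return re.findall(pattern, data)


# The snippet from above: header line, message line, blank line.
chatlog = (
    "[08-Oct-20 02:40 PM] sai#2795\n"
    "hello\n"
    "\n"
    "[08-Oct-20 03:00 PM] ShadowRanger5#3348\n"
    "Hi wassup\n"
    "\n"
)

# ShadowRanger5#3348 replies to sai#2795, so one response set is found.
print(find_response_sets(chatlog, "ShadowRanger5#3348"))

# sai#2795 never replies to anyone in this snippet, so nothing is found.
print(find_response_sets(chatlog, "sai#2795"))
```

Note that the pattern expects each message to end with a blank line, including the last one in the file.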

If you're still having trouble, it is highly likely the regex is not the issue - please provide a small (and preferably censored) version of the chatlog you're trying to parse, along with your input for personName.

TotallyNotChase avatar Feb 11 '21 16:02 TotallyNotChase

It is working, but it gives only 1 response pair even though I have approximately 70 DMs. Do I need to change the chat format in some way? Please tell me what I should do to retrieve the chats.

namanko avatar Feb 12 '21 01:02 namanko

As long as they are in the format of a response - as in, another person's message followed by your message - it should be parsed correctly. Ensure you set personName correctly when prompted to.

TotallyNotChase avatar Feb 12 '21 06:02 TotallyNotChase

https://imgur.com/a/aeTAZUW This is the kind of data I am working with.

Also, did you take into account that one person could have sent more than one message in one go?

namanko avatar Feb 12 '21 07:02 namanko

Unfortunately, the training is only capable of working with atomic response sets - that is, one reply to one statement. But as long as there are multiple response sets in your chatlog - it should still work.
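If several messages in a row come from the same person, one possible workaround is to merge them into a single message before parsing, so that every turn becomes atomic. The sketch below is an assumption, not something the repository does: the `merge_consecutive` helper is hypothetical, and it assumes the export format shown earlier in this thread (a `[timestamp] name` header line, the message, then a blank line).

```python
import re

# Header line of the assumed export format: "[timestamp] author"
HEADER = re.compile(r'^\[(?P<time>[^\]]+)\] (?P<author>.+)$')


def merge_consecutive(data):
    """Join back-to-back messages from the same author into one message."""
    merged = []  # each entry is [header_line, author, text]
    for block in data.split("\n\n"):
        block = block.strip("\n")
        if not block.strip():
            continue
        header, _, body = block.partition("\n")
        match = HEADER.match(header)
        if match is None:
            # No header: treat the block as a continuation of the previous
            # message (for example, a message that contained a blank line).
            if merged:
                merged[-1][2] += " " + block.replace("\n", " ")
            continue
        text = body.replace("\n", " ").strip()
        if not text:
            # Skip messages with no text (for example, attachments only).
            continue
        author = match.group("author")
        if merged and merged[-1][1] == author:
            merged[-1][2] += " " + text
        else:
            merged.append([header, author, text])
    # Rebuild the log in the same header / message / blank-line layout.
    return "".join(f"{h}\n{t}\n\n" for h, _, t in merged)
```

The merged message keeps the header of the first message in the run, and the individual messages are joined with a single space. Whether that is acceptable for training is a judgment call.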

Also, the screenshot does not show your inputs to the script - so I'm unsure where the 733 length of dictionary is coming from. Are you exporting messages from multiple sources?

As a side note, please do not post images of debug text data. I cannot really copy text from an image. You may try pasting your chatlog in a regex tester, such as the link I posted before, and checking the matches. (make sure to alter the person name if you need to).
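The same check can also be run locally on the exact text the script reads, which avoids any difference between what is pasted into a tester and what is in the file. The following is only a rough diagnostic sketch: `summarize_chatlog` is a hypothetical helper, and the header pattern assumes the `[timestamp] name` layout shown earlier. It also reports carriage returns, because the regex expects bare `\n` line breaks.

```python
import re
from collections import Counter


def summarize_chatlog(data, person_name):
    # Count header lines per author, assuming "[timestamp] author" headers.
    header = re.compile(r'^\[.+?\] (.+)$', re.MULTILINE)
    authors = Counter(name.strip() for name in header.findall(data))

    # The response-set regex quoted earlier in this thread.
    pattern = (
        r'\[.+\] (?!' + re.escape(person_name) + r').+\n(.+)\n{2}'
        r'(?:\[.+\] ' + re.escape(person_name) + r'\n(.+)\n{2})'
    )
    return {
        "headers_per_author": dict(authors),
        "has_carriage_returns": "\r" in data,
        "response_sets": len(re.findall(pattern, data)),
    }
```

If `headers_per_author` shows many messages for personName but `response_sets` stays near zero, comparing the author names and line breaks in the file against what the regex expects is a reasonable next step.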

I don't think it's a problem with the regex per se - perhaps there's something I'm missing. But this is the first time this issue has been encountered.

TotallyNotChase avatar Feb 12 '21 07:02 TotallyNotChase

Actually, the 733 comes from the fact that I am also using WhatsApp data, which has no problem: it was extracted successfully and its length is 732, while only 1 pair was extracted from the Discord chats. I am also confused as to what the problem is. It works fine on the regex tester but stops working when implemented.

namanko avatar Feb 12 '21 08:02 namanko