pyshark: Decoding non-English SMS text


Open alphapats opened this issue 3 years ago • 2 comments

I am using pyshark to extract SMS messages from a pcap (captured using gr_gsm). The code is as follows:

    for packet in capture:
        if (layer == "GSM_SMS"):
            print(packet.gsm_sms.sms_text)

English text is printed perfectly. However, for non-English text and spaces, the output is garbled. For example, the SMS text 'آنا ھے یانہیں SMS کرو ' (as shown in Wireshark) is printed as '\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 \xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS \xda\xa9\xd8\xb1\xd9\x88'. Can you help me identify the encoding and decode the text to get the non-English characters?

Aug 27 '21 09:08 alphapats
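
The escaped output above looks like UTF-8 bytes written out as `\xNN` escapes rather than a different SMS alphabet. A minimal sketch to check that guess, assuming the byte values are exactly those quoted in the question (the variable names here are illustrative only and are not part of pyshark):

```python
import unicodedata

# Byte values copied from the escaped output quoted in the question.
raw = (b"\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 "
       b"\xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS "
       b"\xda\xa9\xd8\xb1\xd9\x88")

# decode() raises UnicodeDecodeError if the bytes are not valid UTF-8,
# so a clean decode is good evidence for the encoding.
text = raw.decode("utf-8")
print(text)

# List the code points of the first word to compare with the Wireshark view.
first_word = text.split()[0]
for ch in first_word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the printed text matches what Wireshark shows, the remaining problem is only turning the escaped string back into bytes before decoding.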

Same question here.

Sep 23 '21 10:09 eXcellme

I was able to find a workaround to the problem.

```python
sms_content = str(packet.gsm_sms.sms_text)
try:
    # Turn the literal \xNN escapes into raw bytes, then decode them as UTF-8.
    sms_content = (sms_content.encode('ascii', 'xmlcharrefreplace')
                   .decode('unicode-escape', 'ignore')
                   .encode('iso-8859-1', 'xmlcharrefreplace')
                   .decode('utf-8', 'replace'))
except Exception:
    # Fallback: replace the leftover \xd and \xa escapes with a newline and a space.
    sms_content = sms_content.replace(r'\xd', '\n').replace(r'\xa', ' ')
```

Sep 26 '21 05:09 alphapats
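
The same idea can be wrapped in a small helper. This is only a sketch, assuming the field value is a `str` holding literal `\xNN` escapes of UTF-8 bytes, as in the output quoted in the question. `decode_escaped_utf8` is a hypothetical name, not a pyshark function, and it returns the input unchanged whenever that assumption does not hold.

```python
def decode_escaped_utf8(value: str) -> str:
    """Decode text made of literal backslash-x escapes of UTF-8 bytes."""
    try:
        # Step 1: interpret the literal escapes, giving code points 0-255.
        unescaped = value.encode("ascii").decode("unicode_escape")
        # Step 2: map those code points back to raw bytes (latin-1 is 1:1),
        # then decode the bytes as UTF-8.
        return unescaped.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already-decoded text, or bytes that are not UTF-8: leave untouched.
        return value


# Example input: a raw string, so the backslashes are literal characters.
sample = r"\xd8\xa2\xd9\x86\xd8\xa7 SMS"
print(decode_escaped_utf8(sample))
```

In the loop from the question this would be called as `decode_escaped_utf8(str(packet.gsm_sms.sms_text))`. Catching only the two Unicode errors, instead of a bare `Exception`, keeps unrelated bugs visible.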
