
Decoding non-English SMS text

Open alphapats opened this issue 3 years ago • 2 comments

I am using pyshark to extract SMS messages from a pcap file (captured using gr_gsm). The code is as follows:

    for packet in capture:
        if (layer == "GSM_SMS"):
            print(packet.gsm_sms.sms_text)

English text comes through perfectly. However, non-English text and spaces come out garbled. For example, the SMS text 'آنا ھے یانہیں SMS کرو ' (as shown in Wireshark) is printed as:

    '\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 \xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS \xda\xa9\xd8\xb1\xd9\x88'

Can you help me identify the encoding, and how to decode it to recover the non-English characters?

alphapats, Aug 27 '21 09:08

Same question.

eXcellme, Sep 23 '21 10:09

I was able to find a workaround for the problem:

    sms_content = str(packet.gsm_sms.sms_text)

    try:
        sms_content = (
            sms_content.encode('ascii', 'xmlcharrefreplace')
            .decode('unicode-escape', 'ignore')
            .encode('iso-8859-1', 'xmlcharrefreplace')
            .decode('utf-8', 'xmlcharrefreplace')
        )
    except Exception as e:
        sms_content = sms_content.replace(r'\xd', '\n').replace(r'\xa', ' ')

alphapats, Sep 26 '21 05:09
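
The garbled output in the question looks like the UTF-8 bytes of the message printed as literal `\xNN` escapes (for instance, `\xd8\xa2` is the UTF-8 encoding of U+0622, the first letter of the Urdu text). Assuming that is what `sms_text` contains, the workaround above can be sketched as a small standalone helper. The function name `decode_escaped_utf8` is hypothetical and not part of pyshark, and the fallback of returning the input unchanged is a choice made for this sketch, not the thread author's behavior.

```python
def decode_escaped_utf8(text: str) -> str:
    """Turn literal \\xNN escapes in `text` into the UTF-8 text they spell.

    Assumes `text` holds UTF-8 bytes rendered as backslash escapes.
    Returns `text` unchanged if it cannot be interpreted that way.
    """
    try:
        # Interpret the literal escapes: each \xNN becomes one code point 0-255.
        # 'backslashreplace' keeps the ASCII encode from failing on characters
        # that are already non-ASCII.
        raw = text.encode("ascii", "backslashreplace").decode("unicode_escape")
        # Map those code points back to single bytes, then decode as UTF-8.
        return raw.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not escaped UTF-8 (or already decoded): leave the input alone.
        return text


if __name__ == "__main__":
    # Shortened version of the escaped string from the question.
    escaped = r"\xd8\xa2\xd9\x86\xd8\xa7 SMS \xda\xa9\xd8\xb1\xd9\x88"
    print(decode_escaped_utf8(escaped))
```

With pyshark this would be called as `decode_escaped_utf8(str(packet.gsm_sms.sms_text))`. Plain ASCII messages pass through unchanged, so the same call should be usable for every SMS in the capture.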