pyshark
Decoding non-English SMS text
I am using pyshark to extract SMS messages from a pcap file (captured with gr_gsm). The code is as follows:
```python
for packet in capture:
    if 'GSM_SMS' in packet:  # only packets that carry a GSM SMS layer
        print(packet.gsm_sms.sms_text)
```
English text comes through correctly. However, non-English text (and the spaces around it) comes out garbled. For example:
The SMS text as shown in Wireshark, 'آنا ھے یانہیں SMS کرو ' (roughly "text me whether you're coming or not"), is printed as
'\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 \xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS \xda\xa9\xd8\xb1\xd9\x88'
Can you help me identify the encoding, so that I can decode the text and get the non-English characters back?
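The escapes in the example decode cleanly as UTF-8: each Arabic-script letter takes two bytes, with lead bytes in the range 0xD8 to 0xDB. A minimal sketch to confirm this, assuming the field arrives as a Python `str` that holds literal backslash escapes (the four characters `\`, `x`, `d`, `8` standing for the byte 0xD8) rather than real bytes; the variable names are illustrative only.

```python
import unicodedata

# Assumption: the SMS field is a str containing literal backslash escapes,
# i.e. the characters \ x d 8 stand for the single byte 0xD8.
escaped = (r"\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 "
           r"\xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS "
           r"\xda\xa9\xd8\xb1\xd9\x88")

# 1. unicode_escape turns each \xNN escape into the code point U+00NN.
latin1_text = escaped.encode("ascii").decode("unicode_escape")
# 2. Latin-1 maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF.
raw = latin1_text.encode("latin-1")
# 3. The recovered bytes are valid UTF-8 (two bytes per Arabic-script letter).
text = raw.decode("utf-8")

print(raw[:4])  # bytes of the first two letters
print(text)

# Print the first few code points to confirm they are Arabic-script letters.
for ch in text[:3]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the decode in step 3 raised `UnicodeDecodeError`, the bytes would not be UTF-8 and another codec would have to be tried; here it succeeds for every escape in the sample.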
Same question here.
I was able to find a workaround for the problem:
```python
sms_content = str(packet.gsm_sms.sms_text)
try:
    # \xNN escapes -> code points U+00NN -> raw bytes -> UTF-8 text
    sms_content = (sms_content.encode('ascii', 'xmlcharrefreplace')
                   .decode('unicode-escape', 'ignore')
                   .encode('iso-8859-1', 'xmlcharrefreplace')
                   .decode('utf-8'))  # raises UnicodeDecodeError on invalid UTF-8
except UnicodeDecodeError:
    # crude fallback: replace the literal escapes \xd and \xa
    sms_content = sms_content.replace(r'\xd', '\n').replace(r'\xa', ' ')
```
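An alternative sketch that avoids the `unicode-escape` round trip: decode only runs of `\xNN` escapes as UTF-8 and leave everything else alone, so other backslashes and characters that are already decoded are not rewritten (in the workaround, real non-ASCII characters are turned into `&#NNNN;` references by the first step). `decode_escaped_runs` is a hypothetical helper, not part of pyshark, and it assumes every escape has exactly two hex digits; the shorter `\xd`/`\xa` forms handled by the fallback pass through unchanged.

```python
import re

# Assumption: escapes always have exactly two hex digits, e.g. \xd8.
_ESCAPE_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def decode_escaped_runs(text: str) -> str:
    r"""Hypothetical helper: decode each run of \xNN escapes as UTF-8 bytes.

    Text outside the runs (ASCII words, spaces, already-decoded characters)
    is returned untouched. Invalid UTF-8 inside a run becomes U+FFFD.
    """
    def _decode(match: re.Match) -> str:
        # Strip the "\x" prefixes, leaving a plain hex string such as "d8a2".
        hex_digits = match.group(0).replace("\\x", "")
        return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")

    return _ESCAPE_RUN.sub(_decode, text)


escaped = (r"\xd8\xa2\xd9\x86\xd8\xa7 \xda\xbe\xdb\x92 "
           r"\xdb\x8c\xd8\xa7\xd9\x86\xdb\x81\xdb\x8c\xda\xba SMS "
           r"\xda\xa9\xd8\xb1\xd9\x88")
print(decode_escaped_runs(escaped))
```

Replacing invalid sequences with U+FFFD is a design choice so that one bad run does not lose the whole message; use `errors="strict"` instead if you would rather catch the failure and fall back as in the workaround.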