
skip any non-ASCII characters in the files

Open thread13 opened this issue 8 years ago • 2 comments

Non-ASCII characters seem to break the source code parser. The preferred fix would certainly be to make the parser accept non-ASCII characters (Python itself does, for one thing), but as a quick fix:

--- __init__.py.orig    2016-06-04 20:24:26.507343246 +1000
+++ __init__.py 2016-06-04 20:39:29.206238307 +1000
@@ -23,6 +23,19 @@
 import getopt, sys, os, string, re
 import keyword, parser, symbol, token

+_re_ascii_filter = '[^%s]' % (re.escape(string.printable), )
+
+def ascii_dammit( sourcecode, _re_expr = re.compile( _re_ascii_filter ) ):
+    """
+        just ignore all non-ascii characters 
+        since any identifiers should be ASCII anyway ;
+        nb: this will work for utf-8 as well
+        
+    """
+
+    result = _re_expr.sub( '', sourcecode )
+    return result
+

 class Mark(object):
     """ Marks, as defined by Cscope, that are implemented.
@@ -234,6 +247,7 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
+            filecontents = ascii_dammit( filecontents )
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
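
For illustration, here is a minimal standalone sketch of what the filter in the patch does to a piece of source text. The sample string is made up, and the function is restated outside pycscope so it can be run on its own:

```python
import re
import string

# Same pattern as in the patch: match every character that is
# not in string.printable (ASCII letters, digits, punctuation, whitespace).
_re_ascii_filter = re.compile('[^%s]' % re.escape(string.printable))

def ascii_dammit(sourcecode):
    """Drop every character outside string.printable."""
    return _re_ascii_filter.sub('', sourcecode)

# Hypothetical sample: non-ASCII in a string literal and a comment,
# plus an embedded NUL character.
sample = 'name = "caf\u00e9"  # na\u00efve\nx = 1\x00\n'
cleaned = ascii_dammit(sample)
print(repr(cleaned))
```

Two things are worth keeping in mind: the filter also strips non-ASCII characters from string literals and comments, so column offsets on affected lines shift, while line numbers stay the same because `'\n'` is printable.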

thread13 commented on Jun 04 '16 11:06

pycscope also chokes on embedded '\0' characters (by the way, it should probably add the filename to the printed exception):

Traceback (most recent call last):
  File "/usr/local/bin/pycscope", line 9, in <module>
    load_entry_point('pycscope==1.2.1', 'console_scripts', 'pycscope')()
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 128, in main
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 171, in work
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 237, in parseFile
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 938, in parseSource
TypeError: suite() argument 1 must be string without null bytes, not str
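
One possible workaround, sketched here under assumptions, is to drop the NUL characters before the text reaches the parser. `strip_null_bytes` is a hypothetical helper, not part of pycscope, and the built-in `compile()` stands in for the `parser.suite()` call from the traceback:

```python
def strip_null_bytes(sourcecode):
    """Remove embedded NUL characters, which the parser rejects."""
    return sourcecode.replace('\0', '')

# Hypothetical file contents with a stray NUL in the middle.
dirty = 'x = 1\0\ny = 2\n'
clean = strip_null_bytes(dirty)

# compile() is only a stand-in for the parser here; it accepts the cleaned text.
code = compile(clean, '<sample>', 'exec')
namespace = {}
exec(code, namespace)
print(namespace['x'], namespace['y'])
```

Note that `'\0'` is not in `string.printable`, so the `ascii_dammit` filter from the first patch would also remove it if applied before parsing.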

thread13 commented on Jun 04 '16 12:06

Printing the name of the file that triggers the crash (commit 50e42f9 in the fork):

$ diff -u __init__.py.new __init__.py 
--- __init__.py.orig    2016-06-04 22:31:10.027610098 +1000
+++ __init__.py 2016-06-04 22:53:03.805282697 +1000
@@ -247,11 +247,16 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
             raise e
+        except Exception as e:
+            # debug a fatal exception: 
+            e.filename = fullpath
+            print("pycscope.py: %s in %s" % (e, repr(fullpath)))
+            raise e

     return indexbuff_len
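
The same idea as a standalone sketch, in case it helps to try it outside pycscope. `parse_with_filename` and `failing_parse` are hypothetical names; the sketch writes to stderr and uses a bare `raise` so the original traceback is kept:

```python
import sys

def parse_with_filename(parse, filecontents, fullpath):
    """Call parse(filecontents); on failure, report which file caused it."""
    try:
        return parse(filecontents)
    except Exception as e:
        # Attach the path so callers further up can report it too.
        e.filename = fullpath
        sys.stderr.write("pycscope: %s in %r\n" % (e, fullpath))
        raise  # re-raise with the original traceback intact

def failing_parse(text):
    # Hypothetical stand-in for parseSource() hitting a bad file.
    raise TypeError("argument must be string without null bytes")

try:
    parse_with_filename(failing_parse, 'x = 1\0\n', 'some/module.py')
except TypeError as e:
    reported = e.filename
```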

thread13 commented on Jun 04 '16 13:06