Large tag files cause error when sorting

Open rednecknguyen opened this issue 11 years ago • 11 comments

I'm on Mac OS X Lion. I've replaced /usr/bin/ctags with a proper ctags implementation. I'm new to Sublime Text and to CTags, and I haven't gotten CTags to work properly yet.

I'd love to get this working. Please help.

When attempting to build ctags, I'm getting the following error on every project I have (regardless of source tree - username removed):

Exception in thread Thread-5:
Traceback (most recent call last):
  File "X/threading.py", line 639, in _bootstrap_inner
  File "X/threading.py", line 596, in run
  File "ctagsplugin in /Users/[user]/Library/Application Support/Sublime Text 3/Installed Packages/CTags.sublime-package", line 103, in run
  File "ctagsplugin in /Users/[user]/Library/Application Support/Sublime Text 3/Installed Packages/CTags.sublime-package", line 682, in build_ctags
  File "ctags in /Users/[user]/Library/Application Support/Sublime Text 3/Installed Packages/CTags.sublime-package", line 178, in build_ctags
  File "ctags in /Users/[user]/Library/Application Support/Sublime Text 3/Installed Packages/CTags.sublime-package", line 157, in resort_ctags
  File "X/encodings/ascii.py", line 26, in decode
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 1003: ordinal not in range(128)

rednecknguyen commented Aug 17 '13 21:08

Just FYI, this is what I get when I attempt things on Linux:

Re/Building CTags for /home/[user]/development/comm2_boost_testing/src/Common/.tags: Please be patient
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.NavigateToDefinition object at 0x7f1ff86d0c10>)
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.NavigateToDefinition object at 0x7f1ff86d0c10>)
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.ShowSymbols object at 0x7f1ff86d0c50>)
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.ShowSymbols object at 0x7f1ff86d0c50>)
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.ShowSymbols object at 0x7f1ff86d0c50>)
Traceback (most recent call last):
  File "/home/[user]/Downloads/sublime_text_3/sublime_plugin.py", line 445, in is_enabled_
    raise ValueError("is_enabled must return a bool", self)
ValueError: ('is_enabled must return a bool', <CTags.ctagsplugin.ShowSymbols object at 0x7f1ff86d0c50>)
Exception in thread Thread-3:
Traceback (most recent call last):
  File "X/threading.py", line 639, in _bootstrap_inner
  File "X/threading.py", line 596, in run
  File "ctagsplugin in /home/[user]/.config/sublime-text-3/Installed Packages/CTags.sublime-package", line 103, in run
  File "ctagsplugin in /home/[user]/.config/sublime-text-3/Installed Packages/CTags.sublime-package", line 682, in build_ctags
  File "ctags in /home/[user]/.config/sublime-text-3/Installed Packages/CTags.sublime-package", line 178, in build_ctags
  File "ctags in /home/[user]/.config/sublime-text-3/Installed Packages/CTags.sublime-package", line 157, in resort_ctags
  File "X/codecs.py", line 300, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 6202: invalid start byte
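
Both tracebacks end inside a codec rather than in a MemoryError, so it may help to see what byte 0x96 does on its own. This is only a minimal sketch: the sample bytes and the helper name `decode_error` are made up for illustration, and cp1252 is just an assumed, plausible origin for the byte (it maps 0x96 to an en dash).

```python
def decode_error(raw, codec):
    """Return the decoder's complaint for raw under codec, or None if it decodes."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# Hypothetical tag line containing the byte from the tracebacks.
raw = b"some_tag\tnotes \x96 draft.txt\t1\n"

for codec in ("ascii", "utf-8", "cp1252"):
    print(codec, "->", decode_error(raw, codec))

# errors="replace" never raises; the undecodable byte becomes U+FFFD.
print(repr(raw.decode("utf-8", errors="replace")))
```

If the .tags file really does contain such bytes, reading it in binary mode or with `errors="replace"` would sidestep the exception regardless of file size.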

rednecknguyen commented Aug 17 '13 21:08

Well, after some investigating, it appears there is a memory issue or a buffer getting overloaded when the .tags file is large. In my case, for my source trees, the .tags files are larger than 25 MB.

When the resort_ctags function runs against such a large file, it hits the error. There is nothing wrong with the files other than their size.

If, instead of calling resort_ctags directly, I create a subprocess that runs a new resort_tags.py file (which runs the same function), everything is fine.

Note: I manually installed CTags into the Packages folder. The folder is named CTags-Master as that's what was in the zip file I downloaded from this site.

As a quick and dirty workaround, in ctags.py in build_ctags(), I replaced the call to resort_ctags with the following.

_WARNING: I AM NOT A PYTHON PROGRAMMER IN ANY SENSE. I'M A C/C++ PROGRAMMER. REWRITE TO PYTHON STANDARD PROGRAMMING PRACTICES._

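# Assumes "import os" at the top of ctags.py, alongside the existing imports.
# Path to the standalone re-sort script inside the manually installed package.
resort_path = os.path.join(sublime.packages_path(), 'CTags-master', 'resort_tags.py')

# Run the re-sort in a separate Python process, outside Sublime Text's interpreter.
cmd2 = 'python "%s" ./.tags' % resort_path
p2 = subprocess.Popen(cmd2, cwd=dirname(tag_file), shell=True, env=env,
                      stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# communicate() drains the pipe, so the child cannot stall on a full stdout buffer.
output, _ = p2.communicate()
if p2.returncode:
    raise EnvironmentError((cmd2, p2.returncode, output))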

I don't know if this is a Python issue where the interpreter hiccups or what. But it occurs consistently with large .tags files of varying sizes, even when the source trees contain unrelated code.

rednecknguyen commented Aug 24 '13 04:08

+1, I have the same problem on Mac OS X Mountain Lion.

astyagun commented Oct 22 '13 00:10

@rednecknguyen @astyagun I'm looking into this issue. Would either of ye happen to have a sample .tags file that I could test against?

stephenfin commented Oct 30 '13 21:10

http://yadi.sk/d/wmFrx6U0BsZoR

You can also reproduce it by generating tags for the Rails framework, for example:

  • Clone the Rails framework from its GitHub repo (https://github.com/rails/rails)
  • Run bundle install --path bundle to fetch all dependencies into a subdirectory
  • Generate tags in Sublime Text

astyagun commented Oct 30 '13 21:10

So I've looked into this. The problem seems to be that the sorting takes place in memory: the built-in Python interpreter in ST must enforce some memory limit that this hits (which is why the solution @rednecknguyen proposed works: it spawns a new Python process outside of ST).

@rednecknguyen's solution (while good) isn't perfect though. It assumes that Python is installed on the system, and it basically just raises the memory ceiling, so the same issue could recur with an even larger file. I propose two possible solutions:

  1. Offload to the Unix sort utility.
    • Pros: This would be far faster than anything we could achieve in Python.
    • Cons: While Windows does provide a sort utility, it's very basic and won't let you sort on tabbed columns (as found in tag files). Hence we'd need to provide an alternative there.
  2. Reimplement the sort algorithm to use external sorts for large files.
    • Pros: Would work without any external requirements - it's pure Python after all.
    • Cons: External sorts are slow. Even if you used a hybrid "sometimes-internal-sometimes-external" solution, deciding when to use an external vs. internal sort would be tricky.
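
For option 2, one common shape of an external sort is "sort fixed-size chunks, spill each to disk, then merge the runs". The sketch below is a merge-based variant, not necessarily what the plugin would end up using. The function names, the `max_lines` threshold, and the ordering (file-name column first, then tag name) are all assumptions made for illustration. It works on raw bytes, so no decoding is involved.

```python
import heapq
import os
import tempfile


def sort_key(line):
    # Assumed ordering: file-name column (2nd tab field), then tag name (1st).
    fields = line.rstrip(b"\r\n").split(b"\t")
    return (fields[1] if len(fields) > 1 else b"", fields[0])


def _write_run(lines, tmp_dir):
    # Sort one chunk in memory and spill it to its own temporary file.
    lines.sort(key=sort_key)
    fd, path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as run:
        run.writelines(lines)
    return path


def _read_run(path):
    # Yield (key, line) pairs so heapq.merge can compare them directly.
    with open(path, "rb") as run:
        for line in run:
            yield sort_key(line), line


def external_sort(src, dst, max_lines=100000):
    """Sort src into dst while holding at most max_lines lines in memory."""
    tmp_dir = tempfile.mkdtemp()
    runs = []
    try:
        chunk = []
        with open(src, "rb") as infile:
            for line in infile:
                if not line.endswith(b"\n"):
                    line += b"\n"  # normalise a final line without a newline
                chunk.append(line)
                if len(chunk) >= max_lines:
                    runs.append(_write_run(chunk, tmp_dir))
                    chunk = []
        if chunk:
            runs.append(_write_run(chunk, tmp_dir))
        # Lazily merge the sorted runs; only one line per run is in memory.
        merged = heapq.merge(*[_read_run(path) for path in runs])
        with open(dst, "wb") as outfile:
            outfile.writelines(line for _, line in merged)
    finally:
        for path in runs:
            os.remove(path)
        os.rmdir(tmp_dir)
```

Header lines beginning with `!_TAG_` are treated like any other line here; a real implementation would probably want to copy them through first. The "when to go external" question from the cons above shows up as the choice of `max_lines`.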

Opinions anyone?

stephenfin commented Nov 30 '13 22:11

What about reimplementing sort if running on Windows?

http://stackoverflow.com/questions/1325581/how-do-i-check-if-im-running-on-windows-in-python
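
A rough sketch of that idea: use the system sort everywhere except Windows, and fall back to a pure-Python sort there. The function names and the `system_name` override are hypothetical, and the key columns (file name, then tag name) are an assumption about the desired order. `-t`, `-k` and `-o` are standard sort options.

```python
import os
import platform
import subprocess


def use_unix_sort(system_name=None):
    # platform.system() returns e.g. "Windows", "Linux" or "Darwin".
    return (system_name or platform.system()) != "Windows"


def unix_sort_command(tag_file, out_file):
    # -t sets the field separator to a tab; -k2,2 -k1,1 sorts on the
    # file-name column and then the tag name (an assumed ordering).
    return ["sort", "-t", "\t", "-k2,2", "-k1,1", "-o", out_file, tag_file]


def sort_key(line):
    fields = line.rstrip(b"\r\n").split(b"\t")
    return (fields[1] if len(fields) > 1 else b"", fields[0])


def sort_tags(tag_file, out_file, system_name=None):
    if use_unix_sort(system_name):
        # LC_ALL=C gives a byte-wise, locale-independent order.
        env = dict(os.environ, LC_ALL="C")
        subprocess.check_call(unix_sort_command(tag_file, out_file), env=env)
    else:
        # Windows fallback: plain in-memory sort on raw bytes.
        with open(tag_file, "rb") as infile:
            lines = infile.readlines()
        lines.sort(key=sort_key)
        with open(out_file, "wb") as outfile:
            outfile.writelines(lines)
```

Note that the fallback branch is still an in-memory sort, so on its own it would not help Windows users with very large tag files.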

davividal commented Dec 09 '13 19:12

Yeah - I considered that alright (point 1). However, we'll still have the same issue (albeit only on Windows). It's a kind of half-way solution that will only fix things for some people and add to the maintenance overhead. Good idea though :)

stephenfin commented Dec 09 '13 22:12

@astyagun @rednecknguyen I've pushed some changes to a feature branch. Would either of ye mind checking out that branch and seeing if it fixes things? Ye can see the changes made in the commits, but essentially there are now three ways to sort files:

  • classic in-memory sort (0)
  • Python-based external bucket sort (1)
  • GNU sort (per @davividal's suggestion) (2)

These can be configured by setting the value of sort in settings, e.g. to enable the bucket sort:

{
  "sort": 1
}
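
To make the setting concrete, here is a minimal sketch of how a value like this could select a strategy. It is not the code from the feature branch: the names, the default of 0 when the setting is absent, the bucketing scheme (one temporary file per leading byte of the file-name column), and the sort order are all assumptions. GNU sort (2) is left out of the sketch.

```python
import os
import tempfile

SORT_IN_MEMORY, SORT_BUCKET, SORT_GNU = 0, 1, 2  # values from the list above


def sort_key(line):
    # Assumed ordering: file-name column, then tag name.
    fields = line.rstrip(b"\r\n").split(b"\t")
    return (fields[1] if len(fields) > 1 else b"", fields[0])


def in_memory_sort(src, dst):
    with open(src, "rb") as infile:
        lines = infile.readlines()
    lines.sort(key=sort_key)
    with open(dst, "wb") as outfile:
        outfile.writelines(lines)


def bucket_sort(src, dst):
    """Spill lines into one temp file per leading key byte, then sort each."""
    tmp_dir = tempfile.mkdtemp()
    buckets = {}
    try:
        with open(src, "rb") as infile:
            for line in infile:
                if not line.endswith(b"\n"):
                    line += b"\n"
                first = sort_key(line)[0][:1]  # b"" or a single byte
                if first not in buckets:
                    path = os.path.join(tmp_dir, str(len(buckets)))
                    buckets[first] = open(path, "w+b")
                buckets[first].write(line)
        with open(dst, "wb") as outfile:
            # Buckets are disjoint key ranges, so sorting each one and
            # writing them in bucket order yields a fully sorted file.
            for first in sorted(buckets):
                bucket = buckets[first]
                bucket.seek(0)
                outfile.writelines(sorted(bucket.readlines(), key=sort_key))
    finally:
        for bucket in buckets.values():
            bucket.close()
            os.remove(bucket.name)
        os.rmdir(tmp_dir)


STRATEGIES = {SORT_IN_MEMORY: in_memory_sort, SORT_BUCKET: bucket_sort}


def resort(src, dst, settings):
    # settings is anything with .get(key, default), e.g. a plain dict.
    mode = settings.get("sort", SORT_IN_MEMORY)
    if mode not in STRATEGIES:
        raise ValueError("sort mode %r is not covered by this sketch" % (mode,))
    STRATEGIES[mode](src, dst)
```

Only one bucket is held in memory at a time, so peak memory depends on the largest bucket rather than on the whole file.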

stephenfin commented Mar 08 '14 01:03

I can't reproduce the problem anymore. Either it's fixed in the version from Package Control already or some change in my setup has fixed it.

astyagun commented Mar 08 '14 08:03

Well, that's good to hear. Hopefully it's the former. I'll wait to see if any other reports of the issue arise and, if not, I guess we can consider this issue closed.

stephenfin commented Mar 08 '14 14:03