fix(codeCompletion): normalize indentation handling for lines with tabs
#3361
Thank you for the fix involving tabs. I have previously patched the bundles on my installs to get around the problem, but I suspect this has been affecting many others.
However, if my understanding is correct that `context.currentLinePrefix` is the existing content of the line the cursor is on, I still see a few possible problems.
First, if the cursor line starts at zero indentation, the check would not succeed, and tabs could be replaced with spaces like before. This commonly occurs when creating new functions or top-level variables. My suggestion would be to also check the beginning of the lines in the completion itself before treating them as space-indented.
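For illustration, here is a minimal sketch of that check, assuming the completion text and `context.currentLinePrefix` are both available as strings (the helper name is hypothetical, not Tabby's actual code):

```typescript
// Hypothetical helper, not Tabby's actual API: decide whether a completion
// should be treated as tab-indented before any tab-to-space conversion runs.
function completionUsesTabs(completionText: string, currentLinePrefix: string): boolean {
  // If the cursor line already shows tab indentation, trust it.
  if (/^\t/.test(currentLinePrefix)) {
    return true;
  }
  // Otherwise, also inspect the completion's own lines. This covers the
  // zero-indentation case (new functions, top-level variables) where the
  // prefix alone gives no signal about the file's indentation style.
  return completionText.split("\n").some((line) => /^\t/.test(line));
}
```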
Second, if more than one tab of indentation is recommended, it appears that only a single tab will be removed, likely leaving the completion off by one tab. Personally, I have had great success with my model of choice, qwen2.5-coder:14b, choosing the correct tab indentation; in many cases the completion's leading indent is already correct, although that could differ with smaller models. Adding the missing tab manually requires canceling the completion, because pressing Tab accepts it. Also, the rest of the completion remains unaltered, so it ends up inconsistent with the first line.
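To make the off-by-one concrete, here is a hypothetical completion (not real model output) run through a single-tab strip:

```typescript
// Hypothetical completion for a block indented with two tabs.
const completion = "\t\tif (ready) {\n\t\t\tstart();\n\t\t}";

// Stripping only the first leading tab (as the current postprocess appears to do)
// leaves the first line one level short while the remaining lines are untouched.
const stripped = completion.replace(/^\t/, "");
console.log(JSON.stringify(stripped));
// "\tif (ready) {\n\t\t\tstart();\n\t\t}"  <- first line is now inconsistent
```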
I have not tried very small models, but I suspect that tabs should be easier for models to produce reliably: there is never an in-between amount, they are distinct characters, and there are fewer of them. Leaving tabs unmodified might be an acceptable solution.
I do also worry that this makes working on 3-space-indented or other oddly indented code problematic. Although odd indentation widths are somewhat rare, they do exist; for example, bzip2 is indented with 3 spaces.
Hi @sboys3, thank you for your valuable input on this issue. I now realize I overlooked some aspects of tab handling. Through testing, I've found that tabs indeed perform better than spaces in code completion tasks. The introduction of the postprocess method was primarily motivated by the limited code completion capabilities of smaller models (like 3B). To verify this issue, I tested multiple models ranging from 1B to 14B and confirmed the tab-related problems before implementing fixes.
Regarding your second point, I completely agree with your observation. After testing, I will remove the code that deletes the first tab character. I also agree that no modifications should be made when `currentLinePrefix` has zero indentation.
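A hedged sketch of the revised guard described above (names like `postprocessIndentation` and `applySpaceIndentFixes` are placeholders, not Tabby's actual functions):

```typescript
// Sketch only: skip indentation rewriting when the cursor line is unindented,
// and keep all leading tabs instead of stripping the first one.
function postprocessIndentation(completionText: string, currentLinePrefix: string): string {
  const prefixIndent = currentLinePrefix.match(/^[ \t]*/)?.[0] ?? "";

  // Zero indentation on the cursor line: return the completion unmodified.
  if (prefixIndent.length === 0) {
    return completionText;
  }

  // Tab-indented context: leave leading tabs alone.
  if (prefixIndent.includes("\t")) {
    return completionText;
  }

  // Space-indented context: fall through to the existing space handling.
  return applySpaceIndentFixes(completionText, prefixIndent);
}

// Placeholder for the existing space-based normalization (details omitted).
function applySpaceIndentFixes(completionText: string, _prefixIndent: string): string {
  return completionText;
}
```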
As for the third point, the current postprocess method is indeed a trade-off for smaller models. We are exploring better ways to handle all indentation cases. For now, adding certain constraints helps smaller models achieve performance more comparable to their more powerful counterparts. My current approach is to fix the existing tab indentation issues while keeping the original space-based behavior unchanged.
Additionally, I'm considering whether we can remove the postprocessing step entirely. I plan to evaluate metrics to assess its actual value; if it proves to have limited value, I may remove it altogether. Thank you for your help!
This looks good now. Tabby should be back to tabbing correctly when this is released.