Identifying and removing null characters
Have you checked for existing feature requests?
- [x] Completed
Summary
Highlight null characters (\0) in the text.
What benefits does this feature provide?
Sometimes I have a file with null characters (\0). They're mostly Windows .reg files which appear to have been corrupted in a conversion between UTF-16 and UTF-8.
As far as I can tell, showing white space doesn’t include the null characters.
Is there a way of including an option to highlight these null characters?
Any alternatives?
At the moment, there’s no simple way to identify these characters. If I use find, set it to regex, and search for \0, then it will highlight them as zero-width selections. I can then finish the job and delete them. For this I use a little JavaScript to find and knock them out.
var nullCharacter = String.fromCharCode(0);
var cr = String.fromCharCode(13);
var editor=atom.workspace.getActiveTextEditor();
var buffer=editor.buffer;
buffer.setText(buffer.getText().replaceAll(nullCharacter, ''));
buffer.setText(buffer.getText().replaceAll(cr, ''));
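A possible variation on the snippet above, as a rough sketch: strip both characters in one pass and only rewrite the editor when something was actually removed. `stripInvisible` is a hypothetical helper name, and the `typeof atom` guard is only there so the function can also be run outside the editor.

```javascript
// Hypothetical helper: remove NUL (\0) and CR (\r) characters in one pass
// and report how many characters were removed.
function stripInvisible(text) {
  const cleaned = text.replace(/[\0\r]/g, '');
  return { cleaned, removed: text.length - cleaned.length };
}

// Only touch the editor when running inside Atom/Pulsar.
if (typeof atom !== 'undefined') {
  const editor = atom.workspace.getActiveTextEditor();
  if (editor) {
    const { cleaned, removed } = stripInvisible(editor.getText());
    // Skip the rewrite entirely when the file is already clean.
    if (removed > 0) editor.setText(cleaned);
  }
}
```

As in the original snippet, removing every CR also turns CRLF line endings into LF, so this is only appropriate when that is the intent.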
Other examples:
No response
If they don't correspond to any sort of screen display — if there isn't space reserved for them on screen — then there isn't a realistic way to highlight them.
If the idea is to know when they're present, you can probably write some code that scans for them when you open a file and shows a notification if they're present.
const NULL_CHAR = String.fromCharCode(0);
const CR_CHAR = String.fromCharCode(13);
atom.workspace.observeTextEditors((editor) => {
  let text = editor.getText();
  // Check against the constants defined above (CR_CHAR, not CR_CHARACTER).
  if (text.includes(NULL_CHAR) || text.includes(CR_CHAR)) {
    atom.notifications.addInfo(`Warning: invisible characters!`);
  }
});
If it's just to ensure that they're stripped automatically, you could adapt your code above to be a pre-save callback and put it in your init.js:
const NULL_CHAR = String.fromCharCode(0);
const CR_CHAR = String.fromCharCode(13);
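// Strip NUL and CR characters from the buffer's text.
function fixOnSave (buffer) {
  let text = buffer.getText();
  let cleaned = text.replaceAll(NULL_CHAR, '').replaceAll(CR_CHAR, '');
  // Only rewrite the buffer when something actually changed.
  if (cleaned !== text) buffer.setText(cleaned);
}
atom.workspace.observeTextEditors((editor) => {
  let buffer = editor.getBuffer();
  // Pass the buffer explicitly rather than relying on the callback's argument.
  buffer.onWillSave(() => fixOnSave(buffer));
});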
Thanks for your comments. That looks workable.
My thought was that it might be highlighted like a cursor, possibly red.
To fake the problem, I ran the following on a normal document:
buffer.setText(buffer.getText().replaceAll(/(.)/g,`$1${nullCharacter}`));
I then highlighted the null between adjacent characters and used Select Next to highlight the rest:
https://github.com/user-attachments/assets/81dbb3f6-b4c6-4979-8a16-e498cf77ef39
Of course, in the attached video it’s flashing, which is not necessarily what we want.
If, instead, I do a Find All, I get something like this:
In either case the null character is highlighted without taking up extra space.
That's true — you could certainly highlight them if you were willing to extend the highlighting to an adjacent character on one or both sides of the \0.
If you wanted to have this be a baseline behavior across all kinds of files, you could set up a buffer.onDidStopChanging callback that searched for them with buffer.scan and set up DisplayMarkers for certain buffer ranges.
This area of the code has a little bit of a learning curve just because it involves a lot of bookkeeping (don't want to create new markers for null characters you found on the last search). It gets a lot easier if you assume that you won't accidentally introduce any such characters during an editing session — because in that case you can just do the scan once upon buffer load and then assume any edit that touches the marker on either side renders it invalid.
You'd want to create the markers in your init.js and call TextEditor::decorateMarker on each one so you can give it a class name. (Or create a DisplayMarkerLayer and decorate the whole layer.) Then you can style the markers in your user stylesheet.
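To make that concrete, here is a rough sketch of the simpler "scan once on load" variant for init.js. It is untested against a real editor and rests on a few assumptions: the buffer text is already loaded when `observeTextEditors` fires, no null characters are introduced later in the session, and decorations on invalidated markers stop being drawn. The names `markNullCharacters`, `widenSpan` and the `null-char` class are made up for illustration.

```javascript
// Hypothetical class name; style it in your user stylesheet.
const NULL_CLASS = 'null-char';

// Pure helper: widen a one-character span starting at `column` by one column
// on each side, clamped to the line, so the highlight has a visible width.
function widenSpan(column, lineLength) {
  return [Math.max(0, column - 1), Math.min(lineLength, column + 2)];
}

// Scan an editor once and mark every NUL on a decorated marker layer.
function markNullCharacters(editor) {
  const layer = editor.addMarkerLayer();
  editor.decorateMarkerLayer(layer, { type: 'highlight', class: NULL_CLASS });
  editor.scan(/\0/g, ({ range }) => {
    const row = range.start.row;
    const lineLength = editor.lineTextForBufferRow(row).length;
    const [startCol, endCol] = widenSpan(range.start.column, lineLength);
    // 'touch' invalidates the marker as soon as an edit touches its range.
    layer.markBufferRange([[row, startCol], [row, endCol]], { invalidate: 'touch' });
  });
  return layer;
}

// Only register the observer when running inside Atom/Pulsar.
if (typeof atom !== 'undefined') {
  atom.workspace.observeTextEditors(markNullCharacters);
}
```

In the user stylesheet, a selector along the lines of `atom-text-editor .highlight.null-char .region` with a translucent red `background-color` should then colour the marked ranges; treat the exact selector as an assumption to verify in the dev tools.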
Since this approach seems to satisfy you, I'm going to close this issue. Feel free to reopen it if you can describe a bug or enhancement around this problem that cannot be sufficiently addressed with the code in your init.js or with a community package.