uslug
Condense file sizes
I have a JavaScript application that slugifies titles on the client side. I wanted to use this library, but found that it was rather large once uglified.
I've made my own smaller library, but I'm opening this pull request in case you want to incorporate the changes into your own library.
The new uglified L file is 4.9K, as opposed to the previous one, which was 102K.
I want to note two things:
- This uses the XML Unicode database, which is all-inclusive. The text format is separated into different files, which is why Hangul has special exceptions in the code. You might be able to remove the Hangul and CJK conditions now that this uses the all-inclusive database.
- I removed all characters outside of the BMP. I found that uslug does not support characters in the supplementary planes, because a lot of JavaScript engines (all, as far as I know) treat surrogate pairs as two separate characters. I would like to add support for this, but it would require a more complicated script to generate the regexp. If interested, let me know.
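To illustrate the second note, here is a minimal sketch of how a character outside the BMP looks to JavaScript string code. The `stripSupplementary` helper is hypothetical and not part of uslug; it only shows one way to drop such characters under the assumption that every surrogate code unit should go.

```javascript
// U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP.
var boldA = '\uD835\uDC00';

// JavaScript strings are sequences of UTF-16 code units, so one
// supplementary character occupies two positions.
var codeUnitCount = boldA.length; // 2

// Each half is a surrogate code unit in the range U+D800..U+DFFF.
var highSurrogate = boldA.charCodeAt(0); // 0xD835
var lowSurrogate = boldA.charCodeAt(1);  // 0xDC00

// Hypothetical helper (not part of uslug): remove every surrogate code
// unit, which removes all supplementary-plane characters from the text.
function stripSupplementary(text) {
  return text.replace(/[\uD800-\uDFFF]/g, '');
}

var stripped = stripSupplementary('a' + boldA + 'b'); // 'ab'
```

A character class built only from BMP ranges sees the two halves separately, which is why supporting the supplementary planes would need a regexp that matches explicit high/low surrogate pairs.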
Thanks for the PR! I'm not using uslug much now, so I don't have strong opinions; your PR looks good to me. I'll probably just approve your changes once I have some free time. Some questions below:
- Could you update the README to include instructions for updating the Unicode lists, along with the 2 notes you posted in your PR?
- Are there any test cases we could add to further verify this, or is the current set of tests enough?
- Finally, would you mind providing me with a short release note that I can use when pushing this new version to NPM?
Thanks again!
Hey @jeremys,
I moved the comments out of the lib files and put them into the README. Maybe you can take a look and see if what I've done makes sense.
I added only one test, which verifies that a character marked as a letter in the supplementary plane (and present in the original array) is, in fact, removed.
I think the current tests are adequate. However, if we want to let someone know when the number of valid characters changes, we could add a test that iterates through each character and verifies that the number of valid characters is the same. For example, I could add something like this:
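// Expected number of BMP code units that uslug keeps as a single character.
var EXPECTED_VALID_CHARACTERS = 49995;
var validCharacterCount = 0;
// Walk the whole BMP, U+0000 through U+FFFF inclusive.
for (var charCode = 0; charCode <= 0xFFFF; charCode++) {
  if (uslug(String.fromCharCode(charCode)).length === 1) {
    validCharacterCount++;
  }
}
validCharacterCount.should.equal(EXPECTED_VALID_CHARACTERS);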
Let me know if you want this and I can put it in. The current set does support more characters, as the XML is more inclusive.
Hrm, not sure about the release note. The important piece of information to the end user seems to be that the size of the library has decreased. Maybe something simple, along the lines of:
Reduce size of library.