
Enough awesome changes for a new version

Open easyaspi314 opened this issue 7 years ago • 0 comments

I have been messing with sds recently, and I decided to submit my changes.

This is a summary of the changes:

The biggest feature, and by far the best part of this change set, is the new sdsadd macro:

sds s = sdsempty();
s = sdsadd(s, "Hello, ");
s = sdsadd(s, "world");
s = sdsadd(s, '!');
s = sdsadd(s, ' ');
s = sdsadd(s, 1337);
puts(s);

Output> Hello, world! 1337

This is just like the += operator you find in Java/JavaScript/Ruby/std::string/QString/etc. It doesn't work with all compilers, but it has three implementations which have decent coverage:

  • C11's _Generic
  • C++11's overloading and type_traits
  • GCC/Clang's __builtin_types_compatible_p and __builtin_choose_expr extensions
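
To give a feel for the first approach, here is a minimal sketch of _Generic dispatch. It is not the actual sdsadd implementation: demo_buf, demo_add, and the demo_add_* helpers are made-up names, and a fixed-size buffer stands in for a real sds string so the example compiles on its own with any C11 compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for an sds string. */
typedef struct {
    char data[64];
    size_t len;
} demo_buf;

/* Append a C string, silently truncating if the buffer is full. */
static void demo_add_str(demo_buf *b, const char *s) {
    assert(b != NULL && s != NULL);
    size_t room = sizeof(b->data) - 1 - b->len;
    size_t n = strlen(s);
    if (n > room) n = room;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Append a single character. */
static void demo_add_char(demo_buf *b, char c) {
    char tmp[2] = { c, '\0' };
    demo_add_str(b, tmp);
}

/* Append the decimal representation of an int. */
static void demo_add_int(demo_buf *b, int v) {
    char tmp[32];
    snprintf(tmp, sizeof tmp, "%d", v);
    demo_add_str(b, tmp);
}

/* Pick the helper from the static type of x (assumed macro, C11 only). */
#define demo_add(b, x) _Generic((x),      \
        char *:       demo_add_str,       \
        const char *: demo_add_str,       \
        char:         demo_add_char,      \
        int:          demo_add_int)((b), (x))
```

With a zero-initialized demo_buf b, calling demo_add(&b, "Hello, "), demo_add(&b, "world"), demo_add(&b, (char)'!') and demo_add(&b, 1337) leaves "Hello, world!1337" in b.data. Note the cast on '!': in C a bare character literal has type int, so without the cast this sketch would take the int branch and append the character code instead.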

Even if you can't use a compiler with at least one of those features, there are also the manual functions:

sds s = sdsempty();
s = sdscat(s, "Hello, ");
s = sdscat(s, "world");
s = sdsaddchar(s, '!');
s = sdsaddchar(s, ' ');
s = sdsaddint(s, 1337);
puts(s);

Output> Hello, world! 1337

Note that sdsadd will try its best to detect character literals, but it is impossible to catch them all without ridiculous compiler-dependent voodoo, because in C a character literal such as '!' has type int. For sdsadd to treat an argument as a character, the argument must either have exactly the type char (through a declaration or a cast; signed char and unsigned char are not guaranteed to work), or the expression in the second macro argument must lexically start or end with a single quote. The macro that detects the latter case is this:

#define SDS_IS_CHAR(x) ((#x[0] == '\'') || (sizeof(#x) > 3 && #x[sizeof(#x) - 2] == '\''))

Note: that check is optimized away at compile time, so don't worry about extra runtime checks.

As long as your expression matches that check and is convertible to char, int, or unsigned int, it is treated as a character. This works about 90% of the time.
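
Here is a small sketch of what the lexical check accepts and rejects. The macro is copied from above; demo_is_char_checks is a made-up helper name. The macro only stringifies its argument and never evaluates it.

```c
#include <assert.h>

#define SDS_IS_CHAR(x) ((#x[0] == '\'') || (sizeof(#x) > 3 && #x[sizeof(#x) - 2] == '\''))

/* Hypothetical helper that exercises the lexical check. */
static void demo_is_char_checks(void) {
    char c = 'a';
    (void)c; /* the macro only looks at the spelling of its argument */

    assert(SDS_IS_CHAR('a'));        /* starts with a quote */
    assert(SDS_IS_CHAR((char)'a'));  /* ends with a quote */
    assert(SDS_IS_CHAR('a' + 1));    /* starts with a quote, value is an int */
    assert(!SDS_IS_CHAR(65));        /* no quote anywhere */
    assert(!SDS_IS_CHAR(c));         /* a char variable: missed by the lexical
                                        check, left to the type-based check */
}
```

The last case is why the explicit char type matters: a variable name has no quotes in it, so only the type check can recognize it as a character.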

Other features:

  • ABI has been broken again, but that will probably not happen again because most things are now fairly future-proof. The only major change is that you must use the return value of every sds function, regardless of whether it reallocates or not.
  • Forgotten return values are now much easier to spot: thanks to the SDS_MUT_FUNC, SDS_INIT_FUNC, and related macros, GCC and Clang warn you if you forget to take the return value. They also warn about fishy printf format strings passed to sdscatprintf.
  • Full compatibility with a C++11 compiler, with some extra std::string conversion functions when you compile with it.
  • Removal of SDS_TYPE_5. Before you say, "oh now why did you remove that?", look at my reasoning:
    • It actually made performance slower. The two bytes you may have saved are offset by the extra time, code, and cache misses it takes to parse it.
    • It makes operations more complicated.
    • It requires us to reallocate whenever we increase the length.
    • It forces us to spend three bits of the flags byte on the type, bits which could be used for other things like actual flags.
    • It only gives us 32 characters before it is useless.
    • Its removal allows us to change this ugly mess:
    static inline size_t sdslen(const sds s) {
        unsigned char flags = s[-1];
        switch(flags&SDS_TYPE_MASK) {
            case SDS_TYPE_5:
                return SDS_TYPE_5_LEN(flags);
            case SDS_TYPE_8:
                return SDS_HDR(8,s)->len;
            case SDS_TYPE_16:
                return SDS_HDR(16,s)->len;
            case SDS_TYPE_32:
                return SDS_HDR(32,s)->len;
            case SDS_TYPE_64:
                return SDS_HDR(64,s)->len;
        }
        return 0;
    }

to this clean syntax:

    static inline size_t sdslen(const sds s) {
        SDS_HDR_LAMBDA(s, { return sh->len; });
        return 0;
    }
  • The SDS_HDR_LAMBDA and SDS_HDR_LAMBDA_2 macros run the block of code you give them, automatically providing the following inside it:
    {
        sds s;
        typedef struct sdshdr[8|16|32|64] sdshdr;
        sdshdr *sh;
        typedef uint[8|16|32|64]_t sdshdr_uint;
        typedef int[8|16|32|64]_t sdshdr_int;
    }

SDS_HDR_LAMBDA takes an sds string, followed by the block. SDS_HDR_LAMBDA_2 takes an sds string, followed by the flags byte, then the block. It is safe to pass NULL as the first argument, in which case sh will also be NULL. Note that the macro wraps a switch statement, so don't use break inside the block, and try to do as much as possible in a single call, because calling it is rather expensive.
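
To illustrate the idea, here is a rough sketch of how such a macro could be built. It is not the real SDS_HDR_LAMBDA: every demo_/DEMO_ name, the two header widths, and the layout (header, then one flags byte, then the data) are assumptions made for this example, and it skips the NULL handling described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { DEMO_TYPE_8 = 0, DEMO_TYPE_16 = 1 };

/* Hypothetical headers; the flags byte lives between header and data. */
struct demo_hdr8  { uint8_t  len; uint8_t  alloc; };
struct demo_hdr16 { uint16_t len; uint16_t alloc; };

/* One switch case: define demo_hdr and sh for this width, run the block. */
#define DEMO_HDR_CASE(bits, s, ...)                                      \
    {                                                                    \
        typedef struct demo_hdr##bits demo_hdr;                          \
        demo_hdr *sh = (demo_hdr *)(void *)((s) - 1 - sizeof(demo_hdr)); \
        __VA_ARGS__                                                      \
    }                                                                    \
    break;

/* Wraps a switch, so the block must not use break itself. */
#define DEMO_HDR_LAMBDA(s, ...)                             \
    switch ((unsigned char)(s)[-1]) {                       \
    case DEMO_TYPE_8:  DEMO_HDR_CASE(8,  s, __VA_ARGS__)    \
    case DEMO_TYPE_16: DEMO_HDR_CASE(16, s, __VA_ARGS__)    \
    }

/* Allocate header + flags byte + data, choosing the narrowest header. */
static char *demo_new(const char *init) {
    assert(init != NULL);
    size_t len = strlen(init);
    int wide = len > UINT8_MAX;
    size_t hdrsz = wide ? sizeof(struct demo_hdr16) : sizeof(struct demo_hdr8);
    if (len > UINT16_MAX) return NULL;
    unsigned char *p = malloc(hdrsz + 1 + len + 1);
    if (p == NULL) return NULL;
    if (wide) {
        struct demo_hdr16 h = { (uint16_t)len, (uint16_t)len };
        memcpy(p, &h, sizeof h);
    } else {
        struct demo_hdr8 h = { (uint8_t)len, (uint8_t)len };
        memcpy(p, &h, sizeof h);
    }
    p[hdrsz] = wide ? DEMO_TYPE_16 : DEMO_TYPE_8;
    memcpy(p + hdrsz + 1, init, len + 1);
    return (char *)(p + hdrsz + 1);
}

static size_t demo_len(char *s) {
    assert(s != NULL);
    DEMO_HDR_LAMBDA(s, { return sh->len; });
    return 0;
}

static void demo_free(char *s) {
    if (s == NULL) return;
    DEMO_HDR_LAMBDA(s, { free(sh); return; });
}
```

The block is passed through __VA_ARGS__ so that commas inside it do not split the macro arguments. Each case gets its own scope, which is what lets the same names (demo_hdr, sh) refer to a different header width per case.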

  • Performance improvements
    • This is most noticeable on 32-bit, especially ARMv7, which gets up to a 10x speedup simply by using int instead of long long whenever possible (from a benchmark of sdsll2str, renamed to sdslonglong2str, on an LG G3).
    • restrict pointers prevent wasting time on aliasing.
    • __builtin_expect macros tell the compiler which branches are unlikely, so the hot path does not waste time on them.
    • sdscatprintf is faster: instead of guessing what allocation size it has to use, it uses the return value of vsnprintf to get the size it needs directly.
    • More specific addition functions prevent expensive calls to sdscatprintf.
    • Some other annotations allow the compiler to optimize the code.
    • SDS_HDR_LAMBDA makes it faster (or slower, depending on how many times you use it) to do multiple changes to the string's header at once.
  • sdssetlen, sdssetalloc, etc. are now safer. They will reallocate when necessary.
  • Code is more portable: some MSVC issues are fixed, and removing the flags and data documentation members from the sdshdrs means we no longer need __attribute__((__packed__)) or VLA support. I also tried to make things (mostly) happy with C90.
  • Some duplicated code blocks are now expanded macros.
  • Tests are now in their own file, and there are now up to 75 different tests, depending on what compiler you use.
  • Some minor bugfixes
  • Defining SDS_ABORT_ON_ERROR will make sds abort on an error (with a message to stderr giving the line) instead of returning NULL.
  • Some other features, such as sdssplit which just runs sdssplitlen with strlen, and support for int types instead of just long long.
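
For the sdscatprintf point above, this is roughly what sizing from the vsnprintf return value looks like, together with the kind of GCC/Clang annotations that produce the unused-result and format-string warnings. It is a sketch, not the sds code: demo_format and DEMO_PRINTF_FUNC are made-up names, it returns a plain malloc'd string rather than an sds, and it assumes a C99-conforming vsnprintf that returns the required length when given a zero-size buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical annotation macro: format checking plus unused-result warning. */
#if defined(__GNUC__)
#define DEMO_PRINTF_FUNC(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx), warn_unused_result))
#else
#define DEMO_PRINTF_FUNC(fmt_idx, args_idx)
#endif

static char *demo_format(const char *fmt, ...) DEMO_PRINTF_FUNC(1, 2);

/* Format into a freshly allocated buffer of exactly the right size. */
static char *demo_format(const char *fmt, ...) {
    va_list ap, ap2;
    int needed;
    char *out = NULL;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* the list is consumed twice */
    needed = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);
    if (needed >= 0) {
        out = malloc((size_t)needed + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)needed + 1, fmt, ap2); /* second pass */
    }
    va_end(ap2);
    return out;                            /* caller frees; NULL on error */
}
```

Calling demo_format("%s-%d", "abc", 42) yields the string "abc-42". With GCC or Clang, ignoring the return value or passing a mismatched format argument should produce a warning.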

Documentation is coming soon, as well as more pedantic error checking (a debug mode?)

easyaspi314, Oct 24 '18 02:10