shapelib
shapelib copied to clipboard
Cleanup / modernization - general code audit
I'm taking a look at what it might look like to cleanup shapelib and move it to C99 or newer. I know that opinions vary as to if these are good changes or not, so I'm doing this in a separate repo without worrying out making PRs:
https://github.com/schwehr/shapelib-experimental
I used debian testing with -Wall -Wextra
and cppcheck to keep an eye on issues.
The changes are not always fully implemented. The goal is evaluate what the library might look like. There is not currently enough testing in shapelib to be sure that nothing has broken. I'm hoping to find some time to add more test coverage. I did put the core files into my local copy of GDAL and my shape tests pass.
So far, the changes I've made are roughly:
- clang-format
- Remove unused variables
- Add missing include of stdio.h (beginnings of IWYU)
- Order includes
- Remove comments that are no longer relevant
- Remove cvs log messages from file comments
- Reduce bulky block comments
- Localize variables
- Combine definition and initialization
- Add const when possible
- int boolean → bool (using stdbool.h)
- Remove dbmalloc
- Remove C++ compilation of the C source (removed C++ casting macros)
- Mark a few possible bugs with TODO
- Reorder functions to not need local prototypes
- Fix some error exit's to properly close open resources
This has already reduced the number of lines of source files:
# Before
wc *.[ch] contrib/*.[ch] | tail -1
15995 58264 553965 total
# Currently
wc *.[ch] contrib/*.[ch] | tail -1
13566 48865 469071 total
There are a lot more things I'd like to try out including:
- Add C test coverage (maybe with catch2? or I could default to gtest)
- Add command line tests using python3 to exercise the resulting programs
- Fuzzing directly on shapelib (both to find bugs and to make it easy to find files for testing)
- Use stdint.h types
- Use C99 printf codes
- Remove DEBUG code
- convert int foo; if (a) foo=b; else foo=c; → const int foo = a ? b : c;
- Running tests in asan and msan modes
- Looking at coverity
- etc.
Some of the issues I've found so far:
- A variety of license issues. e.g. files that state public domain and contain a copyright
- I'm not sure if some of the contrib files work
- booleans that are true, false, and sometimes -1
- Missing conditions in if chains / switch statements. e.g. dbfdump.c
- Variables with incorrect Hungarian prefixes. e.g. bMinX for int and iunselect for bool
- Multiple endian check and a few other redundancies
- Unchecked malloc and realloc calls that may fail
- Variables that don't change causing code to be unreachable. e.g. byRing
- shpdxf.c has a huge number of warnings including
may be used uninitialized
- CMake doesn't run the tests
- shputils.c has a large number of globals and is very hard to follow
- man pages would be nice
- functions should have const on pointer args they do not alter
This work was influenced by conversations around libtiff. e.g.
- https://gitlab.com/libtiff/libtiff/-/merge_requests/204 - tiffsplit.c: Experiment
- https://gitlab.com/libtiff/libtiff/-/merge_requests/217 - C99 snprintf
- https://gitlab.com/libtiff/libtiff/-/merge_requests/216 - C99 strtol, strtoul, strtoll and strtoull
- https://gitlab.com/libtiff/libtiff/-/merge_requests/215 - C99 Inline
- https://gitlab.com/libtiff/libtiff/-/merge_requests/211 - C99 format strings
- https://gitlab.com/libtiff/libtiff/-/merge_requests/205 - C99 integer types part 2
- https://gitlab.com/libtiff/libtiff/-/merge_requests/185 - C99 integer types part 1
There was clearly an attempt to make shapelib compile well as C++. It certainly would be pretty easy to switch the internals of shapelib to C++ without altering the external API keeping C linkage style for all exposed functions. I would certainly be fine with that. It would a bit of extra type safety and remove noise in the code. e.g.
- Remove the need for static on local only functions by using an anonymous namespace for all helper functions.
- Allow all casts to be converted to static_cast or reinterpret cast
- use constexpr for many things that are currently internal #defines
- numeric_limits will be cleaner
I would rate this as low priority
If I run pre-commit run --all-files
it will re-format all files massively (e.g., by changing the indentation level from 4 to 2), see https://github.com/OSGeo/shapelib/compare/master...thbeu:shapelib:format?expand=1
Is a .clang-format file missing in the repo?
There has never been a a .clang-format and pre-commit has not been setup for shapelib. When @rouault did https://github.com/OSGeo/shapelib/commit/fe315bace44a770f239cfea6e9d7b9a7b319772b, I'd guess that he used https://github.com/OSGeo/gdal/blob/master/.clang-format
pre-commit has not been setup for shapelib
CONTRIBUTING.md mentions the pre-commit setup for shapelib.
When @rouault did fe315ba, I'd guess that he used https://github.com/OSGeo/gdal/blob/master/.clang-format
Thanks for the hint to the missing .clang-format config. It's added by #65.