leptonica icon indicating copy to clipboard operation
leptonica copied to clipboard

Minor portability issues

Open iskunk opened this issue 6 years ago • 14 comments

I am building Leptonica on a variety of (POSIX) platforms, and have found a few minor issues hindering its portability.

First, I'll present a few straightforward fixes.

  1. Some compilers cannot abide the return keyword in a void function, even if returning a void value:
diff --git a/src/map.c b/src/map.c
index 71b3923..85c5400 100644
--- a/src/map.c
+++ b/src/map.c
@@ -129,7 +129,7 @@ l_amapInsert(L_AMAP  *m,
              RB_TYPE  key,
              RB_TYPE  value)
 {
-    return l_rbtreeInsert(m, key, value);
+    l_rbtreeInsert(m, key, value);
 }
 
 void
@@ -213,7 +213,7 @@ l_asetInsert(L_ASET  *s,
 RB_TYPE  value;
 
     value.itype = 0;  /* meaningless */
-    return l_rbtreeInsert(s, key, value);
+    l_rbtreeInsert(s, key, value);
 }
 
 void
  1. Solaris has fstatat(), but not dirfd(). Because the latter is needed in order to use the former, I would suggest conditionalizing the code in a manner where both or neither are used:
diff --git a/src/sarray1.c b/src/sarray1.c
index 7716071..ab415a8 100644
--- a/src/sarray1.c
+++ b/src/sarray1.c
@@ -1887,9 +1887,11 @@ struct stat     st;
         return (SARRAY *)ERROR_PTR("pdir not opened", procName, NULL);
     }
     safiles = sarrayCreate(0);
+#if HAVE_DIRFD && HAVE_FSTATAT
     dfd = dirfd(pdir);
+#endif
     while ((pdirentry = readdir(pdir))) {
-#if HAVE_FSTATAT
+#if HAVE_DIRFD && HAVE_FSTATAT
         stat_ret = fstatat(dfd, pdirentry->d_name, &st, 0);
 #else
         size = strlen(realdir) + strlen(pdirentry->d_name) + 2;

(Note: This requires a check for dirfd() in the configure script, and a new clause in src/environ.h. However, there is a separate issue related to this, which I will discuss further below.)

  1. The off64_t type is made available when you define _LARGEFILE64_SOURCE, not _LARGEFILE_SOURCE. See this reference for information. The usage in Leptonica can be fixed as follows:
diff --git a/src/tiffio.c b/src/tiffio.c
index 5a761ff..5401485 100644
--- a/src/tiffio.c
+++ b/src/tiffio.c
@@ -257,7 +257,7 @@ lept_seek_proc(thandle_t  cookie,
     _fseeki64(fp, pos, SEEK_SET);
     if (pos == _ftelli64(fp))
         return (tsize_t)pos;
-#elif defined(_LARGEFILE_SOURCE)
+#elif defined(_LARGEFILE64_SOURCE)
     off64_t pos = 0;
     if (!cookie || !fp)
         return (tsize_t)-1;
@@ -324,7 +324,7 @@ lept_size_proc(thandle_t  cookie)
     _fseeki64(fp, 0, SEEK_END);
     size = _ftelli64(fp);
     _fseeki64(fp, pos, SEEK_SET);
-#elif defined(_LARGEFILE_SOURCE)
+#elif defined(_LARGEFILE64_SOURCE)
     off64_t pos;
     off64_t size;
     if (!fp)

Those are the low-hanging fruit.

One more subtle issue is the library's use of configuration preprocessor symbols, like HAVE_FMEMOPEN. In src/environ.h, I see this:

/*-------------------------------------------------------------------------*
 * On linux systems, you can do I/O between Pix and memory.  Specifically,
 * you can compress (write compressed data to memory from a Pix) and
 * uncompress (read from compressed data in memory to a Pix).
 * For jpeg, png, jp2k, gif, pnm and bmp, these use the non-posix GNU
 * functions fmemopen() and open_memstream().  These functions are not
 * available on other systems.
 * To use these functions in linux, you must define HAVE_FMEMOPEN to 1.
 * To use them on MacOS, which does not support these functions, set it to 0.
 *-------------------------------------------------------------------------*/
#if !defined(HAVE_CONFIG_H) && !defined(ANDROID_BUILD) && !defined(OS_IOS) && \
    !defined(_WIN32)
#define  HAVE_FMEMOPEN    1
#endif  /* ! HAVE_CONFIG_H etc. */

And then, this is used in the code as

$ grep -R HAVE_FMEMOPEN src
src/dewarp1.c:#if HAVE_FMEMOPEN
src/dewarp1.c:#endif  /* HAVE_FMEMOPEN */
src/dewarp1.c:#if HAVE_FMEMOPEN
[...]

There is an inconsistency here:

  1. The comment in environ.h states that HAVE_FMEMOPEN should be set to either 1 (use the function) or 0 (don't use it).
  2. The preprocessor code in environ.h either sets HAVE_FMEMOPEN to 1, or leaves it undefined.
  3. The library code expects HAVE_FMEMOPEN to be set to 0 or 1 (as #if expects a boolean expression). It is technically an error if HAVE_FMEMOPEN is undefined, but the compiler will usually assume a value of 0 in such a case. (GCC will warn about this with -Wundef.)
  4. When you use AC_CHECK_FUNC() (or AC_CHECK_FUNCS()) in Autoconf, it follows the 1-or-undefined convention, not 1-or-0.

Leptonica is mixing up two different conventions for configuration preprocessor symbols, when as a single project it should be using just one.

I would advise favoring 1-or-undefined, to stay consistent with Autoconf, and also because (thanks to Autoconf) it tends to be more common and expected. This would require updating environ.h, and changing various #if HAVE_BLAH directives into #ifdef HAVE_BLAH or #if defined(HAVE_BLAH) as appropriate.

iskunk avatar Oct 28 '19 22:10 iskunk

Thank you for bringing these issues up in such a clear way!

(1) and (3) are easy to fix (you've provided it). (2) requires changes in autoconf, which I can direct to the autoconf maintainer. I wonder: how big is the user community for Oracle Solaris or illumos? The "subtle" issue is indeed subtle -- I've never worried about the difference between 0 and undefined. Is anything currently broken?

Thank you again for being so helpful with this.

-- Dan

DanBloomberg avatar Oct 29 '19 22:10 DanBloomberg

Note that (2) would require adding a check for dirfd() to the project's configure.ac script, not an change to Autoconf itself. This would amount to adding a line like

AC_CHECK_FUNC([dirfd])

The user base for Solaris/Illumos may not be as big as it used to be, but my org still services customers on that platform. Bear in mind, too, that a tweak like this may well benefit users on some other platform that similarly lacks dirfd(). Solaris represents only an example; it not necessarily the sole beneficiary of such a change.

"Broken" is relative. If you compile with -Werror=undef (and disabled libraries), for example, then you'll see quite a bit of breakage. If you're only concerned with whether it compiles or not with GCC using default flags, then this is a non-issue. The question is, do you want to rely on GCC's willingness to accept not-entirely-correct code, or would you rather get it right?

Note that config-symbol confusion can, over time, lead to trouble like

#define HAVE_BLARG 0    /* we don't have blarg */

[...]

#ifdef HAVE_BLARG
  init_blarg();
#endif

[...]

error: undefined reference to 'init_blarg'

In my experience, it's less trouble to standardize on 1-or-undefined (or defined-or-undefined), than to keep track of which symbols need to be checked with #ifdef versus #if.

iskunk avatar Oct 29 '19 23:10 iskunk

Makes sense. I'll try to look things over in the coming week, and will undoubtedly have more questions for you.

On Tue, Oct 29, 2019 at 4:50 PM Daniel Richard G. [email protected] wrote:

Note that (2) would require adding a check for dirfd() to the project's configure.ac script, not an change to Autoconf itself. This would amount to adding a line like

AC_CHECK_FUNC([dirfd])

The user base for Solaris/Illumos may not be as big as it used to be, but my org still services customers on that platform. Bear in mind, too, that a tweak like this may well benefit users on some other platform that similarly lacks dirfd(). Solaris represents only an example; it not necessarily the sole beneficiary of such a change.

"Broken" is relative. If you compile with -Werror=undef (and disabled libraries), for example, then you'll see quite a bit of breakage. If you're only concerned with whether it compiles or not with GCC using default flags, then this is a non-issue. The question is, do you want to rely on GCC's willingness to accept not-entirely-correct code, or would you rather get it right?

Note that config-symbol confusion can, over time, lead to trouble like

#define HAVE_BLARG 0 /* we don't have blarg */

[...]

#ifdef HAVE_BLARG init_blarg(); #endif

[...]

error: undefined reference to 'init_blarg'

In my experience, it's less trouble to standardize on 1-or-undefined (or defined-or-undefined), than to keep track of which symbols need to be checked with #ifdef versus #if.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/DanBloomberg/leptonica/issues/455?email_source=notifications&email_token=AD7KMLHJAKAFCJP43IGYES3QRDD5DA5CNFSM4JGA2IUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECSOTNA#issuecomment-547678644, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD7KMLG4NKKHJMOC2UJ7NIDQRDD5DANCNFSM4JGA2IUA .

DanBloomberg avatar Oct 30 '19 00:10 DanBloomberg

I'll be happy to answer any questions that arise. I don't see that a lot of code work should be needed here, in any event; it is more about clarifying how the library handles these constructs.

iskunk avatar Oct 30 '19 17:10 iskunk

(1) and (3) are will be pushed. (2) requires assistance, so I'm deferring.

Looking further at the high-hanging fruit:

I appreciate standardization and consistency, up to a point. To get the library defines in environ.h to work in the same way as auto_config.h (1 or undefined), for missing libraries I'd have to comment out each one, including the guards preventing re-definition, rather than simply setting the value to 0. Doesn't seem worth the mess.

As for the DEBUG* defines in files, again, it's neater and a lot easier to set the value to 0 or 1 than to add or remove /*.... */ around it.

Following the advice not to fix things that aren't really broken. Another way to look at it is that right now the system is forgiving, with undefined variables implicitly getting the value 0. Therefore, using #if instead of #ifdef works on everything. But if we started using #ifdef without purging all the variables defined to 0, we get errors (as in your example).

DanBloomberg avatar Dec 04 '19 23:12 DanBloomberg

If you really want to stick with 1-or-0, I would suggest defining a new set of variables that use this convention, and then use the Autoconf ones to initialize them. For example...

#ifndef LEPTONICA_HAVE_JPEG
# ifdef HAVE_LIBJPEG
#   define LEPTONICA_HAVE_JPEG 1
# else
#   define LEPTONICA_HAVE_JPEG 0
# endif
#endif

...

#if LEPTONICA_HAVE_JPEG
/* jpeg stuff */
#endif

This way, you have a clear demarcation between the Autoconf convention, and Leptonica's.

That said, I'm not seeing an upside to avoiding the Autoconf convention. I appreciate the point that adding /* ... */ takes a bit more typing, but literally thousands of other libraries already take that approach, not least thanks to Autoconf tipping the scales that way. And are we really likely to miss any 1-or-0 variables in the conversion?

(Also, the undefined-becomes-zero behavior is technically incorrect, allowed by default by the compiler only because it comes up so often in legacy codebases. It's like saying, hey, we can leave off the int keyword from declarations, because the compiler assumes that if the type is unspecified.)

One possible simplification you may want to consider is getting rid of the config_auto.h vs. src/environ.h duopoly altogether. Instead, have Autoconf write out config.h per usual, and ship a config.manual.h file, looking mostly like environ.h. A user who wants to build Leptonica sans Autoconf would copy config.manual.h to config.h and edit it as desired. This would eliminate the current situation of having the same definitions potentially come from two different places, and all the associated conditionalization. (The PCRE project is one I know of that uses this approach.)

iskunk avatar Dec 05 '19 19:12 iskunk

Thank you for all the suggestions!!

Looking it over, it seems the best option is your original one, adopting 1-or-undefined for variables that are set by auto_config.h (if any changes are to be made at all).

Have some other changes to the library to finish with (including putting auto_config.h in each source file); then will do some experiments.

DanBloomberg avatar Dec 05 '19 19:12 DanBloomberg

Will be happy to test the changes once they're in, and help finish out #2 afterward.

iskunk avatar Dec 05 '19 20:12 iskunk

By the way, one minor issue I only noticed recently: The configure.ac script is using AX_SPLIT_VERSION, yet there is no definition of this macro in the tree. This results in the following error message at configure time, toward the end:

checking for fstatat... yes
../leptonica/configure: line 15263: AX_SPLIT_VERSION: command not found
checking whether ln -s works... yes
checking that generated files are newer than configure... done
configure: creating ./config.status

Could you download ax_split_version.m4 from here, and place it in the m4 subdirectory?

iskunk avatar Jan 09 '20 21:01 iskunk

When I build on a local linux box, I don't get that error message.  Instead:

​checking Major version... 1 checking Minor version... 79 checking Point version... 1

Where are you building to get this error? I see that tesseract has an m4 directory with two .m4 files, one of them being this ax_split_version.

​Happy to put it in, just would like some more background.

  -- Dan

DanBloomberg avatar Jan 09 '20 23:01 DanBloomberg

It's likely that you have an "autoconf-archive" package installed that provides that file in a system directory. (For example, the Debian package has it at /usr/share/aclocal/ax_split_version.m4)

I'm using a local build of Autoconf that has only the bundled macros. Autoconf-Archive macros have always been external to Autoconf proper, so it would be good to have that in the tree.

iskunk avatar Jan 10 '20 00:01 iskunk

done

DanBloomberg avatar Jan 10 '20 00:01 DanBloomberg

The drifd issue exists on AIX also. I patched sarray1.c like so and was able to compile. Just FYI

#if HAVE_FSTATAT dfd = dirfd(pdir); #endif

mgtremaine avatar Nov 09 '21 14:11 mgtremaine

Thank you for the report.

In accordance with issue (2) above, I have defined (i.e., set a value for) HAVE_DIRFD and am using that in conjunction with HAVE_FSTATAT. This also fixes your AIX issue. Checks for the function dirfd() are made in both configure.ac and sw.cpp.

DanBloomberg avatar Nov 09 '21 18:11 DanBloomberg