leptonica icon indicating copy to clipboard operation
leptonica copied to clipboard

Idea: de-anonymize enums and use the (new) names

Open pullmoll opened this issue 9 years ago • 14 comments

The current source uses anonymous enums everywhere. All parameters to functions expecting such enum values currently have l_int32 for the type of the respective parameter.

If, for example, the following enums were named

/*! Rotate flags */
enum L_ROTATE_FLAG {
    L_ROTATE_AREA_MAP = 1,     /*!< use area map rotation, if possible     */
    L_ROTATE_SHEAR = 2,        /*!< use shear rotation                     */
    L_ROTATE_SAMPLING = 3      /*!< use sampling                           */
};

/*! Background flags */
enum L_INCOLOR {
    L_BRING_IN_WHITE = 1,      /*!< bring in white pixels from the outside */
    L_BRING_IN_BLACK = 2       /*!< bring in black pixels from the outside */
};

the declaration of the function pixRotate() in src/rotate.c could look like this:

PIX *
pixRotate(PIX           *pixs,
          l_float32      angle,
          L_ROTATE_FLAG  type,
          L_INCOLOR      incolor,
          l_int32        width,
          l_int32        height)
{

and the generated documentation would much better describe and link to the possible values for type and incolor.

I don't know if and how much trouble such a change may cause for Leptonica users who currently use C++ and have bare numeric values in function calls for these parameters. For plain C that's not a problem, while C++ would insist on the enum values being used. That's why it may be a Bad Idea (tm) to do something like this?

If there is nothing preventing the name-giving for every enum, I would try and create a pull request which adds names to enums and uses them in function declarations where appropriate.

The above example's output can be seen here

pullmoll avatar Apr 17 '16 19:04 pullmoll

It's an interesting idea, and would be good to do if we didn't break much.

Let's hold off on it for a while. I can check if anyone has used bare integers in C++ at goog.

DanBloomberg avatar Apr 17 '16 20:04 DanBloomberg

Jurgen, I did some experimentation with the single most ubiquitous flags in the library: the copy flags. These flags are used with all data structures that have an insert or a copy function. However, because of legacy promises, I can't put L_NOCOPY and L_INSERT into the same enum, because their legacy values are both 0. Ugh.

As you said, with C++ (and some flags -- I don't know which), if the enum is named:

  • callers must use the symbol name, not the integer
  • functions that take these as inputs will be expected to declare them as enum L_COPY_FLAG ...

There are perhaps a hundred such functions.

So I think for now, given that if the camel gets its nose under the tent we'll get committed to do this for all enums (!), let's not.

Nevertheless, I will clean up those 'copy' flags, both in documentation and use, in the library.

DanBloomberg avatar Jun 24 '16 18:06 DanBloomberg

Dan, you can put L_NOCOPY and L_INSERT in the same enum, because you can simply define their values to be equal!

enum L_COPY_FLAGS {
    L_INSERT,
    L_NOCOPY = L_INSERT,
    L_COPY,
    L_CLONE,
    L_COPY_CLONE
};

That would work just fine. It is actually quite common to define enumeration values equal to others for legacy support in various well known libraries.

I believe users of Leptonica could live with such a change, if at the same time a major version step was made, i.e. a release of leptonica-2.0. Perhaps an updated, bug-fixed version 1.74 could be released as well, without the change, for those who don't have enough time at hands to do the conversion.

After all it's just the allheaders.h and function headers which would change, not the functions themselves or their handling of values... of course, except for the case where an enum value is also to be stored in a local variable for some reason.

Last but not least it would be possible to supply Leptonica users with an allheaders-legacy.hpp file with overloaded functions for the old style (with int parameter) mapping to the new style (with enums).

For example such a allheaders-compat.hpp could have (extra) lines like this:

LEPT_DLL extern PIX * pixBilinear ( PIX *pixs, l_float32 *vc, L_INCOLOR incolor );
inline PIX * pixBilinear ( PIX *pixs, l_float32 *vc, l_int32 incolor )  {
    return pixBilinear(pixs, vc, static_cast<L_INCOLOR>(incolor));
}

Then old, unmaintained C++ code could simply include allheaders-legacy.hpp and not care about the change at all.

There won't be any code resulting from such casting. It is a no-op code wise, and it is just to satisfy C++ constraints on the function's parameters.

So basically a (generated?) allheaders-legacy.hpp file would look like allheaders.h, except for the functions where one or more enum values are to be casted from l_int32 to the respective enum's name. That should be doable without too much trouble and keep everyone happy.

Note: The .hpp extension should be used to make it clear that it is required only for C++ compilers to include this file instead of the .h file.

pullmoll avatar Jun 25 '16 05:06 pullmoll

Jürgen, I hadn't seen that equal-value enum trick before. Very nice.

At this time I'm not ready to consider this change. To do the allheaders-compat.h with extra lines would require a significant change in xtractprotos/parseprotos.c, which I played with for maybe a year back around 2002 before it worked correctly. Note also that leptonica has the requirement that it can be compiled with the C compiler, so that it can be linked to both C and C++ applications.

Without modifying allheaders.h, this change would break a lot of applications. Certainly the documentation would be better, but the difference isn't worth the pain that this would cause. The last thing I want to do is encourage people NOT to continue to upgrade as the library is improved. If only for the selfish reason of not wishing to support old versions -- it's good to be able to say "get the latest version and if the issue persists, we'll fix it."

I'll be on vacation for the next 2 months, but reading email and continuing to work on this library. I do appreciate all your help, and I should put the documentation up on leptonica.org. Can you remind me how you run the doxygen generator, and whether we need to escape the '<' symbols within a

...
block, as we are doing?

-- Dan

On Fri, Jun 24, 2016 at 10:25 PM, Jürgen Buchmüller < [email protected]> wrote:

Dan, you can put L_NOCOPY and L_INSERT in the same enum, because you can simply define their values to be equal!

enum L_COPY_FLAGS { L_INSERT, L_NOCOPY = L_INSERT, L_COPY, L_CLONE, L_COPY_CLONE };

That would work just fine. It is actually quite common to define enumeration values equal to others for legacy support in various well known libraries.

I believe users of Leptonica could live with such a change, if at the same time a major version step was made, i.e. a release of leptonica-2.0. Perhaps an updated, bug-fixed version 1.74 could be released as well, without the change, for those who don't have enough time at hands to do the conversion.

After all it's just the allheaders.h and function headers which would change, not the functions themselves or their handling of values... of course, except for the case where an enum value is also to be stored in a local variable for some reason.

Last but not least it would be possible to supply Leptonica users with an allheaders-legacy.hpp file with overloaded functions for the old style (with int parameter) mapping to the new style (with enums).

For example such a allheaders-compat.hpp could have (extra) lines like this:

LEPT_DLL extern PIX * pixBilinear ( PIX *pixs, l_float32 *vc, L_INCOLOR incolor ); LEPT_DLL extern PIX * pixBilinear ( PIX *pixs, l_float32 *vc, l_int32 incolor ) { return pixBilinear(pixs, vc, static_cast<L_INCOLOR>(incolor)); }

Then old code could simply include allheaders-legacy.hpp and not care about the change at all.

There won't be any code resulting from such casting. It is a no-op code wise, and it is just to satisfy C++ constraints on the function's parameters.

So basically a (generated?) allheaders-legacy.hpp file would look like allheaders.h, except for the functions where one or more enum values are to be casted from l_int32 to the respective enum's name. That should be doable without too much trouble and keep everyone happy.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/DanBloomberg/leptonica/issues/178#issuecomment-228513132, or mute the thread https://github.com/notifications/unsubscribe/AP6mLPZaTMt0cV0HO51L93_gV7-aesaSks5qPLvEgaJpZM4IJO4i .

DanBloomberg avatar Jun 25 '16 07:06 DanBloomberg

Running doxygen is just doxygen in the directory where the Doxyfile lives. Everything is configured inside this file and it is the default name for a configuration file.

Escaping the < and > symbols is really only necessary if they enclose a word which is not supposed to be a HTML tag. For example if you used angle brackets to denote a speicific value like <min> or <blue> you would need to escape the < and > to avoid min or blue being taken for HTML tags (which do not exist). I think my code's escaping of these symbols was going too far...

pullmoll avatar Jun 25 '16 07:06 pullmoll

On Sat, Jun 25, 2016 at 12:45 AM, Jürgen Buchmüller < [email protected]> wrote:

Running doxygen is just doxygen in the directory where the Doxyfile lives. Everything is configured inside this file and it is the default name for a configuration file.

​worked great. 50MB of documentation made in about 15 seconds! When 1.74 comes out, does one hand edit the Doxyfile?​

Escaping the < and > symbols is really only necessary if they enclose a word which is not supposed to be a HTML tag. For example if you used angle brackets to denote a speicific value like or you would need to escape the < and > to avoid min or blue being taken for HTML tags (which do not exist). I think my code's escaping of these symbols was going too far...

​I'll try to remember to remove them incrementally when files are changed :-)​

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/DanBloomberg/leptonica/issues/178#issuecomment-228520552, or mute the thread https://github.com/notifications/unsubscribe/AP6mLKMAPN7PXJcrwLDdwBD4BGFvCRi3ks5qPNyMgaJpZM4IJO4i .

DanBloomberg avatar Jun 25 '16 16:06 DanBloomberg

I'd like to renew this request. Please name the enums.

As a person that uses Leptonica mostly from a C# wrapper it's a huge pain to automatically generate the C# wrapper but then have to deal with manually naming all these anonymous enums that now could be something like PixEnum1, PixEnum2, PixEnum3, all meaningless enum names. C# does not support anonymous enums. They all have to have a name. I'm sure other languages are the same.

As a compromise, another option would be to leave them the way they are but use the comment line directly above the anonymous enum as the name of the enum. You mostly do that but sometimes it's more than a few words, it's like a sentence or a paragraph, or nothing. At least if this were the case, where the name of the enum was right above it, wrapper creators could use it to name the enum properly.

fdncred avatar Feb 05 '19 13:02 fdncred

Tell me more about the compromise suggestion. For example, l. 222 in pix.h, which comments on colors, would this work for you:

/*! L_BOX_COLORS */ enum { ....

DanBloomberg avatar Feb 28 '19 03:02 DanBloomberg

Thanks @DanBloomberg for responding.

Generally speaking it would be nice to have something consistent such as /*! Useful Description Flags */ and the processors could turn that into:

public enum UsefulDescriptionFlags
{
...
}

So, the Useful Description would ideally be 1 or 2 words and Flags would be at the end (or beginning) of all enums if they are flags that can be used in bitwise calculations.

Here's a couple that were a little too verbose in the comments.

    public enum SelectStringInStringcodeForASpecificSerializableDataType
    {
        L_STR_TYPE = 0,
        L_STR_NAME = 1,
        L_STR_READER = 2,
        L_STR_MEMREADER = 3
    }
    public enum SizeSelectionFlagsFor1BppPixForegroundComponents
    {
        L_SELECT_BY_WIDTH = 1,
        L_SELECT_BY_HEIGHT = 2,
        L_SELECT_BY_MAX_DIMENSION = 3,
        L_SELECT_BY_AREA = 4,
        L_SELECT_BY_PERIMETER = 5
    }

Some of the comments are Flags for ... others are blah blah Flags. Consistency would be helpful. So, if you're using Flags in the nomenclature, it would be nice if it was always at the beginning or always at the end.

The one you specifically mentioned was turned into this, which IMHO, is probably at the limit of how long they should be. L_BOX_COLORS is probably better although I'm not sure what the L is supposed to represent.

    public enum ColorsForDrawingBoxes
    {
        L_DRAW_RED = 0,
        L_DRAW_GREEN = 1,
        L_DRAW_BLUE = 2,
        L_DRAW_SPECIFIED = 3,
        L_DRAW_RGB = 4,
        L_DRAW_RANDOM = 5
    }

Another example of L_ - not real sure why the top 3 don't have an L and the last one does.

/*! Colors for 32 bpp */
enum {
    COLOR_RED = 0,        /*!< red color index in RGBA_QUAD    */
    COLOR_GREEN = 1,      /*!< green color index in RGBA_QUAD  */
    COLOR_BLUE = 2,       /*!< blue color index in RGBA_QUAD   */
    L_ALPHA_CHANNEL = 3   /*!< alpha value index in RGBA_QUAD  */
};

One last thing, they need to be unique. Having L_BOX_COLORS defined in several places with different values would be a mess. I'm not real sure what would happen.

Thanks for considering this.

fdncred avatar Feb 28 '19 20:02 fdncred

Why the L_? These are globals, and I wasn't too careful in the early days about trying to avoid namespace collisions.

As for the enums, I am confused. If in the example I gave before, we had

/*! Box Color Flags */ enum { ...

then are you saying that some piece of code would rewrite this into

[public] enum BoxColorFlags { ...

?

And if that were done, would the use of these flags in function prototypes require using the enum name as a type for the arguments?

Example: if a function currently takes one of these enum values F(...., l_int32 boxcolor, ...)

would that need a new prototype declaration that uses the enum name F(..., BoxColorFlags boxcolor, ...)

DanBloomberg avatar Feb 28 '19 20:02 DanBloomberg

As for the enums, I am confused. If in the example I gave before, we had

/*! Box Color Flags */ enum { ...

then are you saying that some piece of code would rewrite this into

[public] enum BoxColorFlags { ...

Yes, that is correct.

And if that were done, would the use of these flags in function prototypes require using the enum name as a type for the arguments?

Example: if a function currently takes one of these enum values F(...., l_int32 boxcolor, ...)

would that need a new prototype declaration that uses the enum name F(..., BoxColorFlags boxcolor, ...)

Yes, and then C# would reference them like BoxColorFlags.L_DRAW_RED for the 0 int value of that enum.

The C code for functions and prototypes wouldn't change, just the comments.

Now, people who make and use wrappers, like me, will have a ton of working replacing all the l_int32 boxcolor items with BoxColorFlags boxcolor. Which will be a pain but it's better than having anonymous enums.

In my perfect world the enums in the C code would be named and you'd reference them similarly to the way that C# does, ala enum_name dot enum.

fdncred avatar Feb 28 '19 21:02 fdncred

I just had an idea and would like to chime in.

We could create a allheaders.hpp file where the allheaders.h defined l_int32 parameters would be replaced with enumeration name parameters. This header should also define the enumerations.

We could put all these enumerations and wrappers into a class Leptonica { }. A wrapper for e.g.

PIXA * pixaClone( PIXA *pixa, l_int32 copyflag );

would then be defined inside the class Leptonica like this

#include <allheaders.h>
…
class Leptonica
{
public:
…
    enum CopyClone {
        lNoCopy = L_NOCOPY,
        lInsert = L_INSERT,
        lCopy = L_COPY,
        lClone = L_CLONE,
        lCopyClone = L_COPYCLONE
    };
…
    PIXA* pixaClone(PIXA* pixa, CopyClone copyflag) {
        return pixaClone(pixa, static_cast<l_int32>(copyflag));
    }
…
}

Compilers should be able to effectively optimize these wrappers away from the code and turn them into a direct call of the C function.

The main problem is from where to retrieve the info for which l_int32 parameter is to be replaced with what enumeration name. Perhaps this can be extracted out of the Doxygen comments when / if they all define the enumerations to be used for their parameters. For example:

/*!
 * \brief   pixaCopy()
 *     
 * \param[in]    pixa
 * \param[in]    copyflag (enum CopyClone)
 *               see pix.h for details:
 *                 L_COPY makes a new pixa and copies each pix and each box;
 *                 L_CLONE gives a new ref-counted handle to the input pixa;
 *                 L_COPY_CLONE makes a new pixa and inserts clones of
 *                     all pix and boxes
 * \return  new pixa, or NULL on error
 */
PIXA *
pixaCopy(PIXA    *pixa,
         l_int32  copyflag)

Here an allheaders.hpp generator could extract the enumeration name and emit the required code. Of course going through all source files and adding that information is initially a lot of work.

Instead of going through the source we could also add a file e.g. params.csv containing function,parameter,enumeration triplets like:

pixaCopy,copyflag,CopyFlag
pixaAddPix,copyflag,CopyFlag
…
pixaaGetPixa,accesstype,CopyFlag
…

Just my 2 cents.

pullmoll avatar Mar 01 '19 08:03 pullmoll

Seems like a lot of work for 2 cents ;-)

And I'm not sure that leptonica would remain a C library. (The original intent of keeping it C is that it could be used with both C and C++ libraries.)

In any event, I've carried out Darren's 'compromise' suggestion of naming the enums in a comment on the line directly above the enum definition. Will push to head in the next hour or so.

DanBloomberg avatar Mar 02 '19 01:03 DanBloomberg

Thanks for this @DanBloomberg. Changes look good so far.

fdncred avatar Mar 06 '19 19:03 fdncred