Missing Enums and other stuff?
I may be missing the point here, but in the output generated for Leptonica I'm missing enums and probably other things that are declared in the other .c and .h files. Are those other files being parsed and used? This is the csproj for my LeptPig solution.
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp2.2</TargetFramework>
<Platforms>AnyCPU;x64;x86</Platforms>
<LangVersion>latest</LangVersion>
</PropertyGroup>
<!--<PropertyGroup
Condition="'$(OS)' == 'Windows_NT' and
'$(PlatformTarget)' == 'x86' and
'$(TargetFrameworkIdentifier)' == '.NETCoreApp' and
'$(SelfContained)' != 'true'"
>
<RunCommand>$(MSBuildProgramFiles32)\dotnet\dotnet</RunCommand>
</PropertyGroup>
<PropertyGroup
Condition="'$(OS)' == 'Windows_NT' and
'$(PlatformTarget)' == 'x64' and
'$(TargetFrameworkIdentifier)' == '.NETCoreApp' and
'$(SelfContained)' != 'true'"
>
<RunCommand>$(ProgramW6432)\dotnet\dotnet</RunCommand>
</PropertyGroup>-->
<ItemGroup>
<PackageReference Include="piggy" Version="1.0.12">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
</PackageReference>
</ItemGroup>
<ItemGroup>
<Piggy Update="lep.pig">
<!-- -f -->
<ClangSourceFile>include.c</ClangSourceFile>
<!-- -o -->
<AstOutputFile>ast.txt</AstOutputFile>
<!-- -c -->
<ClangOptions>"ID:/Src/GitHub/Leptonica/src" "ID:/Src/GitHub/Leptonica/build/src" "ID:/Src/GitHub/Leptonica/prog"</ClangOptions>
<!-- -o the -o means the output file -->
<OutputFile>"C:\Users\username\Documents\Visual Studio 2017\Projects\LeptPig\LeptPig\Output"</OutputFile>
</Piggy>
</ItemGroup>
<ItemGroup>
<!--<Folder Include="obj\x64\Debug\netcoreapp2.2\" />-->
<Folder Include="Output\" />
</ItemGroup>
</Project>
You can see I'm including 3 sets of folders in the ClangOptions parameter.
In my include.c I only have allheaders.h. Should I have other items in my include.c?
As an example, here are a few things declared in pix.h that are missing from the generated output.
/*! Colors for 32 bpp */
enum {
COLOR_RED = 0, /*!< red color index in RGBA_QUAD */
COLOR_GREEN = 1, /*!< green color index in RGBA_QUAD */
COLOR_BLUE = 2, /*!< blue color index in RGBA_QUAD */
L_ALPHA_CHANNEL = 3 /*!< alpha value index in RGBA_QUAD */
};
static const l_int32 L_RED_SHIFT =
8 * (sizeof(l_uint32) - 1 - COLOR_RED); /* 24 */
static const l_int32 L_GREEN_SHIFT =
8 * (sizeof(l_uint32) - 1 - COLOR_GREEN); /* 16 */
static const l_int32 L_BLUE_SHIFT =
8 * (sizeof(l_uint32) - 1 - COLOR_BLUE); /* 8 */
static const l_int32 L_ALPHA_SHIFT =
8 * (sizeof(l_uint32) - 1 - L_ALPHA_CHANNEL); /* 0 */
/*-------------------------------------------------------------------------*
* Colors for drawing boxes *
*-------------------------------------------------------------------------*/
/*! Colors for drawing boxes */
enum {
L_DRAW_RED = 0, /*!< draw in red */
L_DRAW_GREEN = 1, /*!< draw in green */
L_DRAW_BLUE = 2, /*!< draw in blue */
L_DRAW_SPECIFIED = 3, /*!< draw specified color */
L_DRAW_RGB = 4, /*!< draw as sequence of r,g,b */
L_DRAW_RANDOM = 5 /*!< draw randomly chosen colors */
};
/*-------------------------------------------------------------------------*
* Perceptual color weights *
*-------------------------------------------------------------------------*/
/* <pre>
* Notes:
* (1) These perceptual weighting factors are ad-hoc, but they do
* add up to 1. Unlike, for example, the weighting factors for
* converting RGB to luminance, or more specifically to Y in the
* YUV colorspace. Those numbers come from the
* International Telecommunications Union, via ITU-R.
* </pre>
*/
static const l_float32 L_RED_WEIGHT = 0.3f; /*!< Percept. weight for red */
static const l_float32 L_GREEN_WEIGHT = 0.5f; /*!< Percept. weight for green */
static const l_float32 L_BLUE_WEIGHT = 0.2f; /*!< Percept. weight for blue */
/*-------------------------------------------------------------------------*
* Flags for colormap conversion *
*-------------------------------------------------------------------------*/
/*! Flags for colormap conversion */
enum {
REMOVE_CMAP_TO_BINARY = 0, /*!< remove colormap for conv to 1 bpp */
REMOVE_CMAP_TO_GRAYSCALE = 1, /*!< remove colormap for conv to 8 bpp */
REMOVE_CMAP_TO_FULL_COLOR = 2, /*!< remove colormap for conv to 32 bpp */
REMOVE_CMAP_WITH_ALPHA = 3, /*!< remove colormap and alpha */
REMOVE_CMAP_BASED_ON_SRC = 4 /*!< remove depending on src format */
};
#define PIX_SRC (0xc) /*!< use source pixels */
#define PIX_DST (0xa) /*!< use destination pixels */
#define PIX_NOT(op) ((op) ^ 0x0f) /*!< invert operation %op */
#define PIX_CLR (0x0) /*!< clear pixels */
#define PIX_SET (0xf) /*!< set pixels */
#define PIX_PAINT (PIX_SRC | PIX_DST) /*!< paint = src | dst */
#define PIX_MASK (PIX_SRC & PIX_DST) /*!< mask = src & dst */
#define PIX_SUBTRACT (PIX_DST & PIX_NOT(PIX_SRC)) /*!< subtract = */
/*!< src & !dst */
#define PIX_XOR (PIX_SRC ^ PIX_DST) /*!< xor = src ^ dst */
I see this stuff in ast.txt, except for the #define statements. It makes me wonder what percentage of the items in my ast.txt gets written out to .cs files and what percentage is omitted for whatever reason.
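A likely reason the #define lines are absent from ast.txt: macros are expanded by the preprocessor before the AST is built, so a Clang AST dump normally has no node for them. One possible workaround, sketched here as an assumption and not something Piggy is known to do, is a small shim that re-exposes the macro values as enum constants so that they do appear in the AST. The enum tag `PixRasterOp` and its constant names are made up for illustration; the macros are copied from the pix.h excerpt above.

```c
/* Macros as they appear in the pix.h excerpt. */
#define PIX_SRC      (0xc)
#define PIX_DST      (0xa)
#define PIX_NOT(op)  ((op) ^ 0x0f)
#define PIX_CLR      (0x0)
#define PIX_SET      (0xf)
#define PIX_PAINT    (PIX_SRC | PIX_DST)
#define PIX_MASK     (PIX_SRC & PIX_DST)
#define PIX_SUBTRACT (PIX_DST & PIX_NOT(PIX_SRC))
#define PIX_XOR      (PIX_SRC ^ PIX_DST)

/*
 * Hypothetical shim: each constant takes its value from the macro, so the
 * compiler evaluates the macro and the result is visible as an
 * EnumConstantDecl. The names below are illustrative, not Leptonica's.
 */
enum PixRasterOp {
    PixRasterOp_Src      = PIX_SRC,
    PixRasterOp_Dst      = PIX_DST,
    PixRasterOp_Clr      = PIX_CLR,
    PixRasterOp_Set      = PIX_SET,
    PixRasterOp_Paint    = PIX_PAINT,
    PixRasterOp_Mask     = PIX_MASK,
    PixRasterOp_Subtract = PIX_SUBTRACT,
    PixRasterOp_Xor      = PIX_XOR
};
```

The function-like macro PIX_NOT(op) cannot be captured this way; it would need a hand-written helper on the C# side.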
I see. Those enums have no name, so they don't match any of the current enum patterns. 'Name=...' means there has to be a Name attribute in the AST, but there's none. '!Name' means match when there's no Name attribute. I can add a pattern for those and generate a name. But, the question is, what name?
I've seen two options based on other p/invoke tools.
- Have some generic name with an incrementing value at the end. So, all enums in pix.c/.h would be PixEnum1, PixEnum2, PixEnum3, etc. This leaves it to the user to correct the name. The problem with this is it's always overwritten the next time you regenerate.
- I've seen smarter routines that take what's common in the enum constants and name the enum with that. So, RemoveCMap for the one with all the REMOVE_CMAP and LDraw for all the ones with L_DRAW. And when nothing is consistent, fall back to item 1 above.
The problem with the first might be that the generated number isn't stable from one run to the next: the same constant could be PixEnum1.foobar in one run and PixEnum2.foobar in the next, depending on the version of the C source. The second seems easy enough to do. I'll add both, with some kind of switch to choose the naming convention.
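To make the two options concrete, here is a minimal sketch of the second one with a fallback to the first. It is not Piggy's code: the function names, the `MIN_PREFIX_LEN` threshold and the casing rule are all assumptions chosen for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed threshold: shorter prefixes (e.g. "L_") are treated as useless. */
#define MIN_PREFIX_LEN 4u

/* Length of the longest prefix shared by all n constant names. */
static size_t common_prefix_len(const char *const names[], size_t n)
{
    if (n == 0)
        return 0;
    size_t len = strlen(names[0]);
    for (size_t i = 1; i < n; i++) {
        size_t j = 0;
        while (j < len && names[i][j] != '\0' && names[i][j] == names[0][j])
            j++;
        len = j;
    }
    return len;
}

/*
 * Hypothetical helper: derive an enum name from the shared prefix of its
 * constants, converted to PascalCase. If the prefix is too short, fall back
 * to "<fallback_stem><counter>", e.g. "PixEnum1".
 */
static void name_from_prefix(const char *const names[], size_t n,
                             const char *fallback_stem, int counter,
                             char *out, size_t out_size)
{
    if (out_size == 0)
        return;

    size_t len = common_prefix_len(names, n);

    /* Trim back to the last underscore so the prefix ends on a word boundary. */
    while (len > 0 && names[0][len - 1] != '_')
        len--;

    if (len < MIN_PREFIX_LEN) {
        snprintf(out, out_size, "%s%d", fallback_stem, counter);
        return;
    }

    size_t o = 0;
    int cap_next = 1;
    for (size_t i = 0; i < len && o + 1 < out_size; i++) {
        unsigned char c = (unsigned char)names[0][i];
        if (c == '_') {          /* underscores only mark word starts */
            cap_next = 1;
            continue;
        }
        out[o++] = (char)(cap_next ? toupper(c) : tolower(c));
        cap_next = 0;
    }
    out[o] = '\0';
}
```

With the pix.h enums quoted earlier, the REMOVE_CMAP_* constants give "RemoveCmap" (this simple casing rule does not produce "RemoveCMap"), the L_DRAW_* constants give "LDraw", and the color enum (COLOR_RED ... L_ALPHA_CHANNEL) has no shared prefix, so it falls back to the counter-based name.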
Yep, generating a name like "GeneratedEnum" + counter++ isn't going to work very well. There are 90+ anonymous enums in Leptonica, and some enum constants have only "L_" in common. Perhaps I could look for an enclosing type and use its name as a prefix. I'll try a few different things....
Another option would be to use the comments for naming. In other C# wrappers of Leptonica I see this.
/*! Access and storage flags */
enum {
L_NOCOPY = 0, /*!< do not copy the object; do not delete the ptr */
L_INSERT = L_NOCOPY, /*!< stuff it in; do not copy or clone */
L_COPY = 1, /*!< make/use a copy of the object */
L_CLONE = 2, /*!< make/use clone (ref count) of the object */
L_COPY_CLONE = 3 /*!< make a new array object (e.g., pixa) and fill */
/*!< the array with clones (e.g., pix) */
};
namespace Leptonica
{
/// <summary>
/// Access and storage flags
/// </summary>
public enum AccessAndStorageFlags
{
/// <summary>
/// do not copy the object; do not delete the ptr
/// </summary>
L_NOCOPY = 0,
/// <summary>
/// stuff it in; no copy or clone
/// </summary>
L_INSERT = 0,
/// <summary>
/// make/use a copy of the object
/// </summary>
L_COPY = 1,
/// <summary>
/// make/use clone (ref count) of the object
/// </summary>
L_CLONE = 2,
/// <summary>
/// make a new object and fill each object in the array(s) with clones
/// </summary>
L_COPY_CLONE = 3
}
}
The name generated from the longest common prefix is still pretty bad: sometimes the longest prefix is just "L_", sometimes it's empty. But you're right about the comments. Looking at the AST for Leptonica, the comment shows up within the EnumDecl as (EnumDecl (FullComment (* TextComment Text=* *) ) ). I'll add a pattern that takes the comment and uses it as the name. Some of the comments can be quite long, though, and there may not be any comment in the AST for the EnumDecl. So I'll also add some code to do something like "if you see an enum id 'foobar' in this EnumDecl tree, then the generated name of the anonymous enum should be 'xxxx'". I've noticed that many comments aren't in the AST, maybe because they're stripped out by the preprocessor.
I've checked in some code that reads the comment for the enum and converts it into a name. If there is no comment, the default is just a "g-generated" name, which doesn't occur for Leptonica. Of course, if the comments change, then the name changes, and any existing code that uses the generated enum will no longer compile. Next, I'll add code to use a given name if the anonymous enum type contains an enum value with a given name.
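For readers following along, the comment-to-name step might look roughly like the sketch below. This is an assumption about the approach, not the code that was checked in: `comment_to_name` is a hypothetical helper that keeps letters and digits, treats everything else as a word separator, and uppercases the first character of each word while leaving the rest untouched.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: turn a doc comment such as
 * "Access and storage flags" into a PascalCase identifier.
 * Returns the number of characters written, excluding the terminator.
 * Not handled here: a comment that starts with a digit would yield a
 * name that is not a valid C# identifier.
 */
static size_t comment_to_name(const char *comment, char *out, size_t out_size)
{
    size_t o = 0;
    int start_of_word = 1;

    if (out_size == 0)
        return 0;

    for (const char *p = comment; *p != '\0' && o + 1 < out_size; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c)) {       /* spaces, punctuation: next char starts a word */
            start_of_word = 1;
            continue;
        }
        out[o++] = (char)(start_of_word ? toupper(c) : c);
        start_of_word = 0;
    }
    out[o] = '\0';
    return o;
}
```

Under this rule "Access and storage flags" becomes "AccessAndStorageFlags". A return value of 0 would signal that there was nothing usable in the comment and that a fallback name is needed.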
I added the code to deal with anonymous enums. I added code to check if an enum constant decl name is in a dictionary, and if so, use the given name for the enum declaration in generating the C# interface. Released v1.0.13.
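The dictionary check described here can be pictured as follows. This is only a sketch under assumptions: the struct, the table and `lookup_enum_name` are hypothetical, and the two entries are illustrative rather than taken from any real Piggy configuration.

```c
#include <stddef.h>
#include <string.h>

/* One override: an enum containing `constant` is named `enum_name`. */
struct enum_name_override {
    const char *constant;
    const char *enum_name;
};

/* Illustrative entries only. */
static const struct enum_name_override OVERRIDES[] = {
    { "L_NOCOPY",              "AccessAndStorageFlags" },
    { "REMOVE_CMAP_TO_BINARY", "RemoveCmap" },
};

/*
 * Hypothetical helper: return the configured name for an anonymous enum if
 * any of its constants appears in the override table, or NULL if none does
 * (the caller would then fall back to the comment-derived name).
 */
static const char *lookup_enum_name(const char *const constants[], size_t n)
{
    const size_t count = sizeof OVERRIDES / sizeof OVERRIDES[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < count; k++) {
            if (strcmp(constants[i], OVERRIDES[k].constant) == 0)
                return OVERRIDES[k].enum_name;
        }
    }
    return NULL;
}
```

Keying the override on a constant name rather than on the comment text means the generated enum name stays stable even if the upstream comment is reworded.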
Thanks! I'll give it a whirl sometime today and report back.
Very nice indeed. Enums look wonderful. Granted, some of the names are extremely long but I think your work is great here! You can only work on what you're given. So, thank you! I haven't validated that every single enum is there but once I start using it, I'll report any discrepancies.
Thank you for your help. I will be updating Piggy over the next few months, and will let you know how it changes. I plan to use it to convert C++ source code into C#. My plan is to convert Microsoft's .NET Core entirely into C# for my compiler.
I have more problems, of course, but I'll post them in different issues. I hope you don't mind me posting all this stuff.
Your plan sounds ambitious. Good luck. I'll be following your work.
I worked with the Leptonica author to add/change his anonymous enum comments but Piggy isn't quite playing nice. Example:
/*! For jbGetComponents(): type of component to extract from images */
/*! JB Component */
enum {
JB_CONN_COMPS = 0,
JB_CHARACTERS = 1,
JB_WORDS = 2
};
Turns into:
public enum ForJbGetComponentsTypeOfComponentToExtractFromImages
{
JB_CONN_COMPS = 0,
JB_CHARACTERS = 1,
JB_WORDS = 2
}
Expected:
public enum JBComponent
{
JB_CONN_COMPS = 0,
JB_CHARACTERS = 1,
JB_WORDS = 2
}
Same here:
/*! For printing out array data */
/*! Sudoku Output */
enum {
L_SUDOKU_INIT = 0,
L_SUDOKU_STATE = 1
};
Turns into:
public enum ForPrintingOutArrayData
{
L_SUDOKU_INIT = 0,
L_SUDOKU_STATE = 1
}
And here:
/*! Constants for deciding when text block is divided into paragraphs */
/*! Split Text */
enum {
SPLIT_ON_LEADING_WHITE = 1, /*!< tab or space at beginning of line */
SPLIT_ON_BLANK_LINE = 2, /*!< newline with optional white space */
SPLIT_ON_BOTH = 3 /*!< leading white space or newline */
};
And here:
/*! Control printing of error, warning and info messages */
/*! Message Control */
enum {
L_SEVERITY_EXTERNAL = 0, /* Get the severity from the environment */
L_SEVERITY_ALL = 1, /* Lowest severity: print all messages */
L_SEVERITY_DEBUG = 2, /* Print debugging and higher messages */
L_SEVERITY_INFO = 3, /* Print informational and higher messages */
L_SEVERITY_WARNING = 4, /* Print warning and higher messages */
L_SEVERITY_ERROR = 5, /* Print error and higher messages */
L_SEVERITY_NONE = 6 /* Highest severity: print no messages */
};
Is there any chance this is an easy change? BTW - This is with Piggy 1.0.13. Thanks, Darren
@kaby76 Just checking in to see if you saw that I reopened this issue.
I haven't looked at these yet. I've been rewriting the entire regular expression engine to use an NFA approach. I'll probably be done with that this week, and then I'll make a release that includes a bug fix.