tess4j icon indicating copy to clipboard operation
tess4j copied to clipboard

Api request: Use enum's instead of int's for constants

Open ChristianSchwarz opened this issue 5 years ago • 5 comments

The current API accepts int constants in very many places. The constants belonging to a context are bundled in dedicated interfaces. Thats nice, but when you call a method like Tesseract.setPageSegMode(int mode) you don't know which constants have to be passed. The JavaDoc doesn't help here mouch, it says: "the page segmentation mode to set". If you dig throu the packages and interfaces you may find the interface TessPageSegMode inside of interface ITessAPI. But that is not that likely since nested interfaces are not common. As you can see the explorability of the API can be improved.

Enums to the rescue

When the API would use enum's instead of int's, the IDE can suggest you the enum values that can be passed to the method. There is even no need to update the JavaDocs in order to point to the constant holder interface. An other positvie side effect is that is is impossible to pass in wrong values.

How it would look like:

interface ITessAPI {
  void setPageSegMode(TessPageSegMode mode);
enum TessPageSegMode{
     PSM_OSD_ONLY(0),
     PSM_AUTO_OSD(1),
     ...
     public int value;
     
     private TessPageSegMode(int value){
        this.value=value
     }
}

ChristianSchwarz avatar Jan 14 '20 10:01 ChristianSchwarz

We understand the values and benefits of using enum; however, the class files were generated by JNAerator, which does not seem to use enum for constants (maybe there's an option that we're not aware of).

Would introduction of enum break existing client codes (programs that use tess4j)? Have you tried?

nguyenq avatar Jan 19 '20 16:01 nguyenq

Would introduction of enum break existing client codes (programs that use tess4j)? Have you tried?

If you change the signature of the existing methods then yes, it will break the code. But you could provide overloaded methods. Also it is possible to add an abstraction layer that capsulates the generated code. This way are decoupled from the generated code.

ChristianSchwarz avatar Jan 20 '20 15:01 ChristianSchwarz

Hi Christain,

It deems to be a breaking change to the current API. As such, we think it is probably more appropriate that the change should be applied to the next major upgrade, which is currently aimed for Tesseract 5.0.0 release.

Can you submit a PR for this? The methods that use non-enum constants should be deprecated.

Thanks, Quan

nguyenq avatar Jan 23 '20 01:01 nguyenq

I am short on time, but i can start a PR. Is there a v5.0.0 branch already ? I would start to decouple the tess4j API from the generated JNA implementation. So it is later possible to adopt Tesseract API changes without overwriting the tess4j API. WDYT?

ChristianSchwarz avatar Jan 27 '20 15:01 ChristianSchwarz

You can code against the master branch. No hurry, just take your time as Tesseract 5.0.0 is still in beta.

nguyenq avatar Jan 28 '20 01:01 nguyenq