[Question] Using Juman++ in Dart via dart:ffi

Open • CaptainDario opened this issue on Feb 15 '22 • 10 comments

I am developing a Flutter app and would like to include morphological analysis. I am considering writing a package that uses Juman++ from Dart via dart:ffi, which allows calling native C code from Dart. Is there any platform-dependent code or something similar that would prevent this project from running on iOS/Android? Also, are the ML models used available for TensorFlow Lite? Any help or suggestions would be very much appreciated.

CaptainDario, Feb 15 '22 11:02

There should not be any platform-dependent code in Juman++, and it runs natively on M1-based Macs without problems. There is, however, no C API, only a C++ one. I am not familiar with Dart and its FFI, but most FFI interactions happen via a C API. Finally, the Juman++ model is not just a neural net, so TensorFlow Lite compatibility really makes no sense in this context.

eiennohito, Feb 15 '22 11:02

Thanks for your reply!

I have never tried to use C++ with dart:ffi, but it looks like C++ should be usable if the C++ symbols are marked as extern "C". Could that cause any problems? If not, I would give it a shot, because Juman++ seems to work much better than MeCab for analyzing random texts from the internet.

Sorry about the stupid TensorFlow assumption. Somehow I thought the RNN used TensorFlow, but after briefly looking at the code, it seems that Juman++ uses a custom-built RNN.

CaptainDario, Feb 15 '22 12:02

It is not just a matter of adding extern "C" to expose a C++ API as C. In its current state, the API is probably unusable from C. First, all Juman++ strings (e.g. dictionary fields and morpheme surfaces) are not null-terminated strings but slices, and the abstraction for them (StringPiece) would not be easily usable from C. Making a C API is in the backlog (#61), but I never had the time or a need for it myself.

eiennohito, Feb 15 '22 12:02
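To make the slice problem concrete: a C caller (and dart:ffi) expects a char* ending in '\0', whereas a slice is only a pointer and a length into some larger buffer. Below is a minimal C++17 sketch of one common workaround, assuming std::string_view as a stand-in for Juman++'s StringPiece; the hyp_slice type and the to_c_slice/hyp_slice_copy functions are hypothetical and not part of Juman++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical C-compatible view of a string slice: a pointer plus a length.
// There is NO guarantee that data[len] == '\0'.
typedef struct {
  const char* data;
  std::size_t len;
} hyp_slice;

// Wrap a C++ slice without copying (std::string_view stands in for
// Juman++'s StringPiece). The underlying buffer must outlive the result.
hyp_slice to_c_slice(std::string_view sv) {
  return hyp_slice{sv.data(), sv.size()};
}

// Copy a slice into a caller-provided buffer and null-terminate it, which is
// what a caller expecting a char* needs. Truncates to out_size - 1 bytes
// (possibly in the middle of a multi-byte UTF-8 character) and returns the
// number of bytes copied, not counting the terminator.
extern "C" std::size_t hyp_slice_copy(hyp_slice s, char* out,
                                      std::size_t out_size) {
  if (out == nullptr || out_size == 0) return 0;
  assert(s.data != nullptr || s.len == 0);
  std::size_t n = s.len < out_size - 1 ? s.len : out_size - 1;
  if (n > 0) std::memcpy(out, s.data, n);
  out[n] = '\0';
  return n;
}
```

The design choice here is to hand C callers a borrowed pointer/length pair (zero-copy, but only valid while the C++ buffer lives) and let them copy into a terminated buffer only when they actually need one.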

Thanks again for the reply. I think extern "C" is just there to make Dart aware of the C++ code. The docs say that C++ code should work, and there is also a dart:ffi version of OpenCV and a blog post by its author on how to use C++. Therefore it seems possible to use Juman++ with dart:ffi after all. I will give it a shot and come back here if I run into any problems directly related to Juman++.

One more question: since download size matters on mobile, should I also expect a size of about 300 MB?

CaptainDario, Feb 15 '22 13:02

Yes, the model is pretty large; that is an unfortunate tradeoff for analysis accuracy. I am not really sure that Juman++ is a good fit for mobile unless analysis accuracy is of utmost importance.

extern "C" only changes the symbol mangling for C++ symbols, and the Juman++ API surface is not just simple functions like the ones shown in the Dart FFI example. Also, OpenCV should have a stable C API regardless of its implementation language.

eiennohito, Feb 15 '22 13:02
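Since extern "C" only affects linkage and name mangling, classes, std::string, templates and exceptions still cannot cross the boundary. A usual workaround is an opaque handle plus plain functions. The minimal sketch below assumes a toy demo::Analyzer class in place of anything from Juman++; all hyp_* names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

namespace demo {
// Toy stand-in for a C++ analyzer class; NOT Juman++'s real API.
class Analyzer {
 public:
  // "Analysis" here just splits the input on ASCII spaces.
  std::vector<std::string> analyze(const std::string& text) const {
    std::vector<std::string> tokens;
    std::string cur;
    for (char c : text) {
      if (c == ' ') {
        if (!cur.empty()) tokens.push_back(cur);
        cur.clear();
      } else {
        cur += c;
      }
    }
    if (!cur.empty()) tokens.push_back(cur);
    return tokens;
  }
};
}  // namespace demo

// A C header would only declare `typedef struct hyp_analyzer hyp_analyzer;`,
// so C and dart:ffi callers see nothing but an opaque pointer.
struct hyp_analyzer {
  demo::Analyzer impl;
  std::vector<std::string> last;  // owns the strings handed out below
};

extern "C" {

hyp_analyzer* hyp_analyzer_new(void) {
  // nothrow new returns nullptr instead of throwing std::bad_alloc.
  return new (std::nothrow) hyp_analyzer();
}

void hyp_analyzer_free(hyp_analyzer* a) { delete a; }

// Analyzes null-terminated text; returns the token count, or -1 on error.
int hyp_analyzer_run(hyp_analyzer* a, const char* text) {
  if (a == nullptr || text == nullptr) return -1;
  try {
    a->last = a->impl.analyze(text);
    return static_cast<int>(a->last.size());
  } catch (...) {
    return -1;  // C++ exceptions must not propagate into C or Dart
  }
}

// Borrowed token i; valid until the next hyp_analyzer_run or hyp_analyzer_free.
const char* hyp_analyzer_token(const hyp_analyzer* a, int i) {
  if (a == nullptr || i < 0 ||
      static_cast<std::size_t>(i) >= a->last.size()) {
    return nullptr;
  }
  return a->last[static_cast<std::size_t>(i)].c_str();
}

}  // extern "C"
```

Every C++ type stays behind the handle, so the exported surface is only pointers, ints and char*, which is the kind of signature dart:ffi can bind to.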

I think the size would not be too much of an issue as long as it does not cross the gigabyte mark.

That sounds discouraging. Are there plans for a C API?

CaptainDario, Feb 15 '22 21:02

After further investigation, you are right: dart:ffi can only bind to C APIs. Since a few people have already asked for a C API, are there any plans for one in the near future?

CaptainDario, Feb 15 '22 21:02

Unfortunately, it is very low in the list of my priorities, I probably won't work on it in any foreseeable future.

eiennohito, Feb 16 '22 01:02

That is sad to hear. I am quite clueless about C++ programming, but would it be possible to have a binding/C API only for the main entry point? I basically want to use this library as an off-the-shelf component, as shown in the docs:

echo "魅力がたっぷりと詰まっている" | jumanpp

But if that also has some big hurdles, I will stick to MeCab.

CaptainDario, Feb 16 '22 11:02

The simplest entry point is something like https://github.com/eiennohito/jumanpp-t9/blob/master/src/jumanpp_t9.cc, and sure, a C API can be done. I don't think I will work on it in the near future, though. MeCab is probably your best bet, as it has a C API. If I do implement a C API in the future, it will probably be MeCab-compatible, because MeCab is the de facto standard.

eiennohito, Feb 16 '22 12:02
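A one-shot entry point of the kind asked about above, mirroring echo "魅力がたっぷりと詰まっている" | jumanpp, might look like the sketch below. Assumptions: analyze_to_text is a hypothetical placeholder (here it only splits on spaces and newlines) where real Juman++ calls would go, and hyp_analyze_text/hyp_free_string are hypothetical names, not an existing Juman++ API.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical placeholder: a real binding would run Juman++ here.
// Toy behaviour: emit one space- or newline-separated token per line.
static std::string analyze_to_text(const std::string& input) {
  std::string out;
  std::string cur;
  for (char c : input) {
    if (c == ' ' || c == '\n') {
      if (!cur.empty()) out += cur + '\n';
      cur.clear();
    } else {
      cur += c;
    }
  }
  if (!cur.empty()) out += cur + '\n';
  return out;
}

extern "C" {

// Hypothetical one-shot entry point: text in, newline-separated result out.
// Returns nullptr on error. The result is malloc'ed; release it with
// hyp_free_string.
char* hyp_analyze_text(const char* input) {
  if (input == nullptr) return nullptr;
  try {
    std::string result = analyze_to_text(input);
    // Output built from a C string cannot contain '\0', so C callers see all of it.
    assert(result.find('\0') == std::string::npos);
    char* buf = static_cast<char*>(std::malloc(result.size() + 1));
    if (buf == nullptr) return nullptr;
    std::memcpy(buf, result.c_str(), result.size() + 1);  // copies the '\0' too
    return buf;
  } catch (...) {
    return nullptr;  // keep C++ exceptions from crossing the FFI boundary
  }
}

// Frees a string returned by hyp_analyze_text, so allocation and release
// happen in the same library.
void hyp_free_string(char* s) { std::free(s); }

}  // extern "C"
```

Returning a malloc'ed string together with a matching free function keeps ownership rules simple for a caller such as dart:ffi: two plain C functions, no handles, at the cost of reparsing the output text on the Dart side.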