nlp4j
nlp4j copied to clipboard
Deprecate disunified repos
@jdchoi77 Here is the unified repo. I will proceed from here in PR's for other work. The readme might want some more work to explain the structure, let me know if you want me to do it.
Thanks for the work; I'm trying to think if it is worth at all to have these projects separated at this point. They share the same package naming so it's fairly easy to merge all of them into a super project. What do you think?
Just to be sure we're on the same page: you're suggesting taking an additional step of combining core, common, morphology, tokenization, and 'nlp4j' into a single Maven project: one src/main, etc?
If it were up to me, I wouldn't go that far. I'd reduce it to two modules: one with all the code as viewed as an API, and another with the command line interfaces and wrappers. It's mostly a question of taste to separate commands from API; nothing horrible happens if you lump them together. This much split means that we can use less complex Maven tricks to have the right stuff in the classpath of the commands that we don't want to force into the classpath of applications.
However, didn't you tell me before that you originally split them up into multiple jars because someone complained? Is that complaint still a concern?
On Tue, Jul 26, 2016 at 11:54 AM, Jinho D. Choi [email protected] wrote:
Thanks for the work; I'm trying to think if it is worth at all to have these projects separated at this point. They share the same package naming so it's fairly easy to merge all of them into a super project. What do you think?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235313720, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z8Z1HLx2zn1W7wdzJEsPlEWvV6aGks5qZi27gaJpZM4JSL7S .
Yes, that was the concern, but now we have modulated the code much better, I think it should be ok (NLP4J, previously called ClearNLP, even before called ClearParser, has evolved a lot since 2011 :) I'd take the suggestion of separating CLI to the pure API. If you'd like to do this merge, I can spend some time later today to review the code and finalize the unification. Thanks!
I'll put the stream business aside for now and do the unification today.
On Tue, Jul 26, 2016 at 12:16 PM, Jinho D. Choi [email protected] wrote:
Yes, that was the concern, but now we have modulated the code much better, I think it should be ok (NLP4J, previously called ClearNLP, even before called ClearParser, has evolved a lot since 2011 :) I'd take the suggestion of separating CLI to the pure API. If you'd like to do this merge, I can spend some time later today to review the code and finalize the unification. Thanks!
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235320806, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z9OQOH6g2TyMVXCtrXLKwReaSYPFks5qZjLigaJpZM4JSL7S .
I've put up a PR for this. I think I've accounted for everything. It's a bit disorienting.
I hope you won't be too annoyed that I added some very small and local changes that rescue the useful part of the big file/stream PR. It's completely compatible. If you don't like it, let me know, and I can move those commits to another PR. I was sitting on the wrong branch, and I need to go do an errand with my wife, so I decided to just push this up to you and see if it was OK.
I just merged to PR. I'm going to take a look and see if there are changes needed now. Will get back to you soon.
I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.
I'll have a look.
On Wed, Jul 27, 2016 at 10:21 AM, Jinho D. Choi [email protected] wrote:
I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235600043, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z23v68SxTOnBU0qYKPoDOcjAgoSlks5qZ2lVgaJpZM4JSL7S .
I just moved the tokenization package to edu/emory/mathcs/nlp/component/tokenizer/ . I'll be testing each module this afternoon.
Your POM change was correct. The only version needed in a submodule is the one in the parent, I missed cleaning that up.
On Wed, Jul 27, 2016 at 10:28 AM, Benson Margulies [email protected] wrote:
I'll have a look.
On Wed, Jul 27, 2016 at 10:21 AM, Jinho D. Choi [email protected] wrote:
I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235600043, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z23v68SxTOnBU0qYKPoDOcjAgoSlks5qZ2lVgaJpZM4JSL7S .
There will be a small celebration up here once we have a release from this version.
On Wed, Jul 27, 2016 at 10:28 AM, Jinho D. Choi [email protected] wrote:
I just moved the tokenization package to edu/emory/mathcs/nlp/component/tokenizer/ . I'll be testing each module this afternoon.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235602558, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9zzAbVL2YitLyTXTCD9JQajviVC5iks5qZ2scgaJpZM4JSL7S .
Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.
I don't have a gut sense of the relative popularity of these options. I do think that 'cli' is more descriptive, so if you're inclined to make a change, I'd vote for 'cli' over 'cmd'.
On Wed, Jul 27, 2016 at 10:59 AM, Jinho D. Choi [email protected] wrote:
Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235612935, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z-k2e05VFiDvMgRriYGt0qwQ5DyZks5qZ3JpgaJpZM4JSL7S .
I like 'cli' as well, let me see if i can make this change.
Another thing that we could do here at some point is to add the maven-assembly-plugin to make a binary tarball with a bin and a lib and such, so that someone can just download that and start running commands.
On Wed, Jul 27, 2016 at 11:06 AM, Benson Margulies [email protected] wrote:
I don't have a gut sense of the relative popularity of these options. I do think that 'cli' is more descriptive, so if you're inclined to make a change, I'd vote for 'cli' over 'cmd'.
On Wed, Jul 27, 2016 at 10:59 AM, Jinho D. Choi [email protected] wrote:
Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235612935, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z-k2e05VFiDvMgRriYGt0qwQ5DyZks5qZ3JpgaJpZM4JSL7S .
That's a great idea; i've been doing that manually but maven plugin should help.