grobid
Merging two functionalities in grobid
Right now, grobid follows a cascade like this:
Segmentation->Header, Fulltext, Reference segmenter
I also want to use:
Custom segmentation->Custom feature
Is it possible to combine both in one build? What I mean is: right now, when I run processFullText, grobid follows the set hierarchy, but if I run processMyFeature, I want grobid to follow a different, custom hierarchy like the one I mentioned above.
In short, is it possible to add both of these separate cascades to one single build?
Thanks in advance!
Hi @Tanmay98
Is "Custom feature" an existing submodel of Grobid, or one you would like to write yourself?
The existing sub-models are constrained in terms of input/output, and not all of their outputs have a final serialization, i.e. something to return. At minimum, a new result serialization (in XML or JSON) would be necessary.
If you want to add your own process at any stage of the processing hierarchy, some Java development for this new process is currently required. This is what the grobid modules listed here do: they introduce additional models that are applied after segmentation or fulltext, on certain relevant substructures.
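To make the point above concrete, here is a minimal, hypothetical Java sketch of two independent cascades living in one build and selected by name, each ending in its own serialization (since the existing submodels do not return a final result on their own). None of these types exist in grobid: CascadeRegistry, Cascade and the two toy segmenters are illustrative names, and real grobid models would replace the string-splitting functions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: two independent cascades registered side by side in one build.
// None of these types exist in grobid; they only illustrate the dispatch idea.
public class CascadeRegistry {

    // A cascade is a first-stage segmenter followed by a downstream step that
    // also serializes the result into something that can be returned.
    public record Cascade(Function<String, Map<String, String>> segmenter,
                          Function<Map<String, String>, String> downstream) {
        String run(String input) {
            return downstream.apply(segmenter.apply(input));
        }
    }

    private final Map<String, Cascade> cascades = new LinkedHashMap<>();

    public void register(String name, Cascade cascade) {
        cascades.put(name, cascade);
    }

    // Dispatch by name, the way an entry point such as processFullText or
    // processMyFeature would select its own hierarchy.
    public String process(String name, String input) {
        Cascade cascade = cascades.get(name);
        if (cascade == null) {
            throw new IllegalArgumentException("Unknown cascade: " + name);
        }
        return cascade.run(input);
    }

    // Toy "default" segmentation: first line is the header, the rest is the body.
    static Map<String, String> defaultSegmentation(String text) {
        int nl = text.indexOf('\n');
        Map<String, String> zones = new LinkedHashMap<>();
        zones.put("header", nl < 0 ? text : text.substring(0, nl));
        zones.put("body", nl < 0 ? "" : text.substring(nl + 1));
        return zones;
    }

    // Toy "custom" segmentation: keep only lines starting with '*', marker stripped.
    static Map<String, String> customSegmentation(String text) {
        StringBuilder entries = new StringBuilder();
        for (String line : text.split("\n")) {
            if (line.startsWith("*")) {
                entries.append(line.substring(1).trim()).append('\n');
            }
        }
        return Map.of("entries", entries.toString());
    }

    public static void main(String[] args) {
        CascadeRegistry registry = new CascadeRegistry();
        // Default-like hierarchy with an XML-ish serialization (no escaping, toy only).
        registry.register("processFullText", new Cascade(
                CascadeRegistry::defaultSegmentation,
                z -> "<header>" + z.get("header") + "</header>"));
        // Custom hierarchy with a JSON serialization of a single count.
        registry.register("processMyFeature", new Cascade(
                CascadeRegistry::customSegmentation,
                z -> "{\"entryCount\": " + z.get("entries").lines().count() + "}"));

        String doc = "Title\n* alpha\n* beta\nplain text";
        System.out.println(registry.process("processFullText", doc));
        System.out.println(registry.process("processMyFeature", doc));
    }
}
```

Running main prints `<header>Title</header>` followed by `{"entryCount": 2}`. Keeping each cascade as a pair of functions means registering a custom hierarchy leaves the default one untouched.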
Thank you for your quick response, @kermitt2!
Actually, no, the custom feature is not an existing submodel of grobid.
My concern is that I want two separate hierarchies to run. For example, I want to use the hierarchy that grobid follows by default as well as my own custom hierarchy. I was wondering whether that is possible.
Also, I went through the grobid-dictionaries module. From that, I assumed that with Maven I would only be able to run the dictionary part, and not the default grobid features, from one single server.
I am sorry, but I am new to Java, Maven, etc. (I know machine learning very well). Is it possible to run both the grobid-dictionaries modules and the default grobid modules by running only one server? That is, if I run ./gradlew run, can I use both processFullText and processDictionary?
Hi @kermitt2, my goal was to train models such as segmentation (for grobid) and segmentation (for grobid-dictionaries) from a single server run (./gradlew run).
So I tried to combine the grobid-dictionaries modules and the grobid modules into one single pipeline.
I created the necessary files in grobid-core and grobid-trainer and attached two different TEI formatters (one for grobid-dictionaries and one for grobid). Finally, I also made changes to the Gradle build file.
I was able to build the library successfully, but when I run ./gradlew train_dictionary_body_segmentation, I get the following errors:
Can you help me?
Hello @Tanmay98!
Apparently you need to load a property file specific to grobid-dictionaries and instantiate a GrobidDictionaryProperties object.
But I was not one of the developers of grobid-dictionaries; you will certainly get better help by asking in the grobid-dictionaries repository.
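One common pattern behind this kind of fix, assuming the error comes from configuration being read before it is loaded, is to load the module's property file once, before any trainer or parser reads from it, and to fail loudly if it is missing. The minimal sketch below uses only java.util.Properties; ModuleProperties, the file name and the key are hypothetical, and the real GrobidDictionaryProperties class in grobid-dictionaries has its own loading logic that should be checked in that repository.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch (not the grobid-dictionaries API): load a module-specific
// property file once, before any training or processing code asks for values.
public final class ModuleProperties {
    private static Properties props;

    private ModuleProperties() {}

    // Reads the file into a fresh Properties object; fails fast if it is missing.
    public static synchronized void load(Path file) throws IOException {
        if (!Files.exists(file)) {
            throw new IOException("Property file not found: " + file.toAbsolutePath());
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        }
        props = p;
    }

    // Returns a value, with explicit errors for "never loaded" and "key missing".
    public static synchronized String get(String key) {
        if (props == null) {
            throw new IllegalStateException("Properties not loaded; call load() first");
        }
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing key: " + key);
        }
        return value;
    }
}
```

For example, the training entry point could call `ModuleProperties.load(Path.of("grobid-dictionaries.properties"))` before anything else (the file name is an assumption), so a missing or misplaced file surfaces as one clear message instead of an error deep inside training.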