deep_reference_parser
Consider spans in output
The output of the split_parse, split and parse commands is a list of tokens and their predictions.
It may be worth considering a different type of output that gives the spans of each reference/token rather than the tokens themselves.
I am not sure how controversial this would be, but it would definitely eliminate the need to merge tokens afterwards, since the algorithm would extract a start and end offset for each component in a QA fashion.
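A minimal sketch of what the span-style output could look like: token-level BIO predictions are converted into (label, start, end) character offsets, much like extractive QA output. The function name, label scheme and example labels here are illustrative, not part of the deep_reference_parser API.

```python
def tokens_to_spans(text, tokens, labels):
    """Convert token-level BIO labels into character-offset spans.

    Returns a list of (label, start, end) tuples, where text[start:end]
    is the full surface string of the component.
    """
    spans = []
    cursor = 0
    current = None  # [label, start, end] of the span being built
    for token, label in zip(tokens, labels):
        start = text.index(token, cursor)  # locate token in the raw text
        end = start + len(token)
        cursor = end
        if label.startswith("b-"):
            if current:
                spans.append(tuple(current))
            current = [label[2:], start, end]
        elif label.startswith("i-") and current and current[0] == label[2:]:
            current[2] = end  # extend the open span to cover this token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


# Illustrative labels only; the real model's label set may differ.
text = "Smith J. 2019. A title."
tokens = ["Smith", "J.", "2019.", "A", "title."]
labels = ["b-author", "i-author", "b-year", "b-title", "i-title"]
spans = tokens_to_spans(text, tokens, labels)
# spans == [("author", 0, 8), ("year", 9, 14), ("title", 15, 23)]
```

With this shape a consumer can recover each component directly via text[start:end], so no post-hoc token merging is needed.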
I thought of these outputs as placeholders. All those scripts are not suitable for production because they would instantiate the model every time they made a prediction, so their utility is somewhat limited. That said, I think I implemented an --output flag which will dump the output to a JSON file.
@ivyleavedtoadflax ok that makes sense re outputs.
In terms of the instantiation of the model, is it not true that

splitter_parser = SplitParser(config_file=MULTITASK_CFG)

instantiates the model, and then you could do

reference_predictions = splitter_parser.split_parse(text)

as many times as you wanted without having to reinstantiate the model?
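A minimal sketch of the instantiate-once pattern being described. SplitParser and split_parse are the names used in this thread; the class internals and config path below are stand-ins, since the real model load is what makes per-call instantiation expensive.

```python
class SplitParser:
    """Stand-in for the real SplitParser: the expensive model load
    happens once in __init__, so every later split_parse() call
    reuses the already-loaded model."""

    load_count = 0  # tracks how many times the (expensive) load ran

    def __init__(self, config_file):
        SplitParser.load_count += 1  # stands in for loading model weights
        self.config_file = config_file

    def split_parse(self, text):
        # Stands in for running a prediction with the loaded model.
        return [(token, "o") for token in text.split()]


MULTITASK_CFG = "path/to/multitask.ini"  # illustrative config path

# Instantiate once...
splitter_parser = SplitParser(config_file=MULTITASK_CFG)

# ...then predict as many times as needed without reloading.
for doc in ["first reference", "second reference"]:
    predictions = splitter_parser.split_parse(doc)
```

The production fix suggested here is simply to hoist the SplitParser construction out of the per-prediction code path, exactly as the loop above does.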
Even though unrelated to this issue, I am almost 100% you are right. @ivyleavedtoadflax can confirm.
Yup, exactly right @lizgzil. That's not how I had done it in the split, parse and split_parse commands, which is why they are no good for prod.