deep_reference_parser
Consider spans in output
The output of the split_parse, split and parse commands is a list of tokens and their predictions.
It may be worth considering a different type of output that gives the spans of each reference/token rather than the tokens themselves.
I am not sure how controversial this would be, but it would definitely eliminate the need to merge tokens afterwards, since the algorithm would extract a start and end offset for each component in a QA fashion.
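A minimal sketch of what the span-style output could look like: token-level BIO predictions are converted into (label, start, end) character offsets, much like extractive QA output. The function name, label scheme and example labels here are illustrative, not part of the deep_reference_parser API.

```python
def tokens_to_spans(text, tokens, labels):
    """Convert token-level BIO labels into character-offset spans.

    Returns a list of (label, start, end) tuples, where text[start:end]
    is the full surface string of the component.
    """
    spans = []
    cursor = 0
    current = None  # [label, start, end] of the span being built
    for token, label in zip(tokens, labels):
        start = text.index(token, cursor)  # locate token in the raw text
        end = start + len(token)
        cursor = end
        if label.startswith("b-"):
            if current:
                spans.append(tuple(current))
            current = [label[2:], start, end]
        elif label.startswith("i-") and current and current[0] == label[2:]:
            current[2] = end  # extend the open span to cover this token
        else:
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans


# Illustrative labels only; the real model's label set may differ.
text = "Smith J. 2019. A title."
tokens = ["Smith", "J.", "2019.", "A", "title."]
labels = ["b-author", "i-author", "b-year", "b-title", "i-title"]
spans = tokens_to_spans(text, tokens, labels)
# spans == [("author", 0, 8), ("year", 9, 14), ("title", 15, 23)]
```

With this shape a consumer can recover each component directly via text[start:end], so no post-hoc token merging is needed.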
I thought of these outputs as placeholders. All those scripts are not suitable for production because they would instantiate the model every time they made a prediction, so their utility is somewhat limited. That said, I think I implemented an --output flag which will dump the output to a JSON file.
@ivyleavedtoadflax ok that makes sense re outputs.
In terms of the instantiation of the model, is it not true that

splitter_parser = SplitParser(config_file=MULTITASK_CFG)

instantiates the model, and then you could do

reference_predictions = splitter_parser.split_parse(text)

as many times as you wanted without having to reinstantiate the model?
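A minimal sketch of the instantiate-once pattern being described. SplitParser and split_parse are the names used in this thread; the class internals and config path below are stand-ins, since the real model load is what makes per-call instantiation expensive.

```python
class SplitParser:
    """Stand-in for the real SplitParser: the expensive model load
    happens once in __init__, so every later split_parse() call
    reuses the already-loaded model."""

    load_count = 0  # tracks how many times the (expensive) load ran

    def __init__(self, config_file):
        SplitParser.load_count += 1  # stands in for loading model weights
        self.config_file = config_file

    def split_parse(self, text):
        # Stands in for running a prediction with the loaded model.
        return [(token, "o") for token in text.split()]


MULTITASK_CFG = "path/to/multitask.ini"  # illustrative config path

# Instantiate once...
splitter_parser = SplitParser(config_file=MULTITASK_CFG)

# ...then predict as many times as needed without reloading.
for doc in ["first reference", "second reference"]:
    predictions = splitter_parser.split_parse(doc)
```

The production fix suggested here is simply to hoist the SplitParser construction out of the per-prediction code path, exactly as the loop above does.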
Even though unrelated to this issue, I am almost 100% you are right. @ivyleavedtoadflax can confirm.
Yup, exactly right @lizgzil. That's not how I had done it in the split, parse and split_parse commands, which is why they are no good for prod.