deep_reference_parser

Consider spans in output

Open · lizgzil opened this issue 4 years ago · 5 comments

In the output of the split_parse, split and parse commands we get a list of tokens and their predictions.

It may be worth considering a different type of output with the spans of each reference/token rather than the tokens themselves.
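
For example, a span-based output could look roughly like this (a sketch only; the field names and structure are purely illustrative):

```python
# Rough illustration of a span-based output: character offsets into the
# original text rather than the tokens themselves (field names hypothetical).
span_output = [
    {"type": "reference", "start": 0, "end": 112},   # one whole reference
    {"type": "author",    "start": 0, "end": 23},    # components within it
    {"type": "title",     "start": 25, "end": 87},
    {"type": "year",      "start": 89, "end": 93},
]
```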

lizgzil avatar Apr 29 '20 11:04 lizgzil

I am not sure how controversial this would be, but it would definitely eliminate the need to merge tokens afterwards, as the algorithm would extract a start and end for each component in a QA fashion.

nsorros avatar Apr 30 '20 07:04 nsorros

I thought of these outputs as placeholders. None of those scripts are suitable for production because they would instantiate the model every time they made a prediction, so their utility is somewhat limited. That said, I think I implemented an --output flag which will dump the output to a JSON file.

ivyleavedtoadflax avatar Apr 30 '20 23:04 ivyleavedtoadflax

@ivyleavedtoadflax ok that makes sense re outputs.

In terms of the instantiation of the model, is it not true that

splitter_parser = SplitParser(config_file=MULTITASK_CFG)

instantiates the model and then you could do

reference_predictions = splitter_parser.split_parse(text)

as many times as you wanted without having to reinstantiate the model?

lizgzil avatar May 01 '20 10:05 lizgzil

Even though this is unrelated to the issue, I am almost 100% sure you are right. @ivyleavedtoadflax can confirm.

nsorros avatar May 01 '20 12:05 nsorros

Yup, exactly right @lizgzil. That's not how I had done it in the split, parse and split_parse commands, which is why they are no good for prod.
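
For reference, the instantiate-once pattern discussed above would look roughly like this (the import paths are assumed, and documents is just a placeholder list of texts):

```python
# Sketch of the instantiate-once pattern. The import paths below are
# assumed; adjust to wherever SplitParser and MULTITASK_CFG actually live.
from deep_reference_parser.split_parse import SplitParser
from deep_reference_parser.common import MULTITASK_CFG

# Load the model weights a single time...
splitter_parser = SplitParser(config_file=MULTITASK_CFG)

# ...then predict on as many texts as needed without re-instantiating.
documents = ["Some text containing references...", "Another document..."]  # placeholder inputs
for text in documents:
    reference_predictions = splitter_parser.split_parse(text)
```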

ivyleavedtoadflax avatar May 01 '20 19:05 ivyleavedtoadflax