Poor performance on various tasks using the provided example notebook
Hi, thanks for publishing the model and its weights - it looks very promising. Unfortunately, I can't get good results out of it using the provided notebook. For example, if I ask more complex questions (which, as far as I understand, the pretraining should support), it fails to produce correct answers.
For instance, if I modify the prompt to task_prefix = "Layout Modeling. <layout_0> Manuscript </layout_0> review", the model answers 'form', which is incorrect; in most cases I get either "form" or some other wrong answer. For task_prefix = 'information extraction. What is the completion date?' it returns '3/16/68', which is close but still not correct. Could you provide more elaborate examples with different task prompts, so I can check whether the problem is on my side or with the model itself?
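For context, here is roughly how I'm invoking the model - a minimal sketch adapted from the notebook. I'm assuming the Hugging Face UdopProcessor / UdopForConditionalGeneration port here; the "microsoft/udop-large" checkpoint and the FUNSD sample are stand-ins for my actual document, so adjust if your setup differs:

```python
# Minimal sketch of my setup (assumption: the Hugging Face UDOP port with the
# public "microsoft/udop-large" checkpoint; the FUNSD sample is a placeholder
# for my actual document).
from datasets import load_dataset
from transformers import AutoProcessor, UdopForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

# A document image with pre-extracted words and their bounding boxes.
example = load_dataset("nielsr/funsd-layoutlmv3", split="train")[0]
image, words, boxes = example["image"], example["tokens"], example["bboxes"]

# The task prefix (and question, if any) is passed as the text, with the
# document words as the text pair, as in the notebook.
task_prefix = "information extraction. What is the completion date?"
encoding = processor(images=image, text=task_prefix, text_pair=words,
                     boxes=boxes, return_tensors="pt")

predicted_ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```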