The article outline does not match the one in storm_gen_outline.txt

Open xuxiangwork opened this issue 3 months ago • 1 comments

This project is very good. I read the project code carefully, and some of the ideas are wonderful, but one part of the code still confuses me.

I noticed that the outline argument is not used in the forward function of the ConvToSection class. As a result, the sections generated by the write_section object follow only the level-1 outline. The sub-outline that appears in the generated section content is inconsistent with the sub-outline in storm_gen_outline.txt, which appears to serve only to retrieve snippets under each first-level heading.

class ConvToSection(dspy.Module):
    """Use the information collected from the information-seeking conversation to write a section."""

    def __init__(self, engine: Union[dspy.dsp.LM, dspy.dsp.HFModel]):
        super().__init__()
        self.write_section = dspy.Predict(WriteSection)
        self.engine = engine

    def forward(self, topic: str, outline: str, section: str, searched_url_to_snippets: dict):
        info = ''
        for n, r in enumerate(searched_url_to_snippets.values()):
            info += f'[{n + 1}]\n' + '\n'.join(r)
            info += '\n\n'

        info = limit_word_count_preserve_newline(info, 1500)

        with dspy.settings.context(lm=self.engine):
            section = clean_up_section(
                self.write_section(topic=topic, info=info, section=section).output)

        return dspy.Prediction(section=section)

xuxiangwork avatar Apr 24 '24 09:04 xuxiangwork

Hi @xuxiangwork, thanks for reading our codebase carefully! Yes, the current StormArticleGenerationModule is implemented in a way that does not require the model to follow the subsection/subsubsection/... structure when writing a section. We implemented our NAACL'24 paper this way because Wikipedia editors told us, while we were running experiments on writing Wikipedia-like articles, that they prefer longer, more coherent paragraphs. We use a very detailed hierarchical outline as a way to organize the collected information, so that we can pick the relevant information out of a large pool when writing a specific section rather than feeding all the collected information to the LM. The actual outline of each section presented to readers can be simpler.

One core idea of the project is abstracting knowledge curation (collecting and organizing information) into the pre-writing stage. This gives you the flexibility to customize how the curated information is presented in the final article. If you want the final writing to follow the subsection/subsubsection/... structure of the outline, you only need to implement your own article writing module that inherits from ArticleGenerationModule. For example, you can create:

class MyWriteSection(dspy.Signature):
    """Write a Wikipedia section based on the collected information and the given outline.

        Here is the format of your writing:
            1. Use "# Title" to indicate section title, "## Title" to indicate subsection title, "### Title" to indicate subsubsection title, and so on.
            2. Use [1], [2], ..., [n] in line (for example, "The capital of the United States is Washington, D.C.[1][3]."). You DO NOT need to include a References or Sources section to list the sources at the end.
    """

    info = dspy.InputField(prefix="The collected information:\n", format=str)
    topic = dspy.InputField(prefix="The topic of the page: ", format=str)
    section = dspy.InputField(prefix="The section you need to write: ", format=str)
    section_outline = dspy.InputField(prefix="The outline for the section: ", format=str)
    output = dspy.OutputField(
        prefix="Write the section with proper inline citations (Start your writing with # section title. Don't include the page title or try to write other sections):\n",
        format=str
    )

You also need to modify some parts in the current StormArticleGenerationModule to pass the section outline into MyWriteSection.
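One piece of that change is obtaining the outline for a single section. The following is a minimal sketch of how that could be done, not the code in the repository. It assumes the full outline is plain text with one heading per line and the number of leading "#" characters marking the level; `extract_section_outline` is a hypothetical helper name, and the example outline is made up for illustration.

```python
def extract_section_outline(outline: str, section: str) -> str:
    """Return the heading lines nested under one first-level section.

    Hypothetical helper (not part of STORM): assumes one heading per line,
    with the count of leading '#' characters giving the heading level.
    """
    collected = []
    inside = False
    for line in outline.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue  # skip blank lines and anything that is not a heading
        level = len(stripped) - len(stripped.lstrip("#"))
        title = stripped.lstrip("#").strip()
        if level == 1:
            # Every first-level heading either opens the target section
            # or closes the one we were collecting.
            inside = (title == section)
            continue
        if inside:
            collected.append(stripped)
    return "\n".join(collected)


# Made-up outline used only to illustrate the assumed format.
example_outline = """# History
## Early years
### Founding
## Modern era
# Reception
## Critical response"""

section_outline = extract_section_outline(example_outline, "History")
print(section_outline)
```

The resulting string could then be supplied as the `section_outline` input of `MyWriteSection`, alongside the existing `topic`, `info`, and `section` inputs.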

You will find that this part is fully decoupled from the previous stages because all the information is maintained in StormArticle(Article); ArticleGenerationModule just defines the logic for presenting the information. Within this framework you can do more advanced customization, such as generating sections on demand or generating different sections in different styles.

shaoyijia avatar Apr 24 '24 20:04 shaoyijia

Hi @shaoyijia, thank you for the reply and for explaining the reasoning behind this design. The pre-writing stage provides a wealth of relevant snippets for writing, which is great! Thanks for your customization suggestions, too. I suspect that constraining the LM to adhere to predetermined sections may be a challenge that requires some post-processing.

xuxiangwork avatar Apr 25 '24 03:04 xuxiangwork

> I suspect that constraining LM to adhere to predetermined sections may be a challenge that requires some post-processing.

I agree. That is why the default behavior uses the outline to organize the collected information and does not require the LM to follow it exactly during article generation.

shaoyijia avatar Apr 25 '24 03:04 shaoyijia
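
The post-processing mentioned in the last two comments is not specified in the thread. As one possible illustration, the sketch below checks which outline headings never appear in a generated section, so a caller could decide whether to regenerate or patch it. It assumes "#"-prefixed headings as in the `MyWriteSection` format instructions; `heading_titles` and `missing_headings` are hypothetical names, and the sample text is invented.

```python
from typing import List


def heading_titles(text: str) -> List[str]:
    """Collect heading titles (lower-cased, '#' markers removed) from text."""
    titles = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            titles.append(stripped.lstrip("#").strip().lower())
    return titles


def missing_headings(generated_section: str, section_outline: str) -> List[str]:
    """Return outline headings that the generated section does not contain.

    Hypothetical check: compares titles only and ignores heading levels.
    """
    written = set(heading_titles(generated_section))
    return [t for t in heading_titles(section_outline) if t not in written]


# Invented sample data for illustration.
sample_outline = "## Early years\n### Founding\n## Modern era"
sample_section = "# History\nSome text [1].\n## Early years\nMore text [2]."

print(missing_headings(sample_section, sample_outline))
```

Matching on titles alone is a deliberately loose choice: it tolerates the LM changing a heading's level but not its wording, so stricter or fuzzier matching may be needed in practice.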