llama_index
How to compare two documents?
Hi there, I am trying to compare two documents and hope to leverage gpt_index. Any ideas on how to implement such functionality? Thanks a lot!
By "compare documents", do you mean a line-by-line comparison, or comparing summaries? Depending on your use case, some of this functionality may already be available in gpt index.
Same here. The comparisons I need are varied.
For example: "Which document has more words?" "Which document has more sentences?" "Which document uses more archaic English words?" "Which document uses the phrase 'language model' more often?" "Which document has more references to external documents?" "Which document has more stanzas in its poems?" "Which document has more poems with rhyme scheme AABB?" "Which document has more limericks?" "Which document has more haikus?" "Which document makes more use of onomatopoeia?" "Compare the meter, rhyme scheme and stanza forms of the two documents." "Compare the vocabulary of the two documents and use it to estimate which is the oldest and in which century it was composed." "Compare the documents and determine which one has the greater number of occurrences of foreign words." "Compare the two novels and determine which has more distinct points of view in the narrative." "Compare the two novels and determine which has more explicit sex scenes." "Compare the two novels and determine which has more explicit scenes of violence." "Compare the two novels and determine which makes more frequent use of foul language and of swear words, obscene or racist expressions." "Compare the two novels and determine which has more Shakespeare quotes." "Compare the two novels and determine which has more soliloquies."
Also, it should be possible to compare more than 2 documents.
For example: "Compare all the short novels provided and determine if there are more short novels with a male protagonist or with a female protagonist." "Rank all the short novels according to the number of different occurrences of foul language and racist expressions." "Rank all the short novels according to the number of different female characters with at least one sentence of dialog."
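Some of the purely countable questions above (word count, sentence count, occurrences of a phrase, ranking several documents) do not need an LLM at all. Here is a minimal sketch using only the Python standard library, not any llama_index API. The helper names `text_stats` and `rank_documents`, the naive sentence splitting, and the toy texts are all assumptions made for illustration; the more subjective comparisons (rhyme scheme, point of view, and so on) are outside what this covers.

```python
import re

def text_stats(text: str, phrase: str) -> dict:
    """Count words, sentences and occurrences of `phrase` in one document."""
    # Naive tokenisation: runs of letters/apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on ., ! and ?; empty trailing pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Case-insensitive, non-overlapping phrase matches.
    phrase_hits = len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))
    return {
        "words": len(words),
        "sentences": len(sentences),
        "phrase_hits": phrase_hits,
    }

def rank_documents(docs: dict, key: str, phrase: str = "language model") -> list:
    """Return document names sorted from highest to lowest on one statistic."""
    stats = {name: text_stats(text, phrase) for name, text in docs.items()}
    return sorted(stats, key=lambda name: stats[name][key], reverse=True)

# Toy documents (hypothetical); works the same way for more than two.
docs = {
    "doc_a": "A language model predicts text. It is trained on data.",
    "doc_b": "Poems rhyme. A language model can write poems! Can a language model rhyme?",
}

if __name__ == "__main__":
    for name, text in docs.items():
        print(name, text_stats(text, "language model"))
    print("most words first:", rank_documents(docs, "words"))
```

Because `rank_documents` takes a dict of any size, the same call covers the "rank all the short novels" style of question for the statistics it computes.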
Hi, I'm also interested in comparing several documents.
The documents would be about the same subject, but for different locations, products, etc.
For example, for documents about different countries of the world: "Between France and UK, which has the highest GDP?" "Which country has the highest proportion of male population?" "Which countries are presidencies, and which are kingdoms?" "What are the political differences between the USA and Russia?"
I also want to be able to get answers from a specific document, without mixing in information from other documents. For example, "What is the average height of the female population in the Netherlands?" "What is the population pyramid of South Africa?"
Thanks
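One way to picture the "specific document only" requirement above is to keep each document's passages under its own key and restrict retrieval to the requested key. The sketch below is plain Python, not llama_index's API: `retrieve`, `best_passage`, the `only` parameter, and the toy passages are all hypothetical, and the keyword-overlap scoring is a stand-in for whatever retriever is actually used. The passages it returns per document could then be passed to an LLM separately, or side by side for a comparison question.

```python
import re

def tokenize(text: str) -> set:
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def best_passage(question: str, passages: list) -> str:
    """Pick the passage sharing the most tokens with the question."""
    q = tokenize(question)
    return max(passages, key=lambda p: len(q & tokenize(p)))

def retrieve(question: str, corpus: dict, only: str = None) -> dict:
    """Return the best passage per document.

    If `only` is given, search that document alone so nothing from the
    other documents can leak into the answer context.
    """
    names = [only] if only is not None else list(corpus)
    return {name: best_passage(question, corpus[name]) for name in names}

# Toy corpus (hypothetical placeholder passages), one entry per document.
corpus = {
    "France": [
        "France is a republic.",
        "The economy of France is described in a separate chapter.",
    ],
    "UK": [
        "The UK is a kingdom.",
        "The UK economy is described in a separate chapter.",
    ],
}

if __name__ == "__main__":
    question = "Is it a kingdom or a republic?"
    print(retrieve(question, corpus, only="UK"))  # single-document lookup
    print(retrieve(question, corpus))             # one passage per document
```

Keeping the result keyed by document name means a comparison prompt can label each passage with its source, while a single-document question never sees the others.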
I would be interested in this too. Does llama_index retrieve an answer from only one document at a time?
I'd like to do something similar: I have a large book that I parsed using SimpleNodeParser, and I would like ChatGPT to review a term paper to check whether it discusses the book accurately.
Hi, @SoulEvill! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you are seeking guidance on how to compare two documents using gpt_index. Several other users, including jerryjliu, Emasoft, iraadit, robertsilen, and DrShrinker, have also expressed interest in similar functionality and have provided examples of the types of comparisons they would like to make. However, no solution has been provided yet.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LlamaIndex project!