khcoder
Treemap for cumulative sentence/paragraph/(sub)chapter counts
Version: 3 Beta 04a OS: Windows 10 Language: German
How can I create a treemap of cumulative sentence/paragraph/(sub)chapter counts?
I have a handful of PDF texts and would like to count all words in a sentence, all sentences in a paragraph, all paragraphs in sub-chapters and all sub-chapters in a chapter cumulatively and display this in a treemap.
Is it that easy?
Somehow I have the feeling that it couldn't be that complicated with KH Coder, but even with the extensive documentation I can't put it together.
Any hints?
Some treemap examples:
Hi,
KH Coder does not have treemap functionality. I think you can export tables from KH Coder and then use another application to create the treemap from those tables.
Actually, I don't understand the purpose of using treemaps. Do you just want to use the size of each area to express the frequency of words? Or do you want to express some kind of structure?
Please go through the tutorial to learn how to use KH Coder. You probably have to prepare a *.txt, *.xlsx, or *.csv file to analyze text using KH Coder. https://khcoder.net/en/tutorial_slides.pdf
Some references:
Functionalities of KH Coder: https://goo.gl/photos/ixn1sTM3jm8o11bP8
It seems that you can create treemaps with R. https://www.r-graph-gallery.com/treemap.html
Thanks for your fast answer.
Flattening a chapter structure into a treemap lets you grasp at a glance how much quantitative weight each part of the structure carries.
For example, I am currently investigating coalition agreements between parties forming governments. I would like to see quickly how much thought went into which topics. Since these documents usually have a clear structure (chapter > subchapter > paragraph > sentence > words), I can imagine comparing different agreements visually with treemaps side by side.
I have already looked at the CRAN treemap package and will try to export data sets from KH Coder. At the moment I just don't know how to get the chapter structure from a PDF into KH Coder as automatically as possible. I will study the documentation. Perhaps somebody has solved this before?
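The cumulative counting described above (chapter > subchapter > paragraph > sentence > words) could be sketched roughly as follows. This is only an illustration in plain Python, not something KH Coder provides: the `sentences` records, their values, and the `rollup` helper are all hypothetical, and it assumes you already have one record per sentence tagged with its position in the hierarchy.

```python
from collections import Counter

# Hypothetical input: one record per sentence, tagged with its position
# in the hierarchy and its word count. All values are made up.
sentences = [
    # (chapter, subchapter, paragraph, words)
    ("1", "1.1", 1, 12),
    ("1", "1.1", 1, 8),
    ("1", "1.1", 2, 15),
    ("1", "1.2", 1, 20),
    ("2", "2.1", 1, 5),
    ("2", "2.1", 2, 9),
]


def rollup(records):
    """Sum sentence and word counts at every level of the hierarchy."""
    sentence_counts = Counter()
    word_counts = Counter()
    for chapter, subchapter, paragraph, words in records:
        # Each key is a path prefix, so every ancestor level receives the count.
        for key in (
            (chapter,),
            (chapter, subchapter),
            (chapter, subchapter, paragraph),
        ):
            sentence_counts[key] += 1
            word_counts[key] += words
    return sentence_counts, word_counts


sentence_counts, word_counts = rollup(sentences)
for key in sorted(sentence_counts, key=lambda k: tuple(map(str, k))):
    print(" > ".join(map(str, key)), sentence_counts[key], word_counts[key])
```

Keying the counters by path prefix means the totals for a chapter always equal the sum of its subchapters, which is the property a treemap needs.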
Yeah! I have prepared a text file (not very automatically) with
H1 < Chapter
H2 < Subchapter
H3 < Subsubchapter
Paragraph
Sentences
Now I don't really know how to determine the number of paragraphs and sentences per (sub | subsub) chapter. A few SQL tables are roughly explained, but I cannot derive the solution from them. Is there an ERM with a complete attribute description? Or do you have an SQL query in mind that solves this?
Is it possible to share a project?
BTW: 1001 thanks for this superb tool. It's a lot of fun to analyze text a bit like source code (SCA). https://twitter.com/DetlefBurkhardt/status/1465093870740455436?s=20
Nice!
Here are some SQL examples:
sentences per chapter:
SELECT h1_id, COUNT(*) as sentences
FROM bun
GROUP BY h1_id
sentences per sub chapter:
SELECT h1_id, h2_id, COUNT(*) as sentences
FROM bun
WHERE h2_id > 0
GROUP BY h1_id, h2_id
sentences per subsub chapter:
SELECT h1_id, h2_id, h3_id, COUNT(*) as sentences
FROM bun
WHERE h3_id > 0
GROUP BY h1_id, h2_id, h3_id
paragraphs per chapter:
SELECT h1_id, COUNT(*) as paragraphs
FROM dan
GROUP BY h1_id
paragraphs per sub chapter:
SELECT h1_id, h2_id, COUNT(*) as paragraphs
FROM dan
GROUP BY h1_id, h2_id
paragraphs per subsub chapter:
SELECT h1_id, h2_id, h3_id, COUNT(*) as paragraphs
FROM dan
GROUP BY h1_id, h2_id, h3_id
If you provide a small example of the output format you would prefer, I will elaborate on the SQL.
If there is no copyright problem, you can zip the target file and attach it here.
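To try the GROUP BY pattern from the queries above without touching a real project, here is a minimal sketch using Python's built-in sqlite3 as a stand-in. Note the assumptions: KH Coder itself stores its data in MySQL, the `bun` table here contains only three made-up columns and rows, and treating 0 as "no heading at this level" is inferred from the `WHERE h2_id > 0` filter rather than from the schema documentation.

```python
import sqlite3

# In-memory stand-in for the sentence table; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bun (h1_id INTEGER, h2_id INTEGER, h3_id INTEGER)")

# Hypothetical rows, one per sentence.
rows = [
    (1, 0, 0),  # sentence directly under chapter 1, before any subchapter
    (1, 1, 0),
    (1, 1, 0),
    (1, 2, 0),
    (2, 1, 0),
    (2, 1, 1),
]
conn.executemany("INSERT INTO bun VALUES (?, ?, ?)", rows)

# Sentences per chapter.
per_chapter = conn.execute(
    "SELECT h1_id, COUNT(*) AS sentences FROM bun "
    "GROUP BY h1_id ORDER BY h1_id"
).fetchall()

# Sentences per subchapter, skipping sentences that belong to no subchapter.
per_subchapter = conn.execute(
    "SELECT h1_id, h2_id, COUNT(*) AS sentences FROM bun "
    "WHERE h2_id > 0 GROUP BY h1_id, h2_id ORDER BY h1_id, h2_id"
).fetchall()

print(per_chapter)
print(per_subchapter)
```

The ORDER BY clauses are added only to make the output order predictable; they do not change the counts.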
Superb. I will try to understand the tables with your SQL examples.
Meanwhile, here is a treemap dataset example taken from here.
group <- c(rep("group-1",4),rep("group-2",2),rep("group-3",3))
subgroup <- paste("subgroup" , c(1,2,3,4,1,2,1,2,3), sep="-")
value <- c(13,5,22,12,11,7,3,1,23)
data <- data.frame(group,subgroup,value)
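Per-subchapter counts could be reshaped into the same group/subgroup/value layout along these lines. This is a hedged sketch: the `counts` rows, the `h1_titles` labels, and the `to_treemap_rows` helper are hypothetical, and the CSV is only one possible hand-off format for R or Excel.

```python
import csv
import io

# Hypothetical (h1_id, h2_id, sentences) rows, e.g. the result of a GROUP BY query.
counts = [(1, 1, 2), (1, 2, 1), (2, 1, 2)]

# Hypothetical chapter labels; ids without a label fall back to "h1-<id>".
h1_titles = {1: "Chapter A", 2: "Chapter B"}


def to_treemap_rows(counts, titles):
    """Turn (h1_id, h2_id, n) tuples into group/subgroup/value records."""
    rows = []
    for h1_id, h2_id, n in counts:
        group = titles.get(h1_id, f"h1-{h1_id}")
        # Prefix the subgroup with its group so labels stay unique across chapters.
        subgroup = f"{group} / h2-{h2_id}"
        rows.append({"group": group, "subgroup": subgroup, "value": n})
    return rows


treemap_rows = to_treemap_rows(counts, h1_titles)

# Write the records as CSV text with the same three columns as the data frame.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["group", "subgroup", "value"])
writer.writeheader()
writer.writerows(treemap_rows)
csv_text = buffer.getvalue()
print(csv_text)
```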
The target file is entirely public. It's the coalition agreement for the next government in Germany. Here is a version optimized for KH Coder: clean-ampel-2021.zip
1000 thanks! With your examples and some additional views and tables I was able to piece the sentence counts together. In the end I used Excel to create this treemap (h1/h3).
It was a very manual process. If I could make a feature request for KH Coder: a new diagram type, a quantitative chapter treemap, perhaps dynamic and with links directly to the chapter or topic.
And here is a treemap with only h1/h2.
Very nice! Many thanks for sharing with us.
(Didn't know Excel can create treemaps)
I will give a talk about the "Ampel" coalition agreement in Germany, possibly with C3Lingo real-time translation into English. I will mention KH Coder a lot there: https://bit.ly/r3s_koala_bund_2021 29.12.2021 / 13:00 (CET)