
Treemap for cumulative sentence/paragraph/(sub)chapter counts

Open • CodeFreezr opened this issue 3 years ago • 10 comments

Version: 3 Beta 04a
OS: Windows 10
Language: German

How can I create a treemap of cumulative sentence/paragraph/(sub)chapter counts?

I have a handful of PDF texts and would like to count, cumulatively, all words in each sentence, all sentences in each paragraph, all paragraphs in each sub-chapter and all sub-chapters in each chapter, and display the result in a treemap.
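A minimal sketch of the cumulative counting described above, assuming the text has already been split into a nested structure: chapters and sub-chapters as dicts keyed by (hypothetical) titles, paragraphs as lists of sentences, and sentences as strings whose words are simply whitespace-separated tokens. Each heading's value is the sum of its children's counts, which is the number a treemap uses for rectangle sizes.

```python
from typing import Iterator, Union

# Hypothetical toy outline: chapters/sub-chapters are dicts keyed by title,
# paragraphs are lists of sentences, sentences are plain strings.
Node = Union[dict, list, str]


def word_count(node: Node) -> int:
    """Cumulative word count of a sentence, paragraph, sub-chapter or chapter."""
    if isinstance(node, str):  # sentence: count whitespace-separated words
        return len(node.split())
    children = node.values() if isinstance(node, dict) else node
    return sum(word_count(child) for child in children)


def treemap_rows(outline: dict, path: tuple = ()) -> Iterator[tuple]:
    """Yield (heading path, cumulative word count) for every heading level."""
    for title, child in outline.items():
        yield path + (title,), word_count(child)
        if isinstance(child, dict):  # recurse into sub-headings, not paragraphs
            yield from treemap_rows(child, path + (title,))


outline = {
    "Chapter A": {
        "A.1": [["one two three", "four five"], ["six seven eight"]],
        "A.2": [["nine ten eleven"]],
    },
    "Chapter B": {
        "B.1": [["alpha beta gamma delta"]],
    },
}

for path, words in treemap_rows(outline):
    print(" > ".join(path), words)
```

Counting sentences instead of words only requires the string branch of `word_count` to return 1.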

Is it that easy?

Somehow I have the feeling that it shouldn't be that complicated with KH Coder, but even with the extensive documentation I can't put it together.

Any hints?

CodeFreezr, Nov 27 '21

Some treemap examples:

CodeFreezr, Nov 27 '21

Hi,

KH Coder does not have a treemap function. I think you can export tables from KH Coder and then use another application to create the treemap from those tables.

Actually, I don't understand the purpose of using treemaps. Do you just want to use the size of each area to express word frequency? Or do you want to express some kind of structure?

Please go through the tutorial to learn how to use KH Coder. You will probably need to prepare a *.txt, *.xlsx, or *.csv file to analyze text with KH Coder. https://khcoder.net/en/tutorial_slides.pdf


Some references:

Functionalities of KH Coder: https://goo.gl/photos/ixn1sTM3jm8o11bP8

It seems that you can create treemaps with R. https://www.r-graph-gallery.com/treemap.html

ko-ichi-h, Nov 27 '21

Thanks for your fast answer.

Breaking a chapter structure down into a treemap lets you grasp at a glance how much quantitative weight each part of the structure carries.

For example, I am currently investigating coalition agreements between parties forming a government. I would like to see quickly how much thought went into which topics. Since these documents usually have a clear structure

(chapter > subchapter > paragraph > sentence > words),

I can imagine comparing different agreements visually by placing their treemaps side by side.

I have already looked at the CRAN treemap package and will try to export data sets from KH Coder. At the moment I just don't know how to get the chapter structure from a PDF into KH Coder as automatically as possible. I will study the documentation. Perhaps somebody has solved this before?

CodeFreezr, Nov 27 '21

Yeah! I have prepared a text file (not very automatically) with this structure:

(H1 = chapter > H2 = subchapter > H3 = subsubchapter > paragraph > sentences)
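KH Coder reads headings marked with H1 to H5 tags. As a minimal sketch of making this preparation step less manual, the hypothetical helper below wraps numbered heading lines in H1 to H3 tags. It assumes the `<H1>Title</H1>` markup form and a hypothetical numbering rule ("3", "3.2" or "3.2.1." followed by a title); real documents will need their own rule, since a paragraph that starts with a short number (e.g. "12 percent ...") would be mis-tagged here.

```python
import re

# Hypothetical numbering rule: "3 Title", "3.2 Title" or "3.2.1. Title"
# (one or two digits per level, at most three levels).
HEADING = re.compile(r"^(\d{1,2}(?:\.\d{1,2}){0,2})\.?\s+\S")


def tag_headings(lines: list[str]) -> list[str]:
    """Wrap numbered heading lines in <H1>-<H3> tags; keep other lines unchanged."""
    tagged = []
    for line in lines:
        text = line.strip()
        match = HEADING.match(text)
        if match:
            level = match.group(1).count(".") + 1  # "3" -> H1, "3.2" -> H2, "3.2.1" -> H3
            tagged.append(f"<H{level}>{text}</H{level}>")
        else:
            tagged.append(line)
    return tagged


sample = ["1 Intro", "A paragraph of ordinary text.", "1.2 Energy", "1.2.3. Wind power"]
print("\n".join(tag_headings(sample)))
```

Lines like "2025 ..." are left alone because the rule allows at most two digits per level.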


Now I don't really know how to determine the number of paragraphs and sentences per chapter, subchapter and subsubchapter. A few SQL tables are roughly explained, but I cannot derive the solution from them. Is there an ERM (entity-relationship model) with a complete description of the attributes? Or do you have an SQL query in mind that solves this?

Is it possible to share a project?

BTW: 1001 thanks for this superb tool. It's a lot of fun to analyze text a bit like source code (SCA). https://twitter.com/DetlefBurkhardt/status/1465093870740455436?s=20

CodeFreezr, Nov 28 '21

Nice!

Here are some SQL examples:

sentences per chapter:

SELECT h1_id, COUNT(*) as sentences
FROM bun
GROUP BY h1_id

sentences per sub chapter:

SELECT h1_id, h2_id, COUNT(*) as sentences
FROM bun
WHERE h2_id > 0
GROUP BY h1_id, h2_id

sentences per subsub chapter:

SELECT h1_id, h2_id, h3_id, COUNT(*) as sentences
FROM bun
WHERE h3_id > 0
GROUP BY h1_id, h2_id, h3_id

paragraphs per chapter:

SELECT h1_id, COUNT(*) as paragraphs
FROM dan
GROUP BY h1_id

paragraphs per sub chapter:

SELECT h1_id, h2_id, COUNT(*) as paragraphs
FROM dan
WHERE h2_id > 0
GROUP BY h1_id, h2_id

paragraphs per subsub chapter:

SELECT h1_id, h2_id, h3_id, COUNT(*) as paragraphs
FROM dan
WHERE h3_id > 0
GROUP BY h1_id, h2_id, h3_id
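To see what shape the results of these queries have, here is a minimal sketch that runs the sentence queries against a tiny in-memory SQLite table. The table is a hypothetical stand-in for KH Coder's `bun` table containing only the three heading-id columns the queries use, and it assumes (as the `WHERE h2_id > 0` filter suggests) that an id of 0 means the sentence is not inside a heading of that level.

```python
import sqlite3

# Hypothetical stand-in for KH Coder's sentence table `bun`, reduced to the
# heading-id columns the queries use. Assumption: an id of 0 means the sentence
# is not inside a heading of that level.
sentences = [  # one (h1_id, h2_id, h3_id) tuple per sentence
    (1, 0, 0),                        # directly under chapter 1
    (1, 1, 0), (1, 1, 1), (1, 1, 1),  # subchapter 1.1, two of them in 1.1.1
    (1, 2, 0),                        # subchapter 1.2
    (2, 1, 0), (2, 1, 0),             # subchapter 2.1
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bun (h1_id INTEGER, h2_id INTEGER, h3_id INTEGER)")
con.executemany("INSERT INTO bun VALUES (?, ?, ?)", sentences)

# Same queries as above, plus ORDER BY so the output order is deterministic.
per_chapter = con.execute(
    "SELECT h1_id, COUNT(*) AS sentences FROM bun "
    "GROUP BY h1_id ORDER BY h1_id"
).fetchall()
per_sub = con.execute(
    "SELECT h1_id, h2_id, COUNT(*) AS sentences FROM bun "
    "WHERE h2_id > 0 GROUP BY h1_id, h2_id ORDER BY h1_id, h2_id"
).fetchall()
per_subsub = con.execute(
    "SELECT h1_id, h2_id, h3_id, COUNT(*) AS sentences FROM bun "
    "WHERE h3_id > 0 GROUP BY h1_id, h2_id, h3_id ORDER BY h1_id, h2_id, h3_id"
).fetchall()
con.close()

print(per_chapter)  # [(1, 5), (2, 2)]
print(per_sub)      # [(1, 1, 3), (1, 2, 1), (2, 1, 2)]
print(per_subsub)   # [(1, 1, 1, 2)]
```

With this toy data chapter 1 has 5 sentences, but its subchapters add up to only 4: the sentence sitting directly under the chapter heading is excluded by `WHERE h2_id > 0`. A treemap built from the subchapter rows alone will therefore slightly under-represent such chapters.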

If you provide a small example of the output format you would like, I will refine the SQL.

If there is no copyright problem, you can zip the target file and attach it here.

ko-ichi-h, Nov 29 '21

Superb. I will try to understand the tables with the help of your SQL examples.

Meanwhile, here is an example treemap dataset, taken from here:

group <- c(rep("group-1",4),rep("group-2",2),rep("group-3",3))
subgroup <- paste("subgroup" , c(1,2,3,4,1,2,1,2,3), sep="-")
value <- c(13,5,22,12,11,7,3,1,23)
data <- data.frame(group,subgroup,value)
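To bridge the SQL counts and the group/subgroup/value layout of the R example above, here is a minimal sketch with a hypothetical helper `to_treemap_csv` that turns (h1_id, h2_id, count) rows into CSV text with those three columns. The heading titles are hypothetical placeholders; ids without a title fall back to a generated label.

```python
import csv
import io


def to_treemap_csv(counts, titles):
    """Turn (h1_id, h2_id, count) rows into group/subgroup/value CSV text.

    `titles` maps (h1_id,) and (h1_id, h2_id) keys to heading text; ids without
    an entry fall back to a generated label.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["group", "subgroup", "value"])
    for h1, h2, n in counts:
        group = titles.get((h1,), f"chapter-{h1}")
        subgroup = titles.get((h1, h2), f"subchapter-{h1}.{h2}")
        writer.writerow([group, subgroup, n])
    return buf.getvalue()


# Rows shaped like the result of the "sentences per sub chapter" query.
counts = [(1, 1, 3), (1, 2, 1), (2, 1, 2)]
# Hypothetical heading titles; in practice they would come from the tagged text.
titles = {(1,): "Chapter A", (1, 1): "Section A.1", (2,): "Chapter B"}
print(to_treemap_csv(counts, titles))
```

The CSV could then be read in R with `read.csv()` and passed to the treemap package, e.g. `treemap(data, index = c("group", "subgroup"), vSize = "value")`, or opened in Excel, whose treemap chart also builds its hierarchy from category columns placed next to each other.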

The target file is entirely public: it's the coalition agreement for the next government in Germany. Here is a version optimized for KH Coder: clean-ampel-2021.zip

CodeFreezr, Nov 29 '21

1000 thanks! With your examples and some additional views and tables, I was able to piece the sentence counts together. In the end I used Excel to create this treemap (h1/h3):


It was a very manual process. If I could make a feature request for KH Coder, it would be a new diagram: a quantitative chapter treemap, perhaps dynamic and with links directly to each chapter or topic.

CodeFreezr, Dec 06 '21

And here is a treemap with only h1/h2 (KoalaBund2021_Abschnitte):

CodeFreezr, Dec 08 '21

Very nice! Many thanks for sharing with us.

(I didn't know Excel could create treemaps.)

ko-ichi-h, Dec 08 '21

I will give a talk about the German "Ampel" coalition agreement, possibly with C3Lingo real-time translation into English, and I will mention KH Coder a lot there: https://bit.ly/r3s_koala_bund_2021, 29.12.2021, 13:00 (CET)

CodeFreezr, Dec 24 '21