conquery Implement MatchingStats for SQL mode

Extracted an interface for MatchingStats, copied the existing implementation to WorkerMatchingStats, and added a new SqlMatchingStats
Added a dedicated SqlUpdateMatchingStatsJob, which walks the concept tree and collects the respective stats
Added the option to define a primary column per Table

For a single TreeConcept, we do:

run a count(*) on each connector, union these tables and finally sum() the count per connector to obtain the concept's event count
select the PIDs for each connector, union these tables and do a countDistinct(pid) to obtain the concept's entity count
for each connector and each of its validity dates, select the start and end, union all these tables and select the min(start) and max(end) to obtain the concept's date span
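
The three steps above can be sketched as follows. This is a minimal illustration using Python's sqlite3 module, not the SQL that the job actually renders: the tables connector_a and connector_b, their columns, and the sample rows are all hypothetical. It also assumes "union" means union all for the event count, because a plain union would collapse two connectors that happen to have the same count.

```python
import sqlite3

# Hypothetical schema: two connector tables of one concept, each with a
# primary column (pid) and one validity date range (start_date, end_date).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table connector_a (pid integer, start_date text, end_date text);
    create table connector_b (pid integer, start_date text, end_date text);
    insert into connector_a values (1, '2020-01-01', '2020-03-31'),
                                   (2, '2019-06-01', '2019-06-30');
    insert into connector_b values (2, '2021-01-01', '2021-12-31'),
                                   (3, '2018-05-01', '2018-05-15'),
                                   (3, '2022-02-01', '2022-02-28');
""")

# Event count: count(*) per connector, union the results, then sum them.
event_count = conn.execute("""
    select sum(events) from (
        select count(*) as events from connector_a
        union all
        select count(*) as events from connector_b
    ) as counts
""").fetchone()[0]

# Entity count: union the PIDs of all connectors, then count distinct PIDs.
entity_count = conn.execute("""
    select count(distinct pid) from (
        select pid from connector_a
        union all
        select pid from connector_b
    ) as entities
""").fetchone()[0]

# Date span: union all start/end pairs, then take min(start) and max(end).
# ISO-8601 date strings sort chronologically, so min/max work on text here.
date_span = conn.execute("""
    select min(start_date), max(end_date) from (
        select start_date, end_date from connector_a
        union all
        select start_date, end_date from connector_b
    ) as dates
""").fetchone()
```

With the sample rows, entity 2 appears in both connectors but is counted once, which is why the entity count needs the distinct over the unioned PIDs rather than a sum of per-connector counts.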

Jan 25 '24 11:01 jnsrnhld

@awildturtok Now with parallelization: a Runnable is created per ConceptTreeElement, which then executes the 3 queries for the MatchingStats and sets the values.

Feb 26 '24 09:02 jnsrnhld
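
The parallelization described in the comment above can be sketched roughly like this. It is a Python analogue of the "one Runnable per ConceptTreeElement" pattern, not the PR's Java code: the Element class and collect_stats function are hypothetical, and the three SQL queries are replaced by in-memory stand-ins so the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a concept tree element."""
    name: str
    rows: list  # (pid, start, end) tuples this element matches
    events: Optional[int] = None
    entities: Optional[int] = None
    span: Optional[tuple] = None


def collect_stats(element: Element) -> None:
    # Stand-ins for the three queries; each task only writes to its own
    # element, so no locking is needed between tasks.
    element.events = len(element.rows)
    element.entities = len({pid for pid, _, _ in element.rows})
    if element.rows:
        element.span = (
            min(start for _, start, _ in element.rows),
            max(end for _, _, end in element.rows),
        )


elements = [
    Element("a", [(1, "2020-01-01", "2020-02-01"), (1, "2021-01-01", "2021-02-01")]),
    Element("b", [(2, "2019-05-01", "2019-05-02")]),
    Element("c", []),
]

# One task per element; list() forces completion and re-raises any exception.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(collect_stats, elements))
```

Because every task touches only its own element, the tasks are independent and the pool size can be tuned to what the database tolerates.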

The SQL generated in Postgres mode is unfortunately wrong:

postgres=# select count(distinct null) as "primary_column"
from (
  select "table"."id"
  from "table"
  where "table"."code" similar to '1234%'
) as "entities"
;
 primary_column
----------------
              0
(1 row)


postgres=# select count(distinct id) as "primary_column"
from (
  select "table"."id"
  from "table"
  where "table"."code" similar to '1234%'
) as "entities"
;

 primary_column
----------------
           1507

The upper query is the one we generate.

I also noticed that we have not yet implemented the conjunction (AND) with the parent nodes here.

Mar 04 '24 13:03 awildturtok
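
The missing conjunction with the parent nodes mentioned above could look roughly like the following sketch. It assumes that each node carries its own condition as a SQL fragment and that a node's effective filter is its own condition AND-ed with those of all its ancestors. The Node class and effective_condition function are hypothetical and not part of conquery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node holding its own condition as a SQL fragment."""
    condition: str
    parent: Optional["Node"] = None


def effective_condition(node: Node) -> str:
    # Walk up to the root and AND all conditions together, root first.
    parts = []
    current = node
    while current is not None:
        parts.append(f"({current.condition})")
        current = current.parent
    return " and ".join(reversed(parts))


root = Node("\"table\".\"code\" similar to '12%'")
child = Node("\"table\".\"code\" similar to '1234%'", parent=root)
where_clause = effective_condition(child)
```

For the child node this yields the root condition followed by the child condition, joined with "and", so a child can never match rows that its parent excludes.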

Getting performance metrics is fairly difficult: query execution times vary heavily with load and are therefore not very meaningful. (I have seen the same thing take 8s vs. 800ms here, depending on load.)

Mar 04 '24 13:03 awildturtok

Hey, the count(distinct null) is generated because the primary column is set neither on the Table nor in the SQL config, so the job does not know the PID.

Mar 06 '24 08:03 jnsrnhld

@jnsrnhld the primary column is set in the config.json

Mar 06 '24 09:03 awildturtok

OK, that is strange. I'll take another look at what could be causing it.

Mar 06 '24 09:03 jnsrnhld

@awildturtok From your point of view, is there anything against merging this PR even though the performance topic has not been addressed yet? For demos etc. it is quite nice to have working MatchingStats, even if they are not performant yet.

Jun 13 '24 15:06 jnsrnhld

Ah sorry, I didn't know it hadn't been merged yet. Can we talk about it again on Tuesday? I had an idea for how this could possibly be implemented with case-when.

Jun 13 '24 15:06 awildturtok
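
The thread does not spell out the case-when idea. One possible reading, sketched here purely as an assumption, is conditional aggregation: instead of one query per concept tree element, a single scan computes the counts for several elements at once. The events table, the node names, and the patterns are hypothetical, and SQLite's like stands in for the Postgres similar to used earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table events (pid integer, code text);
    insert into events values (1, '1234A'), (1, '1234B'), (2, '1299'), (3, '5600');
""")

# Hypothetical concept tree elements, each identified by a code pattern.
patterns = {"node_12": "12%", "node_1234": "1234%"}

select_parts = []
params = []
for name, pattern in patterns.items():
    # Event count: rows matching the pattern (case without else yields NULL,
    # which count() ignores).
    select_parts.append(f'count(case when code like ? then 1 end) as "{name}_events"')
    params.append(pattern)
    # Entity count: distinct PIDs among the matching rows.
    select_parts.append(
        f'count(distinct case when code like ? then pid end) as "{name}_entities"'
    )
    params.append(pattern)

# One scan over the table yields the stats for all elements.
sql = "select " + ", ".join(select_parts) + " from events"
row = conn.execute(sql, params).fetchone()

# Map each element name to its (event count, entity count) pair.
stats = {name: (row[2 * i], row[2 * i + 1]) for i, name in enumerate(patterns)}
```

Whether this is faster than one query per element depends on the database and the size of the tree, and the thread notes that timings vary strongly with load, so it would need to be measured.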
