radiobrowser-api icon indicating copy to clipboard operation
radiobrowser-api copied to clipboard

Normalize tag data model

Open kousu opened this issue 5 years ago • 1 comments

I took a look at the SQL dump and found that there is a TagCache table listing the tags, but that it is not related to the Stations table.

The current database has tags just stored as a string delimited by commas:

(57613, 104897, 'Radio Progreso y ERIC - El Progreso', 'http://192.99.233.46:8900/;', 'http://radioprogresohn.net/', 'http://radioprogresohn.net/media/com_zo2framework/images/favicon.ico', '2017-12-02 04:26:29', 'Honduras', 'Yoro', 'Spanish', 'derechos humanos,jesuit,el progreso', 35, 0, '24.246.81.175', '9649a4df-0601-11e8-ae97-52543be04c81','9649a4db-0601-11e8-ae97-52543be04c81')

(here the tags are ["derechos humanos", "jesuit", "el progreso"])

and there's a /cache/ which I assume gets rebuilt manually somewhere:

DROP TABLE IF EXISTS `TagCache`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `TagCache` (
  `TagName` varchar(100) COLLATE utf8_bin NOT NULL,
  `StationCount` int(11) DEFAULT '0',
  `StationCountWorking` int(11) DEFAULT '0',
  PRIMARY KEY (`TagName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
/*!40101 SET character_set_client = @saved_cs_client */;

This situation is a many-to-many relationship and it would be cleaner to use an intermediate "junction" table:

CREATE TABLE `Station` (
  `StationID` int(11) NOT NULL AUTO_INCREMENT,
  -- ...
  -- *no* `Tags` field here
  -- ...
  PRIMARY KEY (`StationID`)
)

CREATE TABLE `Tag` (
     `id` int(11) NOT NULL AUTO_INCREMENT,
     `name` varchar(100) COLLATE utf8_bin NOT NULL,
     PRIMARY KEY (`id`) 
)

CREATE TABLE `Station_Tag` (
    `StationID` int(11) NOT NULL,
    `TagID` int(11) NOT NULL
    FOREIGN KEY (StationID)
        REFERENCES Station(id),
    FOREIGN KEY (TagID)
        REFERENCES Tag(id)
)

I can't find where in the code TagCache gets populated; is there a cronjob that's not checked into the repo or am I missing something obvious? In any case, you could keep the table working by rewriting it as a view, and if performance becomes a problem you could materialize it (unfortunately with MySQL you need to write this manually). I think this should do it, but I haven't tested it, so it's just to show the idea:

CREATE VIEW TagCache AS
 SELECT
   Tag.id as TagID,
   Tag.name as name,
   count(*) as StationCount,
   count(Station.working = 1) as StationCountWorking
 FROM
 -- this inner select deals with Stations having two IDs: a UUID and a SQL ID column
 (SELECT Station.id as id, StationCheck.CheckOK as working
  FROM
   Station
  INNER JOIN
   StationCheck
  WHERE Station.StationUuid = StationCheck.StationUuid) Station
 INNER JOIN Station_Tag where Station.id = Station_Tag.StationID
 INNER JOIN Tag where Tag.id = Station_Tag.TagID
 GROUP BY TagID

I would like to write a migration to implement this change. It would make #56 easier for me.

kousu avatar May 06 '19 14:05 kousu

(this should also save a bunch of storage since it compresses tags to 4 byte integers instead of repeating their whole names in each station entry)

kousu avatar May 06 '19 15:05 kousu