
fix(copilot): team_name changed to empty string instead of null

henrikedegrd opened this pull request 9 months ago • 5 comments

Hey, I just made a Pull Request!

This changes the team_name column from null to an empty string. The reason is that PostgreSQL treats nulls as distinct values in unique constraints, so the constraint does not prevent duplicates: in some cases there can be multiple rows for the same day where team_name is null. This also makes the statistics display strange numbers because of joins.
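For reference, this is the PostgreSQL behaviour being worked around: nulls count as distinct values under a unique constraint, so duplicate rows with a null team_name slip through. A minimal sketch, using hypothetical table and column names:

CREATE TABLE metrics (
  day       date NOT NULL,
  team_name text,
  total     integer,
  UNIQUE (day, team_name)
);

INSERT INTO metrics VALUES ('2025-03-24', NULL, 10);
INSERT INTO metrics VALUES ('2025-03-24', NULL, 10); -- also succeeds: null is never equal to null
INSERT INTO metrics VALUES ('2025-03-24', '', 10);
INSERT INTO metrics VALUES ('2025-03-24', '', 10);   -- fails with a unique constraint violation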

The migration file does this (for all affected tables):

  1. Checks for all duplicate rows
  2. If duplicates are found, deletes them all (keeping the values of the row)
  3. Inserts a new row with those values, using an empty string instead of null for team_name
  4. Updates all remaining (non-duplicated) rows that have null in team_name to empty strings
  5. Changes the team_name column from nullable to not null

All queries that checked for null have also been updated to check for empty string instead.
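In PostgreSQL terms, steps 4 and 5 above plus the query change boil down to roughly the following per table (the table name here is hypothetical; the real migration repeats this for every affected table):

-- step 4: normalise existing rows
UPDATE metrics SET team_name = '' WHERE team_name IS NULL;
-- step 5: forbid null going forward
ALTER TABLE metrics ALTER COLUMN team_name SET NOT NULL;
-- queries that previously filtered on "team_name IS NULL" now become:
SELECT * FROM metrics WHERE team_name = '';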

✔️ Checklist

  • [x] A changeset describing the change and affected packages. (more info)
  • [ ] Added or updated documentation
  • [x] Tests for new functionality and regression tests for bug fixes
  • [ ] Screenshots attached (for UI changes)
  • [x] All your commits have a Signed-off-by line in the message. (more info)

henrikedegrd · Mar 24 '25

Changed Packages

Package Name: @backstage-community/plugin-copilot-backend
Package Path: workspaces/copilot/plugins/copilot-backend
Changeset Bump: patch
Current Version: v0.15.0

backstage-goalie[bot] · Mar 24 '25

Any updates on this @vinzscam?

Anything I can do to move this forward?

henrikedegrd · Apr 07 '25

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

github-actions[bot] · Apr 22 '25

@vinzscam I replied to your comment, can you check it again?

henrikedegrd · Apr 22 '25

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

github-actions[bot] · May 20 '25

Dismiss stale ☺️ 🙏🏼

Sacquer · May 26 '25

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

github-actions[bot] · Jun 09 '25

As discussed in the SIG today, reopening this one to see if we can continue the work and find a way forwards!

Parsifal-M · Sep 09 '25

To further clarify my standpoint:

  1. In order to make the unique constraint work, the null needs to be changed to an empty string.
  2. We then need to change the select statements to query for '' instead of null.
  3. For the select to work, we need to update all rows from null -> ''.
  4. If there are duplicate rows, that update will fail (the constraint kicks in) and we are back to square one.

So, from my point of view, and to reduce complex logic, we need to clean the data first. I am not a fan of writing a lot of logic for handling the duplicate rows in application code. It might be doable in SQL, but that is outside my knowledge. And since there are a number of joins and SUM(), we need to join on only one row, otherwise the SUM() is multiplied for each duplicated row.
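As a small illustration of the SUM() concern, with hypothetical tables and columns: if one side of a join contains two copies of the same day, every matching row on the other side is counted twice.

SELECT u.day, SUM(s.seats) AS seats
FROM usage_metrics u
JOIN seat_metrics s ON s.day = u.day
GROUP BY u.day;
-- with a duplicated usage_metrics row for a given day, SUM(s.seats)
-- for that day comes out twice as large as it should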

Yes, the migration file is pretty big and a bit complex, but it does what it is supposed to do. I have run the migration more than once on our duplicate rows and have not encountered any problems.

I could probably improve it by making a copy of the table before the cleaning is done, cleaning the copied table, and renaming the tables if that succeeds.

Would that be more satisfactory to everyone?

However, since building tests against a database is somewhat complex, there are no tests for this. Any tips on how to do that would also be appreciated.

henrikedegrd · Sep 10 '25

Correct me if I'm wrong. As I understand it, if we start by making changes to the DatabaseHandle and metricsHelper while skipping the migration entirely, this means that any new data collected after the fix is applied will be correct. However, if I select a date prior to the fix, the data might be incorrect. Is that right?

vinzscam · Sep 17 '25

It depends on which changes are made to the SELECT queries. They would need to handle both null and '' in that case.

But yes, if new inserts are made with '', I believe the new data will be OK.

I think some changes also need to be made to the "check if data exists" logic, so that it handles both null and ''.
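For illustration, one way the reads (including the "check if data exists" lookup) could cover both representations during such a transition, again with hypothetical names:

-- treats null and '' as the same "no team" bucket
SELECT * FROM metrics
WHERE COALESCE(team_name, '') = '';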

However, I think this is a bad workaround. The columns should be made notNullable, and with that we get clean code.

I am more interested in why the migration is considered bad. Yes, it is a lot of code in terms of lines, but it is mostly copy-paste since many tables are affected.

henrikedegrd · Sep 18 '25

Let me just check one thing.

The migrations intentionally destroy data, right? They find collections of rows where there are duplicates and destroy all but one of those rows. They don't try to sum up or do any arithmetic.

If we assume for now that this is OK, it sounds like a DELETE...USING (as raw SQL, don't try to express it with knex builders) will solve the problem more easily, at least if only Postgres is the target.

WITH cte AS (
  SELECT ctid, tableoid, row_number() OVER (PARTITION BY c1, c2, c3) AS rn, *
  FROM the_table_name
)
DELETE FROM the_table_name USING cte
WHERE (the_table_name.ctid, the_table_name.tableoid) = (cte.ctid, cte.tableoid) AND cte.rn > 1

This lets you skip a ton of the repeated stuff, and only use raw SQL logic for removing all but the first row matching each c1, c2, c3. And it works automagically, whether there are duplicates or not. Note the > 1.

This is just me dumping an idea. I haven't yet checked whether we should do it at all :)

But if I understand correctly how these indices are used, it's not sufficient to have a { predicate: knex.whereNotNull('team_name') } on the index right? As in, you want to get rid of duplicates within the set of null team name entries, is that correct?
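For reference, such a predicate corresponds to a partial unique index in PostgreSQL, roughly as below (hypothetical names). It simply exempts null team_name rows from the uniqueness check, so duplicates among those rows would still be allowed:

CREATE UNIQUE INDEX metrics_day_team_uniq
  ON metrics (day, team_name)
  WHERE team_name IS NOT NULL;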

freben · Sep 19 '25

Good ideas!

Yes, the migration destroys data (by deleting). And it only deletes rows that are duplicates when team_name is null together with the other columns of the index for that specific table. The indexes differ a bit between the tables; in some cases they are made up of 3 columns, sometimes more.

And yes, I want to get rid of duplicates only when team_name is null, since that is what causes the index not to work.

I also think I know why it happens in the first place, since it only happens when running multiple replicas: if the time on the hosts is off by a few seconds and the task finishes quickly enough, the other replica picks up the task as well. We had this happen on another task of ours, so we had to add a timeout so that the task takes more than 0.5s.

When this happens, the inserts where team_name is null aren't handled correctly: instead of hitting the index and then onConflict().ignore(), a new row is inserted. And it can insert unlimited duplicate rows when team_name is null (but only in the case of multiple replicas, since the initial check for whether any new data should be inserted is based on a date).
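For context, knex's onConflict().ignore() translates to roughly the following on PostgreSQL (hypothetical names, assuming a unique index on (day, team_name)); with a null team_name the conflict is never detected, so DO NOTHING never kicks in:

INSERT INTO metrics (day, team_name, total)
VALUES ('2025-03-24', NULL, 10)
ON CONFLICT (day, team_name) DO NOTHING;
-- running this twice leaves two identical rows, because the unique
-- index treats the two null team_name values as distinct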

My SQL skills are not the best, so if the final result can be achieved with a single SQL statement, that would be awesome!

And if your example does what I want, that is great! I tried to find some SQL logic that does exactly that (remove all but one row when the "select" finds duplicates), but didn't really find a good solution. That is why I ended up first deleting all the duplicate rows and then inserting one new row with the values that were duplicated.

The problem only occurs when using PostgreSQL, since the current unique index with null works fine with SQLite.

However, I don't know how the migration behaves if some parts are PostgreSQL-specific queries, and how SQLite handles that. (From what I remember, I used Copilot to come up with an SQL solution that works with both Postgres and SQLite.)

I will have a go at finding some nicer SQL based on your suggestion next week, and try it on a copy of our data (since we still have duplicates in our data set).

henrikedegrd · Sep 20 '25

Hi @henrikedegrd, sorry for jumping in the middle here, just wanted to share an example of a migration that only runs for Postgres as I had to do this with the Linguist plugin ages ago:

https://github.com/backstage/community-plugins/blob/8db5951e5f0683bd571ba8d69e362639f83c7763/workspaces/linguist/plugins/linguist-backend/migrations/20230216_remove_processed_date_default.js#L22-L26

Part of the reason you won't have issues with a default setup of SQLite is that it's all in memory and the DB gets rebuilt from scratch each time it starts.

awanlin · Sep 20 '25

Hi again.

I have now made some changes and done local testing to better handle the DELETE, with some pointers from @freben. However, after some tests and a dialogue with Copilot I ended up with slightly different SQL that works with the problematic NULL but still does "delete all but one row" based on the columns of the unique index.
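For context, a sketch of the general shape of such a statement, with hypothetical table and column names (the exact SQL in the PR may differ). Window partitioning puts all null team_name rows for a day into the same group, so the null case is handled without special logic:

WITH ranked AS (
  SELECT ctid,
         row_number() OVER (PARTITION BY day, team_name) AS rn
  FROM metrics
)
DELETE FROM metrics
USING ranked
WHERE metrics.ctid = ranked.ctid
  AND ranked.rn > 1;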

I did run it both manually and in the migration upon starting the plugin, and they both work just fine.

And with the pointer from @awanlin, I made the deletion only run when not using sqlite3 (since it is not affected by this problem). The UPDATEs and ALTERs still run on all databases, since they are needed.

Yes, the migration file is still large in terms of LoC. But since 14 tables are affected, there need to be 14 DELETEs, 14 UPDATEs and 14 ALTERs. And since the indexes differ between the tables, I think it's nicer to have them as they are now instead of trying to minimise LoC.

henrikedegrd · Sep 22 '25

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

github-actions[bot] · Oct 21 '25

... 2 more weeks

henrikedegrd · Oct 21 '25

And another 2 weeks.

Why should I not just close this PR again?

There were requests that this should be reopened to solve an issue that people have posted about. And then I took my spare time to fix it.

But the review takes months...

henrikedegrd · Nov 04 '25