
The oldest config revision is activated when replicating NetBox

Open markkuleinio opened this issue 1 month ago • 2 comments

NetBox Edition

NetBox Community

NetBox Version

v4.4.6

Python Version

3.11

Steps to Reproduce

  1. Install a new NetBox 4.3.7 server ** (git checkout v4.3.7) as per the official instructions, but don't run upgrade.sh yet
  2. pg_dump a NetBox 3.7.3 database from an existing NetBox server and import it on the new server
  3. Run upgrade.sh
  4. See that the core_configrevision table does not yet have an active column
  5. git checkout v4.4.6 and run upgrade.sh
  6. See that there is a new active column in the core_configrevision table
  7. See that the oldest configrevision has active = true

(** In some experiences, including my own, it is not possible to upgrade a NetBox 3.7.3 database directly to 4.4.x, thus the intermediate upgrade step via 4.3.7.)

Expected Behavior

The newest configuration revision is active after the NetBox database is replicated from an old version to a new version.

Observed Behavior

The oldest configuration revision is active.

This is a major problem because, as soon as the new NetBox services are started, the background housekeeping job begins deleting old changelog entries according to the retention time configured in the active config revision. Thus, if the oldest config revision has a shorter retention time than the latest config revision, old changelog entries are lost unintentionally.

Workaround attempt: I set the newest config revision to active=true in the database (which also required first setting the oldest revision to active=false) before starting the NetBox services for the first time. Although the UI now shows the newest config revision as active, the background job still ran the cleanup with the old settings:

2025-11-18 15:27:48,165 netbox.jobs.SystemHousekeepingJob INFO: Pruning old changelog entries...
2025-11-18 15:27:48,168 netbox.jobs.SystemHousekeepingJob DEBUG: Changelog retention period: 1825 days (2020-11-19 13:27:48)
2025-11-18 15:27:50,020 netbox.jobs.SystemHousekeepingJob INFO: Deleted 221 expired changelog records

(1825 days was set in the oldest config revision, while the newest (now active) revision has 0)
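The cutoff date in the DEBUG line can be checked by hand. Below is a minimal sketch of that arithmetic, assuming the cutoff is simply "now minus the retention period" and that the date in parentheses is printed in UTC while the log timestamps are in local time (UTC+2). The function name `retention_cutoff` is hypothetical and is not taken from the NetBox code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_cutoff(now: datetime, retention_days: int) -> Optional[datetime]:
    """Return the timestamp before which changelog entries count as expired.

    A retention of 0 is treated as "keep forever", so there is no cutoff.
    """
    if retention_days <= 0:
        return None
    return now - timedelta(days=retention_days)


# 15:27:48 local time in the log, assumed here to be 13:27:48 UTC.
now = datetime(2025, 11, 18, 13, 27, 48, tzinfo=timezone.utc)

print(retention_cutoff(now, 1825))  # 2020-11-19 13:27:48+00:00
print(retention_cutoff(now, 0))     # None
```

With 1825 days (the oldest revision) everything older than 2020-11-19 is eligible for deletion, while with 0 (the newest revision) nothing would be pruned.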

While testing this workaround I observed that data in Redis also affects this. When I had manually activated the latest config revision in the UI and then deleted, imported and upgraded the database again, the problem did not occur: the latest config revision was still active (INFO: Pruning old changelog entries..., INFO: No retention period specified; skipping.). But when I also reinstalled redis-server while importing the database again, the error occurred. Thus, in-place upgrades apparently work correctly, but migrations/replications to new servers do not. Maybe moving the Redis data would be a useful workaround as well: https://stackoverflow.com/questions/6004915/how-do-i-move-a-redis-database-from-one-server-to-another .

Or maybe the current workaround is to first run the migration and upgrade steps and activate the latest revision (which unintentionally deletes changelog entries), and then repeat the steps on the same server so that the data in Redis prevents the oldest revision from being used at all.

Another workaround is probably to remove all config revisions except the latest one (to prevent the cleanup job from running with incorrect settings), but that's not ideal because all the config revision history is then lost in the UI.

markkuleinio avatar Nov 18 '25 13:11 markkuleinio

Duh, the cleanest workaround for the changelog deletion problem is setting CHANGELOG_RETENTION = 0 in configuration.py, just like in the old days.

https://netboxlabs.com/docs/netbox/configuration/miscellaneous/#changelog_retention

2025-11-18 17:39:56,287 netbox.jobs.SystemHousekeepingJob INFO: Pruning old changelog entries...
2025-11-18 17:39:56,289 netbox.jobs.SystemHousekeepingJob INFO: No retention period specified; skipping.

Then I can restore the latest revision in the UI, and finally remove the setting in configuration.py.
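As a configuration fragment, the temporary override could look like the sketch below. It only shows the one setting discussed here; the rest of configuration.py is omitted, and the comments describe the intended procedure rather than anything NetBox enforces.

```python
# configuration.py (fragment)
# Temporary override for the migration: a value of 0 keeps changelog entries
# indefinitely, so the housekeeping job skips pruning regardless of which
# config revision is marked active.
# Remove this line again once the correct revision has been restored in the UI.
CHANGELOG_RETENTION = 0
```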

markkuleinio avatar Nov 18 '25 15:11 markkuleinio

I saw something similar when testing in dev. I usually copy everything from the prod containers to the dev containers so that I can experiment safely and not care if I destroy everything.

One day I noticed that after such a copy, "Configuration history" (/core/config-revisions/add/) didn't have any config revisions (empty, same as in prod), but I could clearly see a "Top banner" with the words "This is dev instance". I should not have seen it, because dev was a full copy of prod, where the top banner has no such text.

However, I know that weeks earlier I had made a test config in which I changed the top banner to "this is dev instance". The solution in my case was to completely delete the Redis Docker volumes and recreate them. I know it's possible to flush only parts of the data in Redis, and I'm not sure this is the correct way in general, but for testing it seems fine.

simonzsay avatar Nov 24 '25 14:11 simonzsay

I am not able to reproduce this behavior on a fresh v3.7 database with some manually created ConfigRevisions:

Image

However this is without any selected config revision in my Redis cache. The migration 0019_configrevision_active.py has this logic:

https://github.com/netbox-community/netbox/blob/5a24f99c9de22932ed74ee349bc7c1a24cc120a9/netbox/core/migrations/0019_configrevision_active.py#L12-L27

So if you have a config_version in your cache whose value matches the PK of an existing ConfigRevision in your database, that's the version that will be marked active; otherwise, it picks the most recent version.
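The selection rule described above can be illustrated with a small standalone sketch. This is not the migration code itself: `Revision` and `pick_active` are hypothetical names, and plain Python objects stand in for the ConfigRevision model and the cache lookup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    pk: int
    created: datetime


def pick_active(revisions: Sequence[Revision],
                cached_pk: Optional[int]) -> Optional[Revision]:
    """Pick the revision to mark active.

    Prefer the revision whose PK matches the cached config_version;
    otherwise fall back to the most recently created revision.
    """
    if cached_pk is not None:
        for rev in revisions:
            if rev.pk == cached_pk:
                return rev
    if not revisions:
        return None
    return max(revisions, key=lambda r: r.created)


revisions = [
    Revision(pk=1, created=datetime(2021, 1, 1)),
    Revision(pk=2, created=datetime(2023, 6, 1)),
    Revision(pk=3, created=datetime(2025, 3, 1)),
]

print(pick_active(revisions, None).pk)  # 3: empty cache, newest wins
print(pick_active(revisions, 1).pk)     # 1: cached PK matches the oldest revision
print(pick_active(revisions, 99).pk)    # 3: cached PK matches nothing, newest wins
```

Under this rule the oldest revision can only end up active if the cache on the target holds its PK, which is why the state of Redis matters for the outcome.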

So the real workaround ought to be to make sure your cache is cleared on the target installation.

Because this and other workarounds exist, unless there are strenuous objections I'm going to say that this appears to be working as intended. (I'm not sure what kind of fix would even make sense, unless anyone can spot a fault in the logic of the above code.)

bctiemann avatar Dec 18 '25 18:12 bctiemann

I started with an empty server (Debian 12), so I'd say the cache was empty as well. The problem also specifically reappeared after redis-server (and its data) was removed and reinstalled.

Since opening the issue, I was able to successfully migrate the NetBox instances using the changelog retention setting workaround, so I no longer have an issue with this. Also, based on your comments, I'll close this issue. Feel free to reopen if needed.

Thanks for testing.

markkuleinio avatar Dec 18 '25 18:12 markkuleinio