SORMAS-Project
Replace UUID string with a RFC compliant standard form UUID
Feature Description
SORMAS generates v4 UUIDs and stores them as Base32-encoded strings (e.g. TFRGBU-UVB25S-VSUAFH-QLYFKKOA). The first 6 characters are used as a short ID in tables and other overviews. The reasoning behind this was to make the UUID somewhat shorter and to have an easily identifiable break after 6 characters, so that the first 6 characters can serve as the short ID.
Problem Description
Not using the standard form (e.g. a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11) makes it more difficult to interoperate with third-party systems. In addition, we can't use the dedicated UUID types of the database and of Java, which would be much more convenient (see https://www.postgresql.org/docs/current/datatype-uuid.html). Also, the current Base32 implementation is not RFC compliant. See #6706
Proposed Change
- [ ] In the database, rename all uuid columns to legacyUuid and remove their NOT NULL constraint.
- [ ] In the database create a new unique uuid column for each table, using the Postgres UUID type.
- [ ] Fill the new uuid column with a v3 UUID created from the legacyUuid. I suggest using the following method, which mimics what java.util.UUID.nameUUIDFromBytes does: https://stackoverflow.com/a/55049994
- [ ] Use the same logic to create the "constant" UUIDs, such as the one used for "other health facility" and similar entries.
- [ ] Change the data type of the uuids in all Java classes to java.util.UUID
- [ ] Introduce a new field "legacyUuid" for all entities (AbstractDomainObject)
- [ ] When new entities are created, no longer fill the legacyUuid
- [ ] Apply the same changes to the Android app and its backend
- [ ] Make sure the generated OpenAPI (Swagger) specification uses type: string in combination with format: uuid for the new uuid
- [ ] When an entity is opened based on a URL and no result is found, also look it up by its legacy UUID
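A minimal Java sketch of two checklist items: deriving the new uuid value from a legacyUuid, and the URL lookup that falls back to the legacy UUID. It assumes the legacy string is hashed exactly as stored (with dashes, UTF-8 encoded). Note that UUID.nameUUIDFromBytes hashes only the given bytes and does not prepend a namespace UUID, so its result differs from an RFC 4122 namespace-based v3 UUID for the same name. The class LegacyUuidMigration, the method findByUrlId and the two maps standing in for repository queries are hypothetical, not existing SORMAS code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

final class LegacyUuidMigration {

    private LegacyUuidMigration() {}

    // Derives the new uuid from a legacyUuid via UUID.nameUUIDFromBytes:
    // MD5 over the raw bytes, then the version (3) and IETF variant bits are set.
    // Assumption: the legacy string is hashed as stored, UTF-8 encoded, without a namespace.
    static UUID fromLegacyUuid(String legacyUuid) {
        return UUID.nameUUIDFromBytes(legacyUuid.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical URL lookup: try the id as a standard-form UUID first,
    // then fall back to the legacyUuid "column" (here: a map).
    static <T> Optional<T> findByUrlId(String id, Map<UUID, T> byUuid, Map<String, T> byLegacyUuid) {
        try {
            T entity = byUuid.get(UUID.fromString(id));
            if (entity != null) {
                return Optional.of(entity);
            }
        } catch (IllegalArgumentException notStandardForm) {
            // e.g. "TFRGBU-UVB25S-VSUAFH-QLYFKKOA" is not a standard-form UUID
        }
        return Optional.ofNullable(byLegacyUuid.get(id));
    }

    public static void main(String[] args) {
        String legacy = "TFRGBU-UVB25S-VSUAFH-QLYFKKOA";
        UUID uuid = fromLegacyUuid(legacy);
        System.out.println(uuid + " (version " + uuid.version() + ")");

        Map<UUID, String> byUuid = Map.of(uuid, "case A");
        Map<String, String> byLegacy = Map.of(legacy, "case A");
        System.out.println(findByUrlId(uuid.toString(), byUuid, byLegacy)); // Optional[case A]
        System.out.println(findByUrlId(legacy, byUuid, byLegacy));          // Optional[case A]
    }
}
```

Because the derivation is deterministic, the value computed in Java can serve as a cross-check for whatever SQL the database migration uses.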
To be refined
- Is there a need to display the legacy UUID to users for a limited period of time (e.g. three releases or via configuration)?
- How to handle the UUID change for systems that are connected to other surveillance tools (e.g. SurvNet)?
- Talk with representatives of systems that send data to SORMAS and rely on the UUID: Climedo, others?
- Remark by @JonasCir: I would propose v5 over v3. MD5 is a legacy algorithm, and SHA-based algorithms should always be preferred. Also, I implemented an algorithm in Python to generate v5 UUIDs from existing UUIDs, so it probably makes sense to look at it as an example as well.
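For comparison with the v3 approach, a minimal Java sketch of a version 5 derivation as defined in RFC 4122, section 4.3: SHA-1 over the namespace UUID bytes followed by the name, truncated to 16 bytes, with the version and variant bits set by hand (java.util.UUID has no v5 factory). This is not @JonasCir's Python implementation; the class name NameBasedUuidV5 is hypothetical, and the namespace in main is a placeholder assumption, since no SORMAS namespace UUID has been agreed in this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class NameBasedUuidV5 {

    private NameBasedUuidV5() {}

    // Builds a version 5 (SHA-1, name-based) UUID following RFC 4122, section 4.3.
    static UUID uuidV5(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
        // Hash the namespace in network (big-endian) byte order, then the UTF-8 name.
        ByteBuffer ns = ByteBuffer.allocate(16);
        ns.putLong(namespace.getMostSignificantBits());
        ns.putLong(namespace.getLeastSignificantBits());
        sha1.update(ns.array());
        sha1.update(name.getBytes(StandardCharsets.UTF_8));
        byte[] hash = sha1.digest(); // 20 bytes; only the first 16 are used

        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // version 5
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // IETF variant

        ByteBuffer bb = ByteBuffer.wrap(hash, 0, 16); // big-endian by default
        long msb = bb.getLong();
        long lsb = bb.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Placeholder: the RFC 4122 URL namespace; a real migration would fix one SORMAS-specific namespace.
        UUID namespace = UUID.fromString("6ba7b811-9dad-11d1-80b4-00c04fd430c8");
        UUID v5 = uuidV5(namespace, "TFRGBU-UVB25S-VSUAFH-QLYFKKOA");
        System.out.println(v5 + " (version " + v5.version() + ")");
    }
}
```

Python's uuid.uuid5 uses the same construction, so a Python script and this Java code should produce identical UUIDs for the same namespace and name, which matters if the migration and external systems compute the mapping independently.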
Additional information
A url-safe short UUID (e.g. based on flickr58 alphabet - see https://www.npmjs.com/package/short-uuid) would be handy at some point, but is probably not urgently needed and thus not part of this issue.
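A sketch of what such a short form could look like in Java, treating the UUID as an unsigned 128-bit integer and writing it in base58. The alphabet is the commonly used Flickr base58 alphabet (no 0, O, I or l); whether short-uuid pads or otherwise formats its output identically has not been checked, so these strings may not match that package's. The class ShortUuid is hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

final class ShortUuid {

    // Flickr-style base58 alphabet (assumed to match short-uuid's flickrBase58).
    static final String ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";
    private static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());

    private ShortUuid() {}

    // Encodes the 128 UUID bits as an unsigned integer in base58 (at most 22 characters).
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        BigInteger value = new BigInteger(1, bytes); // signum 1: treat bytes as unsigned
        StringBuilder sb = new StringBuilder();
        while (value.signum() > 0) {
            BigInteger[] qr = value.divideAndRemainder(BASE);
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            value = qr[0];
        }
        if (sb.length() == 0) {
            sb.append(ALPHABET.charAt(0)); // the nil UUID
        }
        return sb.reverse().toString();
    }

    // Decodes a base58 string produced by encode back into a UUID.
    static UUID decode(String shortId) {
        BigInteger value = BigInteger.ZERO;
        for (char c : shortId.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid base58 character: " + c);
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        if (value.bitLength() > 128) {
            throw new IllegalArgumentException("Value does not fit into 128 bits");
        }
        return new UUID(value.shiftRight(64).longValue(), value.longValue());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11");
        String shortId = encode(uuid);
        System.out.println(shortId + " -> " + decode(shortId));
    }
}
```

Unlike the current 6-character short ID, this form is lossless, so it could be used in URLs without a separate lookup column.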
@leventegal-she Can you check whether there is any functionality related to S2S that needs to be taken into account for this issue? @JonasCir Please also have another look at the refined issue.
@JaquM-HZI This issue will change the UUID of cases and all other entities. The old UUID will be kept as legacy UUID.
Is there a need to display the legacy UUID to users for a transition period (e.g. three releases or via configuration)?
@MartinWahnschaffe Yes, it is necessary to be able to see them or have them in an export, because they are sometimes used in official documents etc. I can't say for how long, but I would say at least 6 months, so please keep them available for at least that long.
@MartinWahnschaffe thank you very much for pushing this. Issue looks good to me, I added my thoughts to the refine section.
@JonasCir Can you provide details on what needs to be done for the central etcd process when the uuid is changed? I guess all infrastructure data on the etcd has to be updated at the same time the new version is rolled out. This means the version has to be rolled out to all servers using the etcd at the same time, right?
Good catch @MartinWahnschaffe! I was thinking ahead and implemented central in such a way that it only distributes valid UUIDv5s. See here for an example of how existing data can be fed into UUIDv5 to generate a stable identifier.
@JonasCir Hm, unfortunately this makes the whole thing more complicated. It means that we have to check whether a legacy UUID string is already a valid UUID and, if so, keep it. External systems that rely on the SORMAS UUIDs have to do the same.
@MartinWahnschaffe and I discussed the last point: infrastructure data has a centrallyManaged flag. If it is set, the UUID is already valid and does not need to be recomputed.
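The rule from this discussion can be sketched in Java as follows. It is a sketch under assumptions: the names UuidMigrationRule and newUuidFor are hypothetical, the canonical-form check is an extra safeguard not stated in the discussion, and non-centrally-managed rows use the nameUUIDFromBytes (v3-style) derivation from the checklist; a v5 derivation would slot into the same place.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.regex.Pattern;

final class UuidMigrationRule {

    // Standard 8-4-4-4-12 hexadecimal form.
    private static final Pattern STANDARD_FORM = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    private UuidMigrationRule() {}

    // centrallyManaged rows already carry a valid UUID distributed by central, so it is kept;
    // all other rows get a UUID derived from their legacy string.
    static UUID newUuidFor(String legacyUuid, boolean centrallyManaged) {
        if (centrallyManaged) {
            // Safeguard (assumption): fail loudly instead of silently re-deriving.
            if (!STANDARD_FORM.matcher(legacyUuid).matches()) {
                throw new IllegalArgumentException("centrallyManaged row without a valid UUID: " + legacyUuid);
            }
            return UUID.fromString(legacyUuid);
        }
        return UUID.nameUUIDFromBytes(legacyUuid.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(newUuidFor("886313e1-3b8a-5372-9b90-0c9aee199e5d", true));  // kept as-is
        System.out.println(newUuidFor("TFRGBU-UVB25S-VSUAFH-QLYFKKOA", false));         // derived
    }
}
```

Keying the decision on the flag rather than on whether the string happens to parse as a UUID keeps the rule explicit, so the database migration and external systems can apply exactly the same logic.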