add support for not uploading hashed MAC addresses and serial numbers

Open pabs3 opened this issue 5 years ago • 10 comments

I think you could widen the audience for this tool if you were to add support for not uploading hashed MAC addresses and serial numbers.

Folks who are a bit paranoid would be more likely to submit hardware probes if those sorts of details were not uploaded without their consent and the benefits of uploading them were clearly explained.

I think I would design support for a private mode like this:

Add command-line options for enabling/disabling uploading salted hashed device identifiers.

When neither of these options is passed to hw-probe, print enough information for users to give informed consent if they wish (including other considerations and the benefits and risks of adding the salted hashed device identifiers), in both layperson language and technical language (mention the hash used etc). Then ask them for consent, without allowing a default answer, so that the decision is entirely in the user's hands.

Also include the exact same information in the --help output and in the manual page privacy section, with a note about the privacy section next to the options for enabling/disabling uploading salted hashed device identifiers.
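A minimal sketch of the "no default answer" prompt described above, written in Python purely for illustration. The function name `ask_consent` and the accepted answers are assumptions, not anything hw-probe provides.

```python
def ask_consent(question, read=input, write=print):
    """Ask a yes/no question; an empty or unknown answer is never accepted."""
    while True:
        answer = read(f"{question} [yes/no]: ").strip().lower()
        if answer in ("yes", "y"):
            return True
        if answer in ("no", "n"):
            return False
        # Hypothetical wording; pressing Enter alone does not pick a side.
        write("Please type 'yes' or 'no'; there is no default.")
```

Because the loop only exits on an explicit yes or no, just pressing Enter cannot opt the user in or out.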

PS: the manual page is currently missing the information that a salt is applied before hashing.

This isn't going to widen the audience for this tool to the more paranoid people, but it could at least widen it to the slightly paranoid people. The number of slightly paranoid people is growing every year because of all the news stories about companies getting hacked, leaking, or selling all of their user data.

pabs3 avatar Jan 08 '20 23:01 pabs3

Fixed by commit a9c367c0b5c1012839446b6e6486039a0db25f29.

Note that only a 32-byte prefix of the salted SHA-512 hash is uploaded. There is no way it can be reversed, even if the salt or the hashing algorithm is compromised.
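To make the scheme concrete, here is a rough Python sketch of a truncated salted SHA-512 identifier. It is not hw-probe's actual code: the function name `hashed_id`, the way the salt is combined with the identifier, and reading "32-byte prefix" as the first 32 characters of the hex digest are all assumptions.

```python
import hashlib


def hashed_id(identifier: str, salt: str, prefix_len: int = 32) -> str:
    """Return a truncated salted SHA-512 digest of a device identifier."""
    # Assumption: the salt is simply prepended to the identifier.
    digest = hashlib.sha512((salt + identifier).encode("utf-8")).hexdigest()
    # Assumption: "32-byte prefix" means the first 32 hex characters.
    return digest[:prefix_len]
```

The same identifier and salt always give the same value, which is what lets probes of one machine be matched without uploading the raw MAC address or serial number.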

I think enabling/disabling of uploading of hashed ids is not needed in this case. Please let me know your opinion.

Thanks.

linuxhw avatar Jan 09 '20 05:01 linuxhw

I agree a short prefix of a salted hash is unlikely to be reversed.

This issue isn't about any practical attacks on the gathered data, but about moving sensitive data gathering behind an informed consent barrier to give control over the data gathering to the user and to prevent giving the perception of being a privacy violating database, even if there is no possible privacy violation.

In other words, this is more about the perception of potential users than about any practical implications of the data gathering.

BTW: I suggest doing logical commits instead of combining multiple different changes into one commit like you did in commit a9c367c.

-- bye, pabs

https://wiki.debian.org/PaulWise

pabs3 avatar Jan 09 '20 06:01 pabs3

Will it be enough to add a -confirm-upload-of-encrypted-ids alias for the -upload option? Then we would suggest using the following command:

[since hw-probe 1.5] sudo -E hw-probe -all -confirm-upload-of-encrypted-ids

instead of:

sudo -E hw-probe -all -upload

It looks impossible to upload data without (encrypted) IDs and still match it with old probes of the same computer to avoid duplicating nodes/devices in the db.

linuxhw avatar Jan 09 '20 08:01 linuxhw

I don't think that MAC addresses and serial numbers are going to fully prevent duplication in the db, since MAC addresses can be randomised and disks are often replaced after failure.

I don't think that preventing duplication is necessarily desirable, since people can plug in new USB devices or PCI devices, change their GPUs around, upgrade their Linux kernel etc. Having those additional probes in the database would be useful.

So I would instead use the hash of the uploaded data (not including the MAC addresses and serial numbers) to prevent double uploads of the same report, keep all the almost-duplicate reports and remove uploading of truncated salted hashed device identifiers.
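A sketch of what that could look like, assuming the report is available as nested dicts and lists and that identifier fields use the hypothetical key names `mac` and `serial` (hw-probe's real report format is different; this only illustrates the idea):

```python
import hashlib
import json

# Hypothetical field names for the identifiers left out of the hash.
EXCLUDED_KEYS = {"mac", "serial"}


def strip_identifiers(value):
    """Recursively drop excluded keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_identifiers(v) for k, v in value.items()
                if k not in EXCLUDED_KEYS}
    if isinstance(value, list):
        return [strip_identifiers(v) for v in value]
    return value


def report_fingerprint(report) -> str:
    """Hash a canonical JSON form of the report without device identifiers."""
    canonical = json.dumps(strip_identifiers(report), sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two uploads of an identical report get the same fingerprint and can be rejected as double uploads, while any change in the remaining data (a new kernel, a new USB device) gives a new fingerprint and so a new entry.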

The existing truncated salted hashed device identifiers stuff doesn't seem to help reducing duplication anyway, for example here are two probes of clearly the same computer but with different amounts of IRQ events. Probably the number of IRQ events needs to be stripped btw.

https://linux-hardware.org/index.php?computer=82fb5121830a
https://linux-hardware.org/index.php?probe=b20e10b246&log=hwinfo
https://linux-hardware.org/index.php?probe=deb80295cf&log=hwinfo

-- bye, pabs

https://bonedaddy.net/pabs3/

pabs3 avatar Jan 09 '20 08:01 pabs3

> I don't think that MAC addresses and serial numbers are going to fully prevent duplication in the db, since MAC addresses can be randomised and disks are often replaced after failure.

We currently have ~0.1% duplicates in the database (due to randomization, low-cost network cards with duplicated MACs, and migration of network cards from one board to another). All such cases are resolved successfully by adding the board and CPU model names to the hash.

> I don't think that preventing duplication is necessarily desirable, since people can plug in new USB devices or PCI devices, change their GPUs around, upgrade their Linux kernel etc. Having those additional probes in the database would be useful.

Replacing any device on the board or upgrading the kernel currently doesn't change the computer ID. The computer ID is a hash of the integrated Ethernet controller on the board.

> So I would instead use the hash of the uploaded data (not including the MAC addresses and serial numbers) to prevent double uploads of the same report, keep all the almost-duplicate reports and remove uploading of truncated salted hashed device identifiers.

This will merge probes of different instances of the same computer model into one entity in the database. Also, hashes of drive IDs are necessary to continue this study: https://github.com/linuxhw/SMART.

> The existing truncated salted hashed device identifiers stuff doesn't seem to help reducing duplication anyway, for example here are two probes of clearly the same computer but with different amounts of IRQ events. Probably the number of IRQ events needs to be stripped btw.

Duplication of probes of the same computer is a different task. Currently de-duplication is time-based, e.g. removing all probes of the same computer created within a short time interval.
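As an illustration only, time-based de-duplication could be sketched like this in Python. The one-hour window, the tuple layout, and the choice to keep the earliest probe of each burst are assumptions; the actual server-side logic is not shown in this thread.

```python
from datetime import datetime, timedelta


def dedupe_by_time(probes, window=timedelta(hours=1)):
    """Keep one probe per computer per burst of uploads.

    probes: iterable of (computer_id, created_at, probe_id) tuples.
    A probe is dropped if it was created less than `window` after the
    last kept probe of the same computer.
    """
    kept = []
    last_kept = {}  # computer_id -> creation time of the last kept probe
    for computer_id, created_at, probe_id in sorted(
            probes, key=lambda p: (p[0], p[1])):
        previous = last_kept.get(computer_id)
        if previous is None or created_at - previous >= window:
            kept.append((computer_id, created_at, probe_id))
            last_kept[computer_id] = created_at
    return kept
```

Note that this depends on a stable computer ID, which is why it is treated as a separate task from identifying the computer itself.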

I'm about to release 1.5. Do you agree to adding a -confirm-upload-of-encrypted-ids option as a temporary solution?

linuxhw avatar Jan 10 '20 09:01 linuxhw

Wouldn't it still be useful to let the user disable such uploading of IDs?

ConiKost avatar Jan 10 '20 13:01 ConiKost

I don't think that the -confirm-upload-of-encrypted-ids option you propose is the right solution here.

Also, -confirm-upload-of-encrypted-ids is incorrectly named: the IDs aren't encrypted, they are hashed.

-- bye, pabs

https://bonedaddy.net/pabs3/

pabs3 avatar Jan 11 '20 01:01 pabs3

@ConiKost

> Wouldn't it still be useful to let the user disable such uploading of IDs?

This will merge probes of different instances of the same computer model into one entry in the database. So it's not currently possible, until we invent a replacement for identification by hashed MAC address.

linuxhw avatar Jan 11 '20 04:01 linuxhw

At the very least, it could be wise to show a notice when a user runs -upload, saying that the data is identified by a hashed MAC address, and to give them about 10 seconds to abort. If they pass -confirm-upload-of-hashed-ids (or something like that) directly, the upload proceeds without the warning.

ConiKost avatar Jan 11 '20 13:01 ConiKost

@ConiKost,

> At the very least, it could be wise to show a notice when a user runs -upload, saying that the data is identified by a hashed MAC address, and to give them about 10 seconds to abort. If they pass -confirm-upload-of-hashed-ids (or something like that) directly, the upload proceeds without the warning.

Adding a note about this wherever the -upload option is described will probably be enough.

Also new probes are not visible immediately in the database. Usually new probes are approved 2-3 times a week. One can write an email to [email protected] to remove the probe permanently from the database by its ID before the approval.

linuxhw avatar Jan 11 '20 20:01 linuxhw