Stop using MD5 for passwords
quoting @ibilon from https://github.com/HaxeFoundation/haxe.org/issues/177:
we also need to upgrade the usage of md5 for haxelib password, it's been broken for so many years, but that's not a server issue.
To do that we need the password reset functionality, then we could empty the password column and force a password reset for everybody.
What should we use instead anyway? haxe.crypto contains Sha1, Sha224 and Sha256
Sha256 is still safe so it should be good, and with the random salt we shouldn't see a crack any time soon.
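For reference, a minimal sketch of what salted Sha256 with the standard library could look like (the salt generation here is only illustrative; a real implementation would need a cryptographically secure random source, and the salt would be stored alongside the hash):

    import haxe.crypto.Sha256;

    class SaltedSha256Sketch {
        // Illustrative only: a real implementation must use a cryptographically
        // secure random source rather than Std.random.
        static function makeSalt(bytes:Int = 16):String {
            var buf = new StringBuf();
            for (i in 0...bytes)
                buf.add(StringTools.hex(Std.random(256), 2));
            return buf.toString();
        }

        public static function main() {
            var password = "correct horse battery staple";
            var salt = makeSalt();
            // Store both the salt and the hash; verification repeats the same steps.
            var storedHash = Sha256.encode(salt + password);
            trace('salt=$salt hash=$storedHash');
        }
    }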
Looks like sha256 + salt is not good enough for passwords => http://security.stackexchange.com/questions/90064/how-secure-are-sha256-salt-hashes-for-password-storage?lq=1
According to another SO question, we probably want to use PBKDF2-Haxe.
I knew about PBKDF2 but I assumed it wasn't available in haxe, in that case yeah we should go with that one.
It would be good to come back to this, MD5 really shouldn't be used anymore. This should be considered a major security concern for the Haxe ecosystem, especially as it is growing.
Instead of forcing everyone to reset their passwords, we could temporarily rehash the existing passwords with a new algorithm and then update them properly when the user next logs in: https://paragonie.com/blog/2016/02/how-safely-store-password-in-2016#legacy-hashes
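A rough sketch of that wrapping step, assuming a hypothetical Argon2id binding (the actual algorithm, class name and API are still to be decided):

    // Stand-in for a real binding; the name and signature are assumptions.
    class Argon2id {
        public static function hash(input:String):String {
            throw "bind to a real argon2id (or other modern KDF) implementation";
        }
    }

    class LegacyRehash {
        // One-off pass over the existing password column: afterwards the
        // database holds argon2id(md5(password)) instead of a bare md5 hash,
        // and the proper hash replaces it when the user next logs in.
        public static function wrap(storedMd5Hash:String):String {
            return Argon2id.hash(storedMd5Hash);
        }
    }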
There is however a problem with how to handle old clients... right now, the client hashes the password with md5 before sending it off to the server, instead of the password being hashed on the server side. We might have to simply reject haxelib register and haxelib submit when they are being executed from an old client (and give an error telling the user how to update haxelib).
We could do this by adding an extra parameter to the register and processSubmit remoting functions, which will be null if the client is old and have a proper value with new clients, however, this feels messy and isn't very future proof. Alternatively, we could update the api version again. @andyli is there a preferred way of doing this?
My idea right now:
- Create a new password column in the database, for storing hashes using whatever better algorithm we choose, initially empty for existing users.
- Release a new version of the haxelib client that will send both the old md5 and the new hash, probably using new remoting functions
- The new remoting function will check the user's DB record. If the new password column is empty, fill it in (if the old md5 password hash matches); see the sketch after this list.
- Haxelib server will still accept submissions using the old md5 via the old remoting function until the new client version has been out for ~1 year.
- Haxelib server will reject registrations using the old md5 remoting function immediately after the new haxelib client is released.
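A rough sketch of the check in that new remoting function (the record layout and field names are only illustrative; in practice the server would likely hash the client-sent value again before storing it):

    typedef UserRecord = {
        var md5Pass:String;       // existing column: md5(password)
        var newPass:Null<String>; // new column: empty for existing users
    }

    class AuthV2 {
        public static function checkAuth(user:UserRecord, clientMd5:String, clientNewHash:String):Bool {
            if (user.newPass == null) {
                // Not migrated yet: validate against the old md5 column,
                // then fill in the new column on success.
                if (user.md5Pass != clientMd5)
                    return false;
                user.newPass = clientNewHash;
                return true;
            }
            // Already migrated: only the new hash is consulted.
            return user.newPass == clientNewHash;
        }
    }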
P.S. I've no experience in handling passwords in a system. Feel free to comment.
I don't really see the point in having a transition period like that. There would be no extra security provided by the other hash during this period, as cracking the MD5 hash would also crack the new algorithm since the password was the same. Until the year is up and the md5 hashes are removed, there is no benefit, but registrations would still be broken in return for no added security. I think it's better just to break compatibility at the same time as rolling out the change.
So this is what I think would be good:
- The current hashes are simply rehashed using the new algorithm (security is improved a bit immediately for all accounts, although this can only be temporary as it has some problems in itself)
- Registrations and submissions are broken for clients connecting to version 3.0 of the api. These clients receive errors with instructions on how to update (haxelib update haxelib), however they can run all other commands as normal.
- A new version of the api (4.0) is made (with the database still remaining at 3.0), which takes in the raw passwords instead of the md5 hashes
- Whenever a user logs in, if the hash stored is still the rehashed md5, the password is hashed using both algorithms and if it matches, the hash is updated properly to remove the md5 step (we can keep track of this via a hashmethod field; see the sketch after this list)
- Perhaps we can reset passwords for accounts that don't log in after an amount of time, as rehashing an unsalted md5 brings its own security concerns and should not be a permanent solution.
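A rough sketch of that login-time upgrade, assuming a hypothetical Argon2id binding; the hashmethod values and the API are illustrative only:

    import haxe.crypto.Md5;

    // Stand-in for a real binding; name and signatures are assumptions.
    class Argon2id {
        public static function hash(password:String):String {
            throw "bind to a real argon2id implementation";
        }
        public static function verify(hash:String, password:String):Bool {
            throw "bind to a real argon2id implementation";
        }
    }

    typedef UserRecord = {
        var passHash:String;
        var hashMethod:String; // "argon2id-md5" (rehashed legacy) or "argon2id"
    }

    class Login {
        // The api 4.0 endpoint receives the raw password and upgrades the
        // stored hash the first time the legacy form verifies successfully.
        public static function checkAndUpgrade(user:UserRecord, password:String):Bool {
            switch (user.hashMethod) {
                case "argon2id-md5":
                    // Legacy: the stored hash is argon2id(md5(password)).
                    if (!Argon2id.verify(user.passHash, Md5.encode(password)))
                        return false;
                    // Password confirmed: drop the md5 step permanently.
                    user.passHash = Argon2id.hash(password);
                    user.hashMethod = "argon2id";
                    return true;
                case "argon2id":
                    return Argon2id.verify(user.passHash, password);
                default:
                    return false;
            }
        }
    }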
I think it's fair to have these breaking changes, as it's easy to update haxelib and the current situation of hashing on the client is extremely insecure. Therefore, we should prevent library authors from using these old clients to register/submit when it can easily be avoided by updating their haxelib. Also, it will have no effect on users who aren't library authors.
I've started working on a binding for argon2 and some of these changes, so I can make a PR soon to give a better idea of how it might work.
There would be no extra security provided by the other hash during this period, as cracking the MD5 hash would also crack the new algorithm since the password was the same.
Once a user has their new password hash stored in the DB, the old md5 wouldn't be used anymore.
But I'm fine with disabling lib submission with the old hash anytime.
- A new version of the api (4.0) is made (with the database still remaining at 3.0), which takes in the raw passwords instead of the md5 hashes
Letting the server read raw passwords is a bad idea. Passwords will be leaked if the server provider/maintainer (e.g. Digital Ocean, or myself) is a bad actor. The server should only receive hashed passwords, with the hashing performed on the user's machine.
Once a user has their new password hash stored in the DB, the old md5 wouldn't be used anymore.
That's possible to do with the plan I sent above, if we accept submissions from old clients. However, there is an issue with this.
Let's suppose we want to implement this, so it's possible to use the md5 to log in with old clients until the new hash is stored. We'll need a way to keep track of whether the hash has been updated (either with a hashmethod field, or your method of having two password fields, though I would argue the former is a cleaner, more future-proof solution). If the hash has been updated fully, then we need to reject the client's request to log in, because there is no longer any way we can validate the md5 the old client sends. If the server-stored hash hasn't been updated, we would need to check the hash sent by the client before knowing whether it matches.
Now, theoretically, there are two ways an old client's submission request can fail: either the account no longer uses md5, so the request is rejected outright, or the account still uses it but the password is incorrect. The problem lies in how to handle these two errors.
- Option 1 is that the error explains fully why the login failed. However, this would allow an attacker to figure out whether an account still uses md5 (and therefore whether it is vulnerable) simply by sending a request with the md5 and seeing whether it is rejected outright or told that the password is wrong.
- The other option is to not be specific about which situation occurred. However, a genuine user then won't know whether their login failed because they are using an outdated client or because they actually got their password wrong. Also, an attacker might still be able to tell which case it is based on the time taken to receive a response from the server.
We can avoid this problem by disabling submissions from old clients immediately, and that also gives us the freedom to implement a proper solution and not have to carry around a legacy field in the remoting api. So I do think doing that is the best solution.
Letting the server read raw passwords is a bad idea.
It doesn't really make a difference if the server receives a hashed password or the plain password. If the server is compromised in this way, then even if the password is hashed on the client, the bad actor could just as well see the hash, which essentially acts as the password itself, and knowing the hash is enough to give them access to the account (using a custom client). Note that the password would still be hashed before being stored anywhere long term. Also, for the client to hash the password, there would have to be public access to any user's salt, which would otherwise be private except in the case of a database leak.
Client-side hashing is not a standard practice, and overall it leads to much worse security than server side hashing:
- https://security.stackexchange.com/questions/8596/https-security-should-password-be-hashed-server-side-or-client-side
- https://superuser.com/questions/1675013/for-websites-is-your-passwords-hash-computed-on-the-client-or-the-server-side
I think the conclusion is that we need hashing on both client side and server side. This is also suggested in the discussions in the stackexchange pages you referenced.
I think the conclusion is that we need hashing on both client side and server side. This is also suggested in the discussions in the stackexchange pages you referenced.
Only if you use a unique salt on both client and server (and a different one on each), which doesn't sound very practical in haxelib's case since it would require one more back-and-forth between server and client for exchanging the salt, as the initial request comes from the client.
Failing to do that would make hashing on the client side less secure than sending the plain password.
I think the conclusion is that we need hashing on both client side and server side.
I really don't see the benefit of hashing on both the client and server. If there is a bad actor who can modify the code running on the server to retrieve the password before it is hashed, then it doesn't matter whether what the client sends is a hash or the raw password; either one will give access to the account. They can just modify the client to send the hash directly to log in. Also, anyone with that kind of power already has access to the server, so they can simply replace an author's libraries on the server without having to log into their account, and thus eventually get whatever code they want running on users' machines when they download the library.
Hashing only on the server side is common practice and there is really little to gain with client side hashing. However, there are a lot of drawbacks.
- The first is complexity, which @kLabz alluded to. This will make haxelib much more complicated, especially if we wanted to implement it properly. Complexity leaves room for mistakes. Additionally, anyone who wants to implement their own client for haxelib has to also carry out this hashing as well. Also, we would need to add an extra dependency to the client.
- The second is that it isn't future-proof. The next time we change the hashing algorithm, we will have to break logins for all old clients again. Any other clients interfacing with the server directly will also break. Whereas, if we just use server-side hashing from now on, we only have to break logins for old clients once right now (which we have to do anyway), and we would never have this issue again. The hash algorithm could be updated by updating the server, and all previous clients would continue to work.
In my opinion, the best solution is just to follow the standard practice, which will lead to a simpler solution and avoid all the extra problems.
In #564, I implemented the gradual migration to argon2id as I described.
If we were to hash on the client as well (which I personally still oppose), we'll probably need to go for bcrypt instead of argon2. BCrypt has a Haxe implementation so it won't be much of a hassle if we ever decide to migrate the client to a target other than neko (or indeed if someone wants to make their own haxelib client using this repo as a library). However, the C implementation of argon2 is widely used so it is less likely to have security issues than any Haxe algorithm implementations. Having a C dependency is alright for the server but it is a hassle for the client, which is another reason why hashing on the client introduces complexity.
I also see no reason to do client-side hashing. Using one password per platform is what users should do anyway…
Hashing on the client side is not about protecting the haxelib account, but about protecting the user's accounts on other sites. (This is discussed in the stackexchange pages @tobil4sk commented earlier.) Of course people shouldn't reuse passwords, but it's common.
The haxelib client has been doing client-side password hashing since the beginning. Let's not make a change that would improve security in one aspect but remove existing security measures.
Hashing on the client side is not about protecting the haxelib account, but about protecting the user's accounts on other sites. (This is discussed in the stackexchange pages @tobil4sk commented earlier.) Of course people shouldn't reuse passwords, but it's common.
The haxelib client has been doing client-side password hashing since the beginning. Let's not make a change that would improve security in one aspect but remove existing security measures.
But that makes haxelib very vulnerable (an md5 hash dump from any hacked database can be used directly against it, without even having to crack the passwords).
I'm not against replacing md5 with another algo. I just insist we need client-side hashing (with an additional server-side hashing too).
I am also against client-side hashing; the server should always be using TLS/SSL, which, as far as I know (not much), should mean MITM attacks are not a concern.
@andyli Can we look into this more? Delaying something important for security shouldn't be happening.
@tobil4sk Thanks for your work, and sorry for blocking this long-awaited improvement. Let me reiterate one last time what my concern is and what I think would satisfy everyone here.
My main concern is that the lack of client-side hashing would give the haxelib server operator (myself, the current hosting provider, and whoever gets access to the haxelib server) read access to the passwords. It means haxelib users who reuse their passwords may give out access to their other online accounts. I have yet to see anyone address this point since I mentioned it in this thread.
Here is my final proposal: keep the current client-side md5 hashing and add server-side argon2id (or any modern secure method) hashing. This at least wouldn't be a security downgrade regarding my main concern. We then get better hashing on the server side, which is what everyone here agreed to be most important. This proposal also has the benefit of requiring no change on the haxelib client, making it a server-only change, i.e. haxelib users do not need to upgrade their haxelib clients for this enhancement.
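In code it would look roughly like this (assuming some Argon2id binding on the server side; the class name and API here are assumptions):

    // Stand-in for a real binding; name and signatures are assumptions.
    class Argon2id {
        public static function hash(input:String):String {
            throw "bind to a real argon2id implementation";
        }
        public static function verify(hash:String, input:String):Bool {
            throw "bind to a real argon2id implementation";
        }
    }

    class ServerAuth {
        // One-off migration of the existing password column:
        // md5(password) becomes argon2id(md5(password)).
        public static function migrate(storedMd5:String):String {
            return Argon2id.hash(storedMd5);
        }

        // clientMd5 is what today's haxelib client already sends,
        // so no client upgrade is needed.
        public static function check(storedHash:String, clientMd5:String):Bool {
            return Argon2id.verify(storedHash, clientMd5);
        }
    }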
If the server operator is evil and sees the md5 hashes, they can still crack the password. Keeping the md5 hash on the client side has zero security benefit. It might even reduce security (for example, there are some issues with mixing md5 and bcrypt, though I'm not sure if that also applies to Argon2id).
Client side hashing only makes sense if it uses a proper key derivation function. But this requires shipping the argon2 ndll with the client (or DIY-ing a Haxe implementation, which I would not recommend) and we will run into the same issue as today if we need to switch hash functions again.
As for password reuse: I would really hope that all programmers use a password manager and randomly generated passwords. In my opinion this issue is solved by educating users (and implementing functionality to allow users to change their password if necessary).
Also, an evil server operator can do things like replacing major haxelibs with malware so I think compromised passwords would be the least of our concerns in such a case.
TLDR: Keeping md5 in the client has at best no benefit and at worst is a security issue. Proper client side hashing is technically possible but (in my opinion) does not have enough benefits to justify the complexity involved in deploying it.
Right, I know client-side md5 hashing is a weak protection against an evil server operator, but it's still better than no protection. To be extra clear, my bottom line is "do not reduce existing security measures". Removing client-side hashing would reduce protection against an evil server operator, and is also a breaking change that requires a haxelib client upgrade on the user side.
Keeping md5 in the client has at best no benefit and at worst is a security issue.
I can't see how keeping md5 in the client (when strong server-side hashing is added) is worse.
While I find this evil server operator argument somewhat contrived, I also don't see how the MD5 client-side hashing could be worse. I can see how there's a very strong argument that it has no benefit, but in that case we might as well keep it in order to have less friction.
IMO the decision should hinge entirely on that aspect: If it's harmless then let's keep it, if it has the potential to cause problems then let's not keep it.
Having looked into it a little more, it looks like "password shucking" attacks would also work for argon2id+md5 (the usual example given is bcrypt+md5, but the attack doesn't depend on any specifics of bcrypt as far as I can see). So I would say keeping md5 in the mix does have the potential to cause problems.
If I understand correctly, password shucking requires an attacker to target individual users and will only work on users who re-used a password across websites. I think most of you in this thread don't care much about people who re-use passwords.
Here is my final proposal - keep the current client-side md5 hashing and add server-side argon2id
Keeping the (unsalted) Md5 algorithm around (even if combined with a modern hashing algorithm) is a security liability. Rehashing with a better algorithm won't mitigate the flaws with Md5, so the hash produced by argon2id(md5(password)) suffers the same problems as the Md5 hash (e.g. collisions, etc). This is considerably less secure than if Md5 is not involved at all, and it makes the hashes susceptible to hash shucking, as mentioned:
https://www.scottbrady91.com/authentication/beware-of-password-shucking
From the source you referenced (https://www.scottbrady91.com/authentication/beware-of-password-shucking),
Don’t forget that password shucking requires a user to re-use a password across websites (unfortunately, this is highly likely) and for an attacker to target individual users. As a herd immunity, I think rehashing alone will still help.
I've now read a bit about password shucking; if I understand correctly, it's basically this situation, expressed in Haxe code:
import haxe.crypto.BCrypt;

function main() {
    var leakedMd5Hashes = ["48f03653ae8b6d062bb8b7ae177cb391"];
    var leakedBCryptHashes = ["$2b$10$jziC4wqx5Zz6oIRXidy2ceQM8lhQP8jIhJvy/1hOV4xlEQPsNIeq."];
    var leakedSalt = "$2b$10$jziC4wqx5Zz6oIRXidy2ce";
    for (md5 in leakedMd5Hashes) {
        var bc = BCrypt.encode(md5, leakedSalt);
        for (bc2 in leakedBCryptHashes) {
            if (bc2 == bc) {
                trace("FOUND MATCH: " + md5);
                return;
            }
        }
    }
}
Is that accurate?
That quote from Scott Brady "As a herd immunity, I think rehashing alone will still help." is a bit confusing, but I think what he means is that md5 < bcrypt(md5) < bcrypt. So if the only alternative is MD5 hashing, then it's still better to run bcrypt over the MD5 hashes. But ultimately it's even better to not do that and use bcrypt only.