john icon indicating copy to clipboard operation
john copied to clipboard

Fix requirements.txt

Open RecRanger opened this issue 1 year ago • 11 comments

Fixes #5474

RecRanger avatar May 11 '24 20:05 RecRanger

I'm not into Python. I'd appreciate to hear from someone who is, @exploide maybe? Thank you!

solardiz avatar May 13 '24 16:05 solardiz

Ok, I see multiple issues here.

  • protobuff with two f's is indeed incorrect. The correct package is named protobuf and the one with two f's doesn't even exist. However it could be added afterwards, which would result in a typosquatting attack if it contains malicious content. So this is directly a security vulnerability.
  • Pinning the version down to an older version forever is probably not what we want. We should aim to make the script compatible with the latest release of protobuf.
  • A top-level requirements.txt in the john repo is probably not appropriate. This particular dependency only affects a single script. And since the scripts are more or less stand-alone and independent, it could as well be the case that another script just requires a different version of protobuf (hypothetical example).
  • There is a bunch of other dependencies needed from within other scripts. And these are all not mentioned in this version of requirements.txt. It is very specific for the script that requires protobuf.

So I see two solutions: Fix the incompatibility in the script to make it work with latest protobuf. Or if this requires too much effort for the moment, include the version constraint in the pip install instruction suggested by the import exception handling instead. A top-level requirements.txt does not feel right for a project like john. But I'm not the one to decide.

exploide avatar May 13 '24 17:05 exploide

Good catch, I was moving too fast and made that typo. Fixed just now.

I would strongly suggest adding this requirements.txt file with whatever versions are required to make it work right now. Then, you can upgrade dependencies in the future if that's a desire.

I also just added all the other dependencies to the requirements.txt file which are mentioned via a pip install printed message. I didn't put versions on them, but they should all have pinned versions when tested.

RecRanger avatar May 13 '24 21:05 RecRanger

Thank you for the review @exploide! I'm not sure what is best to do here. @RecRanger argues for this approach further in #5474.

@RecRanger Thank you for the fix and additions. When fixing an error in a commit within a pending PR, we normally amend the commit and force-push - not add another commit. We also prefix commit titles with subsystem name whenever relevant (would be "requirements.txt: " starting with the second commit here). In this case, though, I suggest merging all 3 commits into one. We can do that at PR merge time by using GitHub's "squash and merge" feature, although this will make GitHub appear as the committer in git history.

solardiz avatar May 13 '24 21:05 solardiz

@RecRanger Where does the dependency on pycrypto come from? Is it indirect? Do we need to list indirect dependencies? (This may be fine. I am just asking.)

solardiz avatar May 14 '24 12:05 solardiz

I personally always use Squash-and-Merge when working in semi-anon/informal teams (e.g., open source projects) to make reverting changes easier. Will do better next time though!

I got the library list with a grep 'pip install' search. The pycrypto library came from:

  • telegram2john.py
  • DPAPImk2john.py

Indirect dependencies should not be listed. Libraries are assumed to have valid pyproject.toml/setuptools configurations which indicate which dependencies to install.

RecRanger avatar May 14 '24 20:05 RecRanger

From #5474

I don't see anything wrong with continuing to use the older version of Protobuf (which is tested, validated, etc.). Upgrading the dependency seems like a pointless endeavor for no gain.

In principle, I would always recommend to make a software compatible with the latest version of a dependency. For example, it might happen that a vulnerability becomes known for the older version or that there is an unpleasant bug. Furthermore, if someone decides to package this for a Linux distribution, they will fail because they cannot ship an older version of a Python package than the latest one already packaged for the distribution.

The error message explicitly says that the current Python code uses features that are deprecated in protobuf>3.20. We'd likely have to regenerate the Python protobuf files, iirc (which is a huge pain).

But in this case, I agree. I don't have any experience with protobuf file generation and if it is a huge pain, then we should not enforce it for the moment. (But welcome it when someone likes to do the upgrade.)

I also just added all the other dependencies to the requirements.txt file which are mentioned via a pip install printed message. I didn't put versions on them, but they should all have pinned versions when tested.

There are different points of view on this topic. Personally, I don't like pinning version numbers as long as there is no process of keeping them up to date. Otherwise you will soon have a requirements.txt which just contains a bunch of horribly outdated package versions. Which I don't like for reasons as explained above. I would only introduce version constraints where they are technically necessary. However, I know that there are different approaches which emphasize the desire of having a tested set of working dependency versions.

I already tried to explain that a combined global requirements.txt doesn't feel intuitive for me, because this is not a Python project at all. john is a C software and then there are just a bunch of standalone Python scripts as auxiliary tools. And not only Python, there is also Perl and maybe some other stuff. I don't see a big advantage of a requirements.txt here. But I believe you see an advantage. So it is fine for me to include a requirements.txt but again, I'm not the one to decide.

Since the file is only useful for the scripts in run/ I think requirements.txt should also be moved to run/. As I mentioned, john itself is not a Python project. But this is not really important.

Where does the dependency on pycrypto come from?

Indeed, pycrypto is a dependency. However, it is insecure and deprecated and should no longer be used. There is pycryptodome as a drop-in replacement. Is has the same API. So please list this as a requirement instead. I will conduct a separate PR soon, which also adapts the error message within the scripts.

Sorry for this lengthy answer. Just wanted to make my points clear. Whether to include a requirements.txt or not is up to @solardiz

exploide avatar May 15 '24 12:05 exploide

Thank you guys for the discussion. I had actually thought of us adding a requirements.txt and related documentation before, when I saw that the btcrecover has that, although sure enough they're primarily a Python project and we're not. Incidentally, IIRC our two pre-generated protobuf files came from btcrecover, so maybe we should look there (at all forks of it) for any updates.

Since the file is only useful for the scripts in run/ I think requirements.txt should also be moved to run/. As I mentioned, john itself is not a Python project. But this is not really important.

This may actually be a good idea. It could possibly stay available in a distro package that includes our scripts, but does not automatically get the Python modules pre-installed as the distro package dependencies. What do you think, @RecRanger?

solardiz avatar May 15 '24 16:05 solardiz

Projects should work out of the box. This PR is how you make it work out-of-the-box.

Ideas for AFTER this bare-minimum PR is merged

  1. If you want opt-in requirements for features, then setup a pyproject.toml.
  2. Requirements can be pinned with version ranges.
  3. Bots can be used to bump requirements, given adequate test cases.

There is no perfect solution for balancing security, functionality, and developer experience. This is the most basic solution. Let's upgrade incrementally from a bare-minimum.

RecRanger avatar May 15 '24 23:05 RecRanger

I've just merged @exploide's #5478 (obviously triggered by your work here @RecRanger, thanks!) so now we need to revise the changes here accordingly.

solardiz avatar May 16 '24 13:05 solardiz

Revised the "pycrypto" to "pycryptodome" dependency. Also squashed the commits.

RecRanger avatar May 17 '24 23:05 RecRanger

Thanks, but now the commit message records change history within this PR, which is temporary and irrelevant for our git history. So if it's just one commit, it should say just "Add requirements.txt and update .gitignore for Python" (or this could be 2 commits).

solardiz avatar May 18 '24 12:05 solardiz

Soooo like, is someone gonna merge this PR? As much as I'd love critique on my git technique and gitignore setup, I'm actually just here because the 100 contributors that came before me didn't setup the gitignore for 1 out of 2 languages used in this repo.

RecRanger avatar May 18 '24 21:05 RecRanger

Soooo like, is someone gonna merge this PR?

OK, I did a squash-and-merge on this to set a commit message that matches the final changes. Thank you!

solardiz avatar May 19 '24 12:05 solardiz