asv icon indicating copy to clipboard operation
asv copied to clipboard

file names should not use externally generated strings

Open kevingerman opened this issue 4 years ago • 1 comments

Currently result files are named using the hash that was benchmarked. For repos which use an actual hash as an identifier this has been working, however, if the repo impl includes path characters in the commit hash publish will fail to find the results.

While incredibly unlikely in this use case, its always good security practice to ensure that external strings are not passed to the shell or OS in which a program is running.

All of the information about the commit is stored in the json data. I was not able to find anywhere where the commit hash was pulled from the filename in processing results.

kevingerman avatar Mar 19 '20 14:03 kevingerman

The implementation for this feature should be pretty simple. If we change the the impl of results.py#get_filename( ...) to look something like

 def get_filename(machine, commit_hash, env_name):
    """
    Get the result filename.  Hash of commit_hash and
    environment name in a folder named for the machine.
     """

    fname=hashlib.md5( "{}/{}".format(commit_hash,env_name).encode( 'UTF-8')).hexdigest()
    return os.path.join( machine, "{}.json".format(fname) )

it simplifies the method and has the side effect of grouping results on the FS by machine name without impacting behavior or tests.

kevingerman avatar Jun 09 '20 19:06 kevingerman