
request: filter for small files

Open ReaperX opened this issue 2 years ago • 3 comments

When creating PAR sets for very large file collections with vastly different file sizes, I exclude all the small files. Efficiency is always in the single digits when very small files are present, and the probability that bit rot hits one of the small files is low anyway.
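To see why small files drag the number down, here is a rough model. It assumes PAR2-style slicing: every source file is cut into slices of one fixed block size and its last slice is zero-padded, so even a 4 KB file occupies a whole block. The function name `slice_efficiency` and the example file sizes are hypothetical, and this is not necessarily the exact formula MultiPar uses for its efficiency figure.

```python
MIB = 1024 * 1024  # 1 MiB in bytes


def slice_efficiency(file_sizes, block_size):
    """Share of source-slice capacity filled with real file data (0.0 to 1.0).

    Rough model (an assumption, not MultiPar's own formula): each file is
    split into block_size slices and its last slice is zero-padded, so a
    file of n bytes occupies ceil(n / block_size) slices. Empty files
    occupy no slices.
    """
    data = sum(file_sizes)
    # Integer ceiling division avoids float rounding on very large files
    slices = sum((size + block_size - 1) // block_size for size in file_sizes)
    if slices == 0:
        return 0.0
    return data / (slices * block_size)


if __name__ == "__main__":
    large = [100 * MIB] * 10       # ten 100 MiB files
    small = [4 * 1024] * 10000     # ten thousand 4 KiB files
    print(f"large only:    {slice_efficiency(large, MIB):.1%}")
    print(f"large + small: {slice_efficiency(large + small, MIB):.1%}")
```

With the ten large files alone the model gives 100%. Adding 10,000 files of 4 KiB at a 1 MiB block size drops it to about 9.4%, because each tiny file still costs a full 1 MiB slice.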

It is currently inconvenient to get rid of small files. I select a folder, which adds all of its files by default, and sort them by size. Then I select every file up to a threshold and press Delete.

Could we get an easier way of accomplishing this? A checkbox plus an input field for a number X, so that files smaller than X bytes are ignored or removed from the input, would be great.
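At its core, the requested filter splits a folder's files at the threshold X. This sketch assumes a recursive walk and keeps files of exactly X bytes. `split_by_size` is a hypothetical helper for illustration, not a MultiPar function.

```python
import os
from pathlib import Path


def split_by_size(root, min_bytes):
    """Split every file under root into (kept, skipped) by size.

    Files of at least min_bytes are kept; smaller ones are skipped.
    Both lists hold '/'-separated paths relative to root, sorted.
    Symlinked files are left out of both lists.
    """
    root = Path(root)
    kept, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue
            rel = path.relative_to(root).as_posix()
            if path.stat().st_size >= min_bytes:
                kept.append(rel)
            else:
                skipped.append(rel)
    return sorted(kept), sorted(skipped)
```

For example, `split_by_size("D:/photos", 1048576)` would return the files of at least 1 MB in its first list and everything smaller in its second.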

ReaperX, Jul 31 '22

The program limits the number of files. [screenshot]

You can use other software, such as FreeFileSync, to move the large files out while keeping the folder structure, and then check them. [screenshots]
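The move-out step can also be scripted. This is a minimal sketch, assuming files at or above a size threshold should be moved from one folder tree into another while keeping their relative paths. `move_large_files` is a hypothetical helper: it does not follow symlinks, it refuses to overwrite an existing destination file, and it assumes `dst_root` is not inside `src_root`.

```python
import os
import shutil
from pathlib import Path


def move_large_files(src_root, dst_root, min_bytes, dry_run=False):
    """Move files of at least min_bytes from src_root into dst_root.

    Relative folder structure is preserved. Returns the '/'-separated
    relative paths that were moved (or would be, when dry_run is True).
    Assumes dst_root is not inside src_root.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)

    # Collect everything first, so the tree isn't changed while walking it
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink() and path.stat().st_size >= min_bytes:
                candidates.append(path)

    moved = []
    for path in sorted(candidates, key=lambda p: p.relative_to(src_root).as_posix()):
        rel = path.relative_to(src_root)
        target = dst_root / rel
        if not dry_run:
            if target.exists():
                # Refuse to overwrite; resolve conflicts by hand
                raise FileExistsError(target)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
        moved.append(rel.as_posix())
    return moved
```

Running it with `dry_run=True` first lists the relative paths that would be moved without touching anything.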

Of course it would be nice if MultiPar could do this in one program.

supply9243, Aug 02 '22

I'm sorry for the slow reply. I can't use my PC for long during the hot summer, so currently I'm only doing easy tasks.

> Could we get an easier way of accomplishing this?

Though a filtering feature is interesting and might be useful, it's complex to implement. Also, how it would be used varies from user to user. So I feel that another tool or a customizable script may be a better fit. When I have time, I will test a Python script that implements such a feature.

Yutaka-Sawada, Aug 02 '22

I wrote a Python script to exclude small files. To test the script, you need to install Python.

Usage: Save the script to a text file such as large_files.py, then drag and drop a folder (or several folders) onto the script file.

The script selects source files of at least 1 MB (1,048,576 bytes); you may change that value. You may also edit the script to add more filters, such as filename, extension, or attribute.
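As an illustration of such extra filters, here is a sketch of a predicate in the same `os.scandir` style as `make_list`. The settings (`MIN_SIZE`, `EXCLUDED_EXTENSIONS`, `EXCLUDED_NAMES`) and the helper names are hypothetical. A file counts as hidden if the Windows hidden attribute is set (read from `st_file_attributes`, which exists only on Windows) or if its name starts with a dot.

```python
import os
import stat

# Hypothetical filter settings: adjust to taste.
MIN_SIZE = 1048576                             # bytes (1 MB, as in the script)
EXCLUDED_EXTENSIONS = {".lnk", ".tmp"}         # lower-case, with the dot
EXCLUDED_NAMES = {"thumbs.db", "desktop.ini"}  # lower-case file names


def is_hidden(entry):
    """True if the Windows hidden attribute is set or the name starts with '.'."""
    # st_file_attributes exists only on Windows; elsewhere treat it as 0
    attrs = getattr(entry.stat(), "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN) or entry.name.startswith(".")


def keep_entry(entry):
    """Decide whether an os.DirEntry should go into the file-list."""
    if not entry.is_file():
        return False
    name = entry.name.lower()  # compare case-insensitively, like Windows
    if name in EXCLUDED_NAMES:
        return False
    if os.path.splitext(name)[1] in EXCLUDED_EXTENSIONS:
        return False
    if is_hidden(entry):
        return False
    return entry.stat().st_size >= MIN_SIZE


def list_kept(path):
    """Return the sorted names of files directly in path that pass the filter."""
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it if keep_entry(entry))
```

Inside `make_list`, the existing `is_file`, `.lnk`, and size checks could then be replaced by a single `if keep_entry(entry):`.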

```python
import sys
import os
import subprocess


# Set path of MultiPar
client_path = "C:\\test\\MultiPar\\MultiPar.exe"

# Set path of file-list
list_path = "C:\\test\\MultiPar\\save\\file-list.txt"

# Minimum size (bytes) of a source file; smaller files are ignored.
# 1048576 bytes = 1 MB. Change this value as you like.
min_size = 1048576


# Make file-list of source files in a folder
# Return number of found files
def make_list(path):
    file_count = 0

    # "with" closes the file-list even if an error occurs
    with open(list_path, 'w', encoding='utf-8') as f:

        # Search inner files (sub-folders are not searched)
        with os.scandir(path) as it:
            for entry in it:
                # Exclude sub-folder and short-cut (.lnk)
                if entry.is_file() and (not entry.name.lower().endswith('.lnk')):
                    #print("file name=", entry.name)
                    #print("file size=", entry.stat().st_size)

                    # Check file size and ignore small files
                    if entry.stat().st_size >= min_size:
                        f.write(entry.name)
                        f.write('\n')
                        file_count += 1

    return file_count


# Run a command (list of arguments) and return its ExitCode
def command(cmd):
    # Passing a list without shell=True lets subprocess quote paths with spaces
    ret = subprocess.run(cmd)
    return ret.returncode


# Return total size of internal files (including sub-folders)
def get_dir_size(path='.'):
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file():
                total += entry.stat().st_size
            # Don't follow links into other folders (avoids endless loops)
            elif entry.is_dir(follow_symlinks=False):
                total += get_dir_size(entry.path)
    return total


# Read arguments of command-line (the dropped folders)
for arg in sys.argv[1:]:
    # Normalize the path, so a trailing "\" doesn't make the folder name empty
    one_path = os.path.abspath(arg)
    one_name = os.path.basename(one_path)

    # Check the folder exists
    if os.path.isdir(one_path):

        # Check empty folder
        if get_dir_size(one_path) > 0:
            print(one_name + " is folder.")

            # Path of creating PAR file
            par_path = os.path.join(one_path, one_name + ".par2")

            # Check the PAR file exists already
            # This check works only when the MultiPar option
            # "Always use folder name for base filename" is enabled.
            if os.path.exists(par_path):
                print(one_name + " includes PAR file already.")

            else:
                # Make file-list
                file_count = make_list(one_path)
                #print("file_count=", file_count)
                if file_count > 0:

                    # Set command-line as a list of arguments
                    # Specify source file by file-list
                    # The file-list will be deleted by MultiPar automatically.
                    cmd = [client_path, "/create", "/base", one_path, "/list", list_path]

                    # Process the command
                    print("Creating PAR files.")
                    error_level = command(cmd)

                    # Check error
                    # Exit loop, when error occur.
                    if error_level != 0:
                        print("Error = ", error_level)
                        break

                else:
                    print(one_name + " doesn't contain source files.")

        else:
            print(one_name + " is empty folder.")

    else:
        print(one_name + " isn't folder.")

# If you don't need to confirm the result, comment out the line below.
input('Press [Enter] key to continue . . .')
```

Yutaka-Sawada, Aug 10 '22