
Python bindings or other output

Open Safihre opened this issue 3 years ago • 7 comments

We use MultiPar verification and repair extensively in our application and love the performance! That is why we have already donated a couple of times and will keep donating when we can.

Some background on our problem: to display the progress of verification and repair, we parse the output of MultiPar line by line. This way we can show the verification progress to the user as "Verifying file X/Total", and we can show the repair progress during repair. Additionally, we try to keep track of any renames that MultiPar performs, so that our own bookkeeping of the files matches the reality on disk. For example, if a user retries downloading a whole job after MultiPar renamed a file, we don't have to download the original file again. Lastly, we parse the result message to decide what action to take (try to get more par2 files or give up).
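The line-by-line approach described above can be sketched roughly as follows. This is only an illustration: `stream_lines` is a hypothetical helper name, and the actual par2j.exe command line and output format are not shown in this thread, so the caller has to supply the command and the per-line parsing.

```python
import subprocess


def stream_lines(cmd):
    """Run a command and yield its console output one line at a time.

    `cmd` is whatever argument list the caller builds; the real par2j.exe
    arguments are not assumed here.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same stream
        text=True,
        encoding="utf-8",
        errors="replace",
    ) as proc:
        for line in proc.stdout:
            # Strip the line ending so callers can match on the bare text.
            yield line.rstrip("\r\n")
```

Each yielded line would then be matched against whatever progress or rename messages the application already knows how to recognise.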

So I was wondering if there is some easier way we can get this information, for example Python bindings that can call the MultiPar functions and get some progress information. However, I understand this is difficult.

We would already really be helped if there was some structured output file (json?) that contains that status information:

  • Number of files scanned
  • Verification result
  • Renames performed (including old and new name)
  • Joins performed (.001, etc files)
  • Repair result
  • Anything else of interest

Safihre, May 13 '21 15:05

I designed the current par2j.exe output to be simple for human eyes to follow. Though the GUI front end (MultiPar.exe) parses the output to check progress and result, this system may be complex and difficult for other applications. So it's possible to write the processing state to a temporary file (JSON format) in the same way as it is printed to the console window. But I don't know whether your application can read a memory-mapped file in shared mode. If par2j writes the data to a normal file, it may become slow. (Or the disk cache may reduce the file I/O problem.) Normally a JSON parser does not expect another application to keep appending data continuously, so I'm not sure whether it will work.

If you want to try, I can add an option (such as verbose output to a file) and write to it. I need exact details of what data you want to read from the file. Or you just modify par2j.exe's source code and compile by yourself. Because I changed my character encoding to UTF-8 and Visual Studio 2019 is available for free, it's easy to modify now.

Yutaka-Sawada, May 14 '21 05:05

Though the GUI front end (MultiPar.exe) parses the output to check progress and result, this system may be complex and difficult for other applications.

Ah, so the GUI also just parses the command-line-style output of par2j.exe?

It seems Python does support memory-mapped files, as long as we get the file descriptor: https://docs.python.org/3/library/mmap.html#mmap.mmap
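As a rough sketch of what that could look like on the Python side, the snippet below maps a file read-only with `mmap` and returns its current contents. `read_shared_text` is a hypothetical helper name, and whether Windows lets the reader open the file while par2j still has it open depends on the sharing mode par2j uses, which is an open question in this thread.

```python
import mmap
import os


def read_shared_text(path):
    """Return the current contents of a file through a read-only memory map."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return ""  # an empty file cannot be memory-mapped
        # Length 0 maps the whole file as it exists right now.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:].decode("utf-8", errors="replace")
```

Because the map covers the file size at the moment of the call, a reader that wants live progress would call this repeatedly rather than keep one map open.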

Or you just modify par2j.exe's source code and compile by yourself.

I am very bad at C++, so wouldn't know how to modify the code myself.

As a test, we could start with one of 2 things:

  1. Get the current status while it is running verification/repair: verifying X/X, scanning X/X, repairing X%.
  2. Get status after the program finishes: get the renames performed

Safihre, May 14 '21 07:05

so the GUI also just parses the command-line-style output of par2j.exe?

Yes, it does. It reads the output of par2j.exe and parses each line.

I found a good option, FILE_ATTRIBUTE_TEMPORARY, in the Win32 API function CreateFile. Using this may be simple for normal file access. I made sample functions to print the state to a file. They add lines in an .INI-like format. JSON format seems to be bad for appending lines. The file format is easy to change later. At this time, it writes the progress percent, the number of found blocks, and the names of the files being verified.
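A reader for such an append-only state file might look like the sketch below. The exact line format of the sample is not given in this thread, so this assumes plain `key=value` lines; the key names used in the usage note (`progress`, `file`) are made up for illustration, and `parse_status_lines` is a hypothetical helper.

```python
def parse_status_lines(text):
    """Return the latest value seen for each key in `key=value` lines.

    Assumption: one `key=value` pair per line. A trailing line without a
    newline is treated as still being written and is ignored.
    """
    complete, _, _partial = text.rpartition("\n")
    state = {}
    for line in complete.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue  # skip blanks, INI-style comments, and unknown lines
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip()  # later lines overwrite earlier ones
    return state
```

For example, feeding it `"progress=10\nfile=a.bin\nprogress=55\n"` keeps only the most recent `progress` value alongside `file`.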

I put the sample (par2j_sample_2021-05-15.zip) in the "MultiPar_sample" folder on OneDrive.

Yutaka-Sawada, May 15 '21 00:05

We would already really be helped if there was some structured output file (json?) that contains that status information:

I have learned a little about Python scripting, and I put some sample Python script files in the tool folder of recent versions. So it's now possible to test the request on my PC. Are you still interested in the output (JSON format) for Python? While I still remember a little Python, I will be able to test it. It may become an item in the "Batch Processing" options of MultiPar.

Yutaka-Sawada, Oct 11 '22 12:10

I would still be interested in getting the status during and after the program finishes. Getting the status while the program is running might be too complicated, and would still result in me parsing those INI files. Mostly getting the information about renames and joins after it finishes would be great, currently we have to parse the command-line output for that.

Safihre, Oct 11 '22 13:10

I implemented a JSON writer and tested reading the JSON files with Python. Though par2j needs to convert the directory separator from Windows \ to UNIX /, it seems to work. It's useful that the format supports arrays. My sample JSON file looks like this:

{
  "SelectedFile":"Full path of selected recovery file",
  "BaseDirectory":"Full path of base directory of source files",
  "RecoveryFile":[
    "Name of recovery file1",
    "Name of recovery file2",
    "Name of recovery file3"
  ],
  "SourceFile":[
    "Name of source file1",
    "Name of source file2",
    "Name of source file3"
  ],
  "FoundFile":[
    "Name of found file1",
    "Name of found file2"
  ],
  "ExternalFile":[
    "Name of external file1",
    "Name of external file2"
  ],
  "DamagedFile":[
    "Name of damaged file1",
    "Name of damaged file2"
  ],
  "AppendedFile":[
    "Name of appended file1",
    "Name of appended file2"
  ],
  "MissingFile":[
    "Name of missing file1",
    "Name of missing file2"
  ],
  "MisnamedFile":{
    "Correct name of misnamed file1":"Wrong name",
    "Correct name of misnamed file2":"Wrong name"
  }
}
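Reading a file of this shape from Python could be sketched as below. The key names come from the sample above; `summarize` is a hypothetical helper, and treating an absent key as an empty category is an assumption, since the thread does not say whether empty categories are written or omitted.

```python
import json

# Array-valued keys taken from the sample JSON above.
LIST_KEYS = (
    "RecoveryFile", "SourceFile", "FoundFile", "ExternalFile",
    "DamagedFile", "AppendedFile", "MissingFile",
)


def summarize(text):
    """Count the entries in each category of a verification result."""
    data = json.loads(text)
    # Assumption: a missing key means "no files in this category".
    counts = {key: len(data.get(key, [])) for key in LIST_KEYS}
    counts["MisnamedFile"] = len(data.get("MisnamedFile", {}))
    return counts
```

An application could use such counts to decide its next step, for instance whether any files are reported as damaged or missing.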

Renames performed (including old and new name)

Because the JSON output is the verification result, it includes the status of misnamed files. A misnamed file has a wrong name at the time of verification. If the repair succeeds, the item describes the old state of the renamed files. I use a format that maps to a Python dict: the key is the original (correct) name of the source file, and the value is the current wrong name. After repair (rename), the value is the old name.
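For bookkeeping on the application side, the "MisnamedFile" dict can be inverted into an old-name to new-name mapping. A minimal sketch, assuming the repair actually performed the renames; `renames_from_misnamed` and `apply_renames` are hypothetical helper names:

```python
def renames_from_misnamed(misnamed):
    """Turn {correct_name: wrong_name} into {old_name: new_name}."""
    return {wrong: correct for correct, wrong in misnamed.items()}


def apply_renames(tracked_names, misnamed):
    """Return the tracked file names as they should look after the renames."""
    renames = renames_from_misnamed(misnamed)
    # Names that were not renamed pass through unchanged.
    return [renames.get(name, name) for name in tracked_names]
```

With `{"movie.mkv": "abc123.bin"}` as input, a tracked entry `abc123.bin` would be updated to `movie.mkv`.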

Joins performed (.001, etc files)

Because the JSON output is the verification result, it includes the status of found files. A found file may be one piece of a split source file at the time of verification. If the repair succeeds, the item is a piece of a joined file. To keep the format simple, it doesn't say the origin of the pieces.

I put the sample (json_sample_2022-10-14.zip) in the "MultiPar_sample" folder on OneDrive. It includes a sample Python script, too. Please test them. If you want more items, post here again.

Yutaka-Sawada, Oct 14 '22 08:10

Looks great! Have to test a bit more what happens with repaired joinable files indeed. I don't think it's necessary (at least not for me) to change \ to /, our Python code can handle that just fine.

Safihre, Oct 17 '22 06:10

I don't think it's necessary (at least not for me) to change \ to /, our Python code can handle that just fine.

That was my misunderstanding. It's not a matter of Python, but a restriction of the JSON format. I found that JSON requires escaping \ (the directory separator on Windows). Because I'm too lazy to escape them, I replace \ with / in the JSON file.
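The restriction is easy to see from Python's standard `json` module. In this small sketch the path `C:\data\job1` is a made-up example and `is_valid_json` is a hypothetical helper:

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Raw backslashes inside a JSON string are rejected (\d and \j are not valid escapes).
unescaped = r'{"BaseDirectory":"C:\data\job1"}'
# Doubling each backslash is the escaping that JSON requires.
escaped = r'{"BaseDirectory":"C:\\data\\job1"}'
# Replacing \ with / avoids the need for escaping altogether.
slashes = '{"BaseDirectory":"C:/data/job1"}'
```

Here `is_valid_json(unescaped)` is False, while the escaped and forward-slash variants both parse. Note that some backslash sequences, such as `\n` or `\t`, are valid JSON escapes and would be silently decoded as control characters instead of raising an error.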

In the resulting JSON file, I removed the trailing / from the "BaseDirectory" item. The new style should work well with Python functions like os.path.basename() or os.path.dirname(). If there is no problem, I will release the next version after I write the help documents in English.
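The difference the trailing / makes can be checked with a short sketch; the path `C:/data/job1` is a made-up example, and `split_base_directory` is a hypothetical helper:

```python
import os.path


def split_base_directory(base_directory):
    """Return (parent directory, last path component) of a base directory."""
    return os.path.dirname(base_directory), os.path.basename(base_directory)


# Without a trailing slash the last component is the folder name.
without_slash = split_base_directory("C:/data/job1")   # ("C:/data", "job1")
# With a trailing slash the last component is empty.
with_slash = split_base_directory("C:/data/job1/")     # ("C:/data/job1", "")
```

So with the trailing / removed, `os.path.basename()` returns the folder name directly, with no extra stripping step.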

I put the sample (json_sample_2023-02-06.zip) in the "MultiPar_sample" folder on OneDrive.

Yutaka-Sawada, Feb 06 '23 14:02