Add support for absolute file path in cachefile
Versions of BestSource besides R2-RC1 seemingly do not support specifying the exact cache file location. The documentation (snippet below) suggests that the value supplied to cachepath is made relative to the input/source file location. Although this may be perfectly adequate for most VapourSynth users who invoke bs.VideoSource(...) manually, programs and scripts invoking the function programmatically are vulnerable to errors as they cannot specify an absolute file path. One such program which has unexpected behavior is Av1an, at least when using Windows.
The issue with programs such as Av1an is that the same VapourSynth script is executed multiple times with the expectation that the initial cache file is preserved between executions. However, due to the unexpected behavior of how the cache file location is constructed, the cache file fails to write to disk and results in very poor performance as BestSource is regenerating the cache file for each subsequent process.
cachepath behavior according to the readme.md:
cachepath: The path where cache files are written. Note that the actual index files are written into subdirectories using based on the source location. Defaults to %LOCALAPPDATA% on Windows and ~/bsindex elsewhere.
Example 1 - Simple
We have an input video file located here: C:/Path/To/Video.mkv and are specifying an absolute cachepath value of C:/Path/To/Cache. Presumably BestSource should be aware that the cachepath provided is an absolute path and not a relative path to be merged with video file location.
Consider the following example script:
from vapoursynth import core
video = core.bs.VideoSource(source='C:/Path/To/Video.mkv', cachepath='C:/Path/To/Cache')
video.set_output()
Executing this script with version R2-RC1 and the latest R6 version we see the following files along with some annotations:
C:\> tree /f .\Path\
Folder PATH listing
...
C:\PATH
└───To
│ Cache.0.bsindex // Generated when using R2-RC1
│ Video.mkv // Input/Source
│
└───Cache
└───Path
└───To
Video.mkv.0.bsindex // Generated when using R6
R2-RC1 generates a cache file precisely where it is commonly expected while the latest version, R6, uses the end portion of the provided path as the first folder name and the rest of the provided path as a relative path to the input video file location resulting in this path: C:\Path\To\Cache\Path\To\Video.mkv.0.bsindex. Although the documentation suggests that the behavior of R2-RC1 is incorrect, it is much preferred over the behavior of other versions.
Example 2A - Av1an (more vulnerable to errors)
This example seems to be vulnerable to potentially providing cachepath an invalid path and thus results in BestSource failing to write the cache to the disk. As Av1an is an application that relies on repeatedly accessing the same cache file, the absence of the cache file results in BestSource regenerating the cache. This is an expensive process that can turn a 20 minute operation into a 2 hour operation.
Addendum 1: This example was intended to introduce both where the issue was discovered and how to replicate it exactly. However, it lacks specific details regarding the generated VapourSynth script. This is addressed in Addendum 2, Example 2B below.
Consider the following Av1an command:
av1an -i {INPUT} -o {OUTPUT} -y --temp {TEMPORARY} --keep --resume -w 1 --chunk-method bestsource
Monitor: ~/{TEMPORARY}/split/cache.json/C - it will be empty as it seems that the cache isn't being created/written and every time a chunk starts encoding the entire process to index the input video is unexpectedly repeated.
Example 2B (Addendum 2)
This is a specific example that can replicate the issue without the need for any other software besides VapourSynth. VapourSynth scripts generated programmatically by Av1an currently appear very similar to this example. As a reminder, BestSource has no issues with parsing and manipulating the absolute path of the input video file (source), which is formatted the same as the absolute path of the cache file (cachepath).
Consider the following VapourSynth script located here: C:\Path\To\loadscript.vpy. We expect the cache file to be located here: C:\Path\To\[SomeTag] $SomeName With-Dashes_And+Underscores\temporary\split\cache.json. Putting aside that the file name changes to include what is presumably the index of the video and the BestSource index extension, the cache file should at the very least be located directly under the ~/split/ folder.
from vapoursynth import core
core.max_cache_size=1024
core.bs.VideoSource("\\\\?\\C:\\Path\\To\\[SomeTag] $SomeName With-Dashes_And+Underscores\\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv", cachepath="\\\\?\\C:\\Path\\To\\[SomeTag] $SomeName With-Dashes_And+Underscores\\temporary\\split\\cache.json").set_output()
Executing this script with version R2-RC1 and the latest R6 version we see the following files along with some annotations:
PS C:\> tree /f .\Path\
Folder PATH listing
...
C:\PATH
└───To
│ loadscript.vpy // This script
│
└───[SomeTag] $SomeName With-Dashes_And+Underscores
│ [Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv // Source/Input video file
│
└───temporary
└───split
│ cache.json.0.bsindex // Correctly generated and successfully written cache FILE to disk by R2-RC1
│
└───cache.json // Incorrectly generated FOLDER named "cache.json" by versions besides R2-RC1
└───C // BestSource versions besides R2-RC1 fail to write the cache to disk
R2-RC1 generates and successfully writes the cache file to the disk in the expected folder, albeit with a modified name and extension. R6 attempts to write the cache to the disk but instead produces a folder named after the cache file, with a subfolder presumably named after the disk label. Ultimately, this issue leads to subsequent executions of the same script running far longer than is necessary.
Example 3 - Benchmarking with VSPipe
Addendum 3: This example is intended to provide some insight into how Av1an is invoking BestSource via VapourSynth. Additionally, it provides some insight into how this issue was measured when using a testFile - typically produced by Av1an - against BestSource R2-RC1 and other versions. It is NOT required to be executed to understand or act upon this issue. It is an additional reference as to how this issue can be replicated in a programmatic context.
Below is merely a crude example that simplifies the core operation of Av1an into a TypeScript (JavaScript) snippet for benchmarking VapourSynth scripts. Given a set of frame ranges to output, we measure the amount of time it takes to complete the operation. With R2-RC1, the first range that is processed results in a longer time due to indexing and subsequent processes complete at a much faster rate as the cache has been successfully written to the disk. However with other versions, this is no longer the case as every process requires indexing the cache which fails to write to the disk.
const ranges: [start: number, end: number][] = [
[0, 300],
[1900, 2200],
[29635, 29690],
[11000, 11400],
[4500, 4900],
[26500, 28000],
[2300, 6000],
[23000, 24000],
[500, 550],
[750, 775],
];
async function encode(start: number, end: number) {
return new Promise((resolve, reject) => {
const startTime = new Date();
console.log(`[${startTime.toISOString()}]: Encoding ${start} - ${end}`);
const vspipe = spawn('vspipe', [testFile, '-p', '-o', '0', '-c', 'y4m', '-', '-s', `${start}`, '-e', `${end}`,], { stdio: ['pipe', 'pipe', 'pipe'] });
const svt = spawn('SvtAv1EncApp', ['-i', 'stdin', '--progress', '2', '--keyint', '0', '--lp', '2', '--crf', '20', '--preset', '10', '-b', `${outputFolder}/${start}-${end}.ivf`], { stdio: ['pipe', 'pipe', 'pipe'] });
// pipe vspipe output to svt
vspipe.stdout.pipe(svt.stdin);
vspipe.on('close', (code) => {
const endTime = new Date();
if (code === 0) {
const seconds = (endTime.getTime() - startTime.getTime()) / 1000;
const fps = (end - start) / seconds;
console.log(`[${endTime.toISOString()}]: Done ${start} - ${end} in ${endTime.getTime() - startTime.getTime()}ms - ${fps.toPrecision(3)} fps`);
// return resolve(fps);
} else {
// Add new status with the state 'error'
const error = new Error(`Vspipe exited with code ${code}`);
reject(error);
}
});
svt.on('close', (code) => {
const endTime = new Date();
if (code === 0) {
const seconds = (endTime.getTime() - startTime.getTime()) / 1000;
const fps = (end - start) / seconds;
console.log(`[${endTime.toISOString()}]: Encoded ${start} - ${end} in ${endTime.getTime() - startTime.getTime()}ms - ${fps.toPrecision(3)} fps`);
return resolve(fps);
} else {
// Add new status with the state 'error'
const error = new Error(`SvtAv1EncApp exited with code ${code}`);
reject(error);
}
});
vspipe.on('error', (error) => {
const errorTime = new Date();
console.log(`[${errorTime.toISOString()}]: Error ${start} - ${end} in ${errorTime.getTime() - startTime.getTime()}ms`);
reject(error);
});
svt.on('error', (error) => {
const errorTime = new Date();
console.log(`[${errorTime.toISOString()}]: Error ${start} - ${end} in ${errorTime.getTime() - startTime.getTime()}ms`);
reject(error);
});
});
}
const startTotal = new Date();
console.log(`[${startTotal.toISOString()}]: Start`);
for (const [start, end] of ranges) {
await encode(start, end);
}
const endTotal = new Date();
const totalSeconds = (endTotal.getTime() - startTotal.getTime()) / 1000;
const totalFramesCompleted = ranges.reduce((a, b) => a + b[1] - b[0], 0);
console.log(`[${endTotal.toISOString()}]: Done in ${endTotal.getTime() - startTotal.getTime()}ms - ${(totalFramesCompleted / totalSeconds).toPrecision(3)} fps`);
Other References
It appears that shssoichiro, an active Av1an contributor, also came across similar unexpected behavior when supplying a value to cachepath as described in their issue.
While they were able to produce a workaround for this issue, the problem still persists for novice/casual users of Av1an and similar programs where the most common usage of them results in this unexpected behavior.
I believe it would be best if BestSource were to recognize absolute file paths and ensure that the cache is being written to the disk as expected by users and programs. Understandably, there may be concerns where the user provides an invalid absolute path. I suggest that in cases such as this, it may be considered an error and halt the program or simply fallback to the default behavior as documented in the readme.md.
Apologies for the lengthy issue - let me know if more details are required.
Hello, I have updated the original issue with 3 addendums along with a couple minor fixes and highlights. Please feel free to comment should you require any clarifications or simply wish to discuss how best to approach this.
Thanks, - Boats
Hello again. I'd like to thank @myrsloik, @arch1t3cht, and others for their near-immediate action towards addressing the issue at hand. As some discussions regarding the issue have taken place in private servers, I will be copying those questions and answers here to keep other interested parties informed and invite them to participate. Additionally, I have been debugging the program and experimenting with solutions since providing clarifications. I have discovered an additional related issue in doing so and will expand upon that below and in some addendums.
Clarifications
Full conversation here with EST times:
Query:
@myrsloik: How do you get those weird paths?
@BoatsMcGee: The paths are weird for human input but not for programmatic input. The file paths are valid and contain escape sequences to ensure characters such as ":" are preserved.
@myrsloik: Are those paths only produced in Av1an?
@BoatsMcGee: The paths exemplified are indeed produced by Av1an. They are still valid paths. The source input path has caused no trouble for BestSource before, thus we can assume BestSource is capable of reading/using paths of this format. (Addendum: This is no longer entirely true, see the related issue below)
@myrsloik: Can I reproduce it in any other way?
@BoatsMcGee: Av1an is not necessary to replicate this issue. Using it is perhaps the quickest way to observe it but there are other methods to replicate it. One of which is to execute vspipe.exe -p -c y4m .\script.vpy .\result.y4m with the following script:
script.vpy:
from vapoursynth import core
core.max_cache_size=1024
core.bs.VideoSource("\\\\?\\C:\\Path\\To\\[SomeTag] $SomeName With-Dashes_And+Underscores\\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv", cachepath="\\\\?\\C:\\Path\\To\\[SomeTag] $SomeName With-Dashes_And+Underscores\\temporary\\split\\cache.json").set_output()
(Addendum: the initial issue is related to caching, so it is assumed that the issue can be observed either by checking the cache file and/or executing the script again and observing that the cache was not found and that it requires another expensive indexing operation, hence the performance issues in some use cases)
Debugging and Expanding Related Issues
Building and Testing
The DLL is being produced in the exact same way as the Windows GitHub workflow YAML defined here: ./.github/workflows/windows.yml. It is built using the latest commit, 55fb7fd, in the master branch.
Without any modifications we can attempt to replicate the issue using the same instructions mentioned in the "Clarifications" section above. Strangely, this build appears to have a different result from those in Releases. Please observe the following new issue in the next section.
A New Related Issue
Executing vspipe with the script.vpy input with BestSource unmodified:
C:\Path\To> vspipe.exe -p -c y4m .\script.vpy -s 0 -e 1 .\result.y4m
Script evaluation failed:
Python exception: VideoSource: Couldn't open 'C:\?\C:\Path\To\[SomeTag] $SomeName With-Dashes_And+Underscores\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv'
Traceback (most recent call last):
File "src\\cython\\vapoursynth.pyx", line 3233, in vapoursynth._vpy_evaluate
File "src\\cython\\vapoursynth.pyx", line 3234, in vapoursynth._vpy_evaluate
File ".\script.vpy", line 3, in <module>
core.bs.VideoSource("\\\\?\\C:\\Path\\To\\[SomeTag] $SomeName With-Dashes_And+Underscores\\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv", cachepath="not_real\\..\\temporary\\split\\cache.json").set_output()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src\\cython\\vapoursynth.pyx", line 2969, in vapoursynth.Function.__call__
vapoursynth.Error: VideoSource: Couldn't open 'C:\?\C:\Path\To\[SomeTag] $SomeName With-Dashes_And+Underscores\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv'
It appears the source input file path is being parsed improperly, resulting in an invalid file path. Were it parsed properly, the path should be C:\Path\To\[SomeTag] $SomeName With-Dashes_And+Underscores\[Some.Tag] Some_Name - S01E01 [BD 1080p SomeTag][Some-Tag].mkv.
Debugging
A quick note: After comparing some versions I noticed R2-RC1 appears to be significantly different in relevant code paths; this explains the different behavior from other versions.
Regarding the latest version, I determined that the MangleCachePath(...) function found here is not perfectly adequate in sanitizing valid paths such as the one this issue often references. Additionally, The CachePath is assumed to be a relative path without verifying and is only used way. Lastly, the final returned path is resolved from adding the Tmp (sanitized Source) file location to the end of the CachePath. This appears to be a mistake considering the documentation, typical use, and common practices.
This issue was initially a request to allow supplying an absolute path and ensuring it is written successfully, something a previous version seems to offer. Now, I expand this issue to resolving the aforementioned issues both related to the cache file as well as the input/source file path.
I will very shortly be submitting a Draft Pull Request to begin addressing most of these issues and suggest related improvements.
Thank you for your time. I hope we can resolve all issues and perhaps make more improvements. - Boats M.
Invalid API usage. You're supposed to pass a valid filesystem::path (a normal path) and not some prefixed gunk WIN32 API allows in certain contexts. Prefixing things with \\?\ does indeed produce junk.
I'm afraid there's a misunderstanding here. These are valid file paths that are not specific to any particular programming language or situation (besides specifying extended paths). You could perhaps argue that the C++ standard library should handle these paths correctly but they unfortunately do not, hence the changes introduced in PR #60.
R2_RC1 from your URL was compiled with MSVC. Microsoft's implementation of std::filesystem::path supports DOS device paths.
The latest release from the releases page (R6) also uses MSVC. The issue is not present there.
The binaries from GitHub Actions are built with MINGW. The STL there does not support them.
TL:DR:
- Get R6: https://github.com/vapoursynth/bestsource/releases/latest
- The binaries from GitHub Actions should be built with MSVC instead of MINGW.