Scoop
[Feature] Make `scoop info` accept pipeline input as object-stream in the powershell spirit
Bug Report
I filed this as a bug instead of as a feature, as the inability to do so is a bit glaring. The general powershell philosophy (in contrast to that of the unix shell) is that output should be an object stream.
Current Behavior
There is no way to get a scoop info * object stream.
So I tried to work around it using scoop search, whose output is a mix of text streams (perhaps console-host text mixed with debug/error stream text).
Consider how long the following script takes: 33 minutes on my i7-haswell laptop, with scoop on HDD.
PS C:\WINDOWS\system32> scoop search | Select-String -Pattern "^ " | ConvertFrom-String | %{ scoop info $_.P2.ToString() } | %{ -join($_.Name, " : " , $_.Description ) } > C:\tmpq\Downloads\scoop_search_op.txt
'extras' bucket:
'java' bucket:
'main' bucket:
PS C:\WINDOWS\system32>
The $_.P2.ToString() was required to avoid wrong-type errors, as app-names that are a single character long would have type System.Char instead of String.
The output file I was interested in is here for your perusal: scoop_search_op.txt
My motivation for writing the above was to introduce myself to the many apps/commands that I had not heard of. There are so many apps/commands in scoop (which is good!). When I do a scoop update and see the list of app-manifests in an updated scoop-bucket, for apps other than those I have installed, I am interested in knowing what those apps are, just to learn about them and find out whether some app may be useful to me. I could run scoop info manually for each such curiosity. I wonder if there is some easy way to capture the scoop update output and run scoop info on those apps. I am unsure if you'd see that as a sufficiently valuable feature-add.
The above script takes a long time to finish. The trouble is that powershell has to do a second command invocation, scoop info, for every app, which could have been avoided if there was a way to get object-stream output.
scoop info does not take * as an argument, i.e. there is no scoop info "*"
If scoop search/scoop info * output an object stream, and if the objects had a suitable default to-string() printing, then the output objects could be parsed more quickly and easily, and complex queries could be performed.
This could in theory apply to any scoop sub-command that has informative output.
Expected Behavior
An object stream is more rapidly processable by powershell. Objects can be manipulated by methods and properties.
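As a rough illustration of what an object stream would allow, the sketch below builds two sample objects by hand and filters them with standard cmdlets. The objects only stand in for what a hypothetical object-emitting scoop info * might return; the property names and values are copied from the two example outputs in this thread, and nothing here calls scoop itself.

```powershell
# Sample objects standing in for a hypothetical object-emitting `scoop info *`.
# Property names and values are taken from the example outputs in this thread.
$apps = @(
    [PSCustomObject]@{ Name = 'mc';   Description = 'Native GNU Midnight Commander for Win32';       Bucket = 'extras'; License = 'GPL-3.0-or-later' }
    [PSCustomObject]@{ Name = 'zstd'; Description = 'High compression ratios compression algorithm'; Bucket = 'main';   License = 'BSD-3-Clause' }
)

# Filter on a property, then format each hit as "Name : Description".
$gplApps = $apps |
    Where-Object { $_.License -like 'GPL*' } |
    ForEach-Object { '{0} : {1}' -f $_.Name, $_.Description }

$gplApps
```

With objects, the filter is a property comparison rather than a regular expression over formatted text, so no ConvertFrom-String step is needed.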
Additional context/output
PS C:\WINDOWS\system32> scoop info "*"
Compare-Version : Cannot process argument transformation on parameter 'ReferenceVersion'. Cannot convert value to type
System.String.
At C:\vol\scoop_01\SCOOP\apps\scoop\current\lib\core.ps1:353 char:68
+ ... utdated = ((Compare-Version -ReferenceVersion $status.version -Differ ...
+ ~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Compare-Version], ParameterBindingArgumentTransformationException
+ FullyQualifiedErrorId : ParameterArgumentTransformationError,Compare-Version
WARN error: The given path's format is not supported.
Could not find manifest for 'MixedRealityRuntime'.
An example scoop info output
PS C:\WINDOWS\system32> scoop info mc
Name : mc
Description : Native GNU Midnight Commander for Win32
Version : 4.8.27
Bucket : extras
Website : https://midnight-commander.org
License : GPL-3.0-or-later
Updated at : 3/7/2022 1:56:35 AM
Updated by : github-actions[bot]
Binaries : mc.exe
Shortcuts : mc
Possible Solution
Perhaps an implementation of scoop info * would internally invoke scoop search and output an object stream of scoop info objects, in a manner similar to the script suggested above. This would be more performant.
System details
Windows version: [e.g. 7, 8, 10] 10
OS architecture: [e.g. 32bit, 64bit] 64bit
PowerShell version: [output of "$($PSVersionTable.PSVersion)"]
PS C:\WINDOWS\system32> $($PSVersionTable.PSVersion)
Major Minor Build Revision
----- ----- ----- --------
5 1 19041 1645
Additional software: [(optional) e.g. ConEmu, Git]
Scoop Configuration
NA
//# Your configuration here
The output of something like scoop info * will be an immensely large PSCustomObject array, containing like 2000-3000 items, each similar to the above example. What is the possible use case for this?
The use case would be whenever a user is interested in apps and queries/explores out of curiosity. It helps by
a) being faster
b) lending itself to powershell-script-based complex-query-filtering in powershell.
There is a similar desire to query available apps in a bucket: #4852.
Users want web-search for the same reason #4627
I'll admit it is a rare use case. Now that I have made a txt-file, the next time I'll regenerate it is when I want to explore but save time by not invoking scoop info one app at a time. When sufficient time has passed and many new apps have been added, the txt-file will have become obsolete.
What is slowing the given script down isn't scoop search, which fetches the names of all 2700 apps in under 7 seconds. So it may not be the size, number or file-read of the manifest files that is slowing it down, but the fact that a new powershell-process is being invoked for each app info query.
So scoop info * may take less time if it is a single powershell-process interpreted function-call. This claim ought to be tried and measured.
PS C:\WINDOWS\system32> Measure-Command -Expression {scoop search}
'extras' bucket:
'java' bucket:
'main' bucket:
Days : 0
Hours : 0
Minutes : 0
Seconds : 6
Milliseconds : 212
Ticks : 62129437
TotalDays : 7.19090706018518E-05
TotalHours : 0.00172581769444444
TotalMinutes : 0.103549061666667
TotalSeconds : 6.2129437
TotalMilliseconds : 6212.9437
PS C:\WINDOWS\system32> scoop search | Measure-Object
'extras' bucket:
'java' bucket:
'main' bucket:
Count : 2713
Average :
Sum :
Maximum :
Minimum :
Property :
ps.1. I was checking out https://rasa.github.io/scoop-directory/by-bucket#ScoopInstaller_Main . Awesome website.
What would be interesting is if the table for apps in a bucket (ScoopInstaller/Main) could also do sort-by-last-updated in addition to the present, unchangeable sort-by-alphabetical. This would match the scoop update output. Perhaps also a sort-by-License. This way, one can get to know the apps in the order in which they were most recently updated.
ps.2. Additionally, maybe expand the command line syntax:
- Expand the scoop info command line syntax
  - accept regexp and find all package app-names matching regexp
    scoop info app-regexp
  - recognize optional bucket prefix, facilitating #4852:
    scoop info <bucket-regexp/app-regexp>
  - accept multiple arguments
    scoop info app1 <app2> <app3>
- Alternatively, first improve scoop search to do regexp, bucket-prefix and multiple-arguments, as just mentioned. Then, to avoid duplicating the search functionality of scoop search, make scoop search output an object stream. Then make scoop info accept an object stream piped in from scoop search. This way, the object stream from scoop search could be piped into scoop info or taken as an argument, and scoop info would in turn output the corresponding scoop info objects. Being two powershell invocations, this could take 7*2=14 secs. ex:
  scoop search <bucket-regexp/app-regexp> | scoop info
  or
  scoop info -Input $(scoop search <bucket-regexp/app-regexp>)
The command scoop info xxx takes 7-8 seconds on a cold cache. Number of typical manifests a user will have is around 2500 (main+extras). Summing that up gives us ~4 hours. I still don't understand a use case for scoop info *, but we can see it is obviously impractical.
But I do like the idea of converting the output of scoop search into a PowerShell object(s). I've recently been refactoring the codebase to return PSCustomObjects wherever possible (already implemented for info and list), and search was next on my plan. I haven't started that yet (and I'm not sure when I will) but if you want to take a jab at this, please do! I myself want it to happen.
Definitely not 4 hrs. The unoptimized scriptlet given in the description took about 33 minutes. The gain is perhaps due to exe-caching, dir-list-caching or file-system caching. So the run time for a single-invocation script will be bounded between 7 sec and 33 minutes. If I were to take a guess, it would be around 30 sec. (edit: the guess turned out to be wrong, and that raised more questions, but 15 min is better than 33 min, i.e. a saving of 18 min)
This was my thinking: I have 3 buckets (main, extras, java), so around 2700 app manifests. In the given scriptlet, in %{ scoop info $_.P2 }, the % is an alias for ForEach-Object, and powershell evaluates the expression inside the braces { } to the output of a new command invocation (like bash; this is my assumption, I don't think powershell treats it like an internal call). Unlike Linux, Windows is very bad at starting new processes/subshells. A single process reading 2700 manifest files would be faster than 2700 processes each reading a single manifest file.
I experimented and attempted some measurements
- I first made a new file with a list of apps, one app-name per line.
  Get-Content C:\tmpq\Downloads\scoop_search_op.txt | ConvertFrom-String | % { $_.P1.ToString() } | ?{$_.Trim() -ne "" } > C:\tmpq\Downloads\scoop_app_list.txt
- Then I attempted a loop inside scoop-info.ps1, which loops $app over the app-list, ignoring the given argument.
  PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> C:\vol\scoop_01\scoopg\apps\git\current\usr\bin\diff.exe .\scoop-info_orig.ps1 .\scoop-info.ps1
  17c17,18
  <
  ---
  > Get-Content -Path 'C:\tmpq\Downloads\scoop_app_list.txt' | % {
  > $app = "$_"
  179c180
  <
  ---
  > }
- The command works as expected, but took 15 minutes.
  PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-Command -Expression {scoop info 7zip}
  Days : 0
  Hours : 0
  Minutes : 14
  Seconds : 47
  Milliseconds : 981
  Ticks : 8879816015
  TotalDays : 0.0102775648321759
  TotalHours : 0.246661555972222
  TotalMinutes : 14.7996933583333
  TotalSeconds : 887.9816015
  TotalMilliseconds : 887981.6015
- I also observed in taskmgr that there seemed to be only two powershell.exe processes, corresponding to the two windows I had open. So it wasn't spawning multiple processes.
- I ran the original scriptlet and also noticed that it wasn't spawning powershells in taskmgr. The original scriptlet takes 33 min. ~~The errors here are perhaps due to blank lines~~ [EDIT] This is fixed; it was due to single-char app-names being typed as System.Char, see the p.s.
  PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-Command -Expression {scoop search | Select-String -Pattern "^ " | ConvertFrom-String | %{ scoop info $_.P2.ToString() } | %{ -join($_.Name, " : " , $_.Description ) }}
  'extras' bucket:
  'java' bucket:
  'main' bucket:
  Days : 0
  Hours : 0
  Minutes : 32
  Seconds : 49
  Milliseconds : 330
  Ticks : 19693305775
  TotalDays : 0.0227931779803241
  TotalHours : 0.547036271527778
  TotalMinutes : 32.8221762916667
  TotalSeconds : 1969.3305775
  TotalMilliseconds : 1969330.5775
- Perhaps powershell has a way to smartly avoid creating subprocess subshells, or taskmgr is not catching short-lived processes. Resource-monitor shows that the powershell processes have about 14 threads each, but the one that is running the script has higher cpu utilization. The experiment did show that there is an invocation overhead; avoiding it saved 18 minutes.
- I conclude scoop-info is too slow for what it is doing, and needs to be profiled. All it's doing, in my opinion, is pulling up a manifest file and dumping out some properties. Why should printing small textual info from 2700 files take 15 minutes, i.e. only 3 files per second? Perhaps inefficient search algorithms for fetching the manifest/bucket, or json parsing.
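For comparison, here is a minimal sketch of the "single process reads every manifest" idea. It is not scoop's code: Get-ManifestSummary is a hypothetical helper, and it assumes a bucket directory holding one <app>.json manifest per app with top-level description and version fields.

```powershell
# Hypothetical helper, not part of scoop: reads every manifest in one process
# and emits one object per app.
function Get-ManifestSummary {
    param(
        [Parameter(Mandatory)]
        [string] $BucketPath   # directory containing <app>.json manifests (assumed layout)
    )

    Get-ChildItem -Path $BucketPath -Filter '*.json' -File | ForEach-Object {
        # Parse the manifest; description/version are assumed top-level fields.
        $manifest = Get-Content -Path $_.FullName -Raw | ConvertFrom-Json
        [PSCustomObject]@{
            Name        = $_.BaseName
            Description = $manifest.description
            Version     = $manifest.version
        }
    }
}

# Usage (the path is a placeholder):
# Get-ManifestSummary -BucketPath 'C:\path\to\bucket' | Where-Object Description -match 'compress'
```

Timing something like this with Measure-Command against the patched scoop-info.ps1 would be one way to separate the cost of file reading and JSON parsing from everything else scoop info does.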
EDIT: p.s. I edited some of the previous code-pastes to fix and remove the following error, which appeared 4 times.
Method invocation failed because [System.Char] does not contain a method named 'startswith'.
At C:\vol\scoop_01\SCOOP\apps\scoop\current\lib\getopt.ps1:34 char:12
+ if($arg.startswith('--')) {
+ ~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : MethodNotFound
This happened because there are 4 apps with single-letter names: z, q, r & v. The ConvertFrom-String powershell command converts them to type System.Char (as the error message above shows). This caused scoop info to choke on the System.Char argument. So I used $_.P2.ToString() to force type conversion back to String.
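An alternative that sidesteps the type inference altogether is the -split operator, which always yields strings. The sketch below assumes the indented "name (version)" line shape that the "^ " filter in the original pipeline selects; the sample lines and their version numbers are made up for illustration.

```powershell
# Hypothetical sample of `scoop search` text output; the exact layout is an assumption
# and the versions are placeholders.
$lines = @(
    "'main' bucket:"
    '    7zip (0.0)'
    '    z (0.0)'
)

$names = $lines |
    Where-Object { $_ -match '^ ' } |                # keep only the indented app lines
    ForEach-Object { ($_.Trim() -split '\s+')[0] }   # first whitespace-separated token

# Show each name with its .NET type; single-letter names stay System.String.
$names | ForEach-Object { '{0} -> {1}' -f $_, $_.GetType().FullName }
```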
The bulk of time taken by scoop info is due to the git subprocess (for fetching last author and date), and not due to powershell reading the manifest slowly.
I moved the git-using chunks of code that determine properties .'Updated at', .'Updated by' & .Installed to inside if ($verbose) { } guards so that they don't evaluate and sure enough ...
... Finishes in 37 seconds as expected.
:
:
Name : zstd
Description : High compression ratios compression algorithm
Version : 1.5.2
Bucket : main
Website : https://facebook.github.io/zstd
License : BSD-3-Clause
Binaries : zstd.exe
PS C:\vol\scoop_01\scoop\apps\scoop\current\libexec> Measure-command -Expression { scoop info 7zip }
Days : 0
Hours : 0
Minutes : 0
Seconds : 36
Milliseconds : 631
Ticks : 366311764
TotalDays : 0.000423971949074074
TotalHours : 0.0101753267777778
TotalMinutes : 0.610519606666667
TotalSeconds : 36.6311764
TotalMilliseconds : 36631.1764
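The guard described above can be sketched in isolation as follows. This is not the actual scoop-info.ps1 change: Get-AppInfoSketch, its -Detailed switch and the -GetLastCommit script block are hypothetical names, and the stub stands in for the slow git lookup so the example runs without a repository.

```powershell
# Hypothetical sketch: expensive properties are only computed when -Detailed is given.
function Get-AppInfoSketch {
    param(
        [Parameter(Mandatory)] [string] $Name,
        # Stand-in for the slow git lookup; returns an object with Author and Date.
        [Parameter(Mandatory)] [scriptblock] $GetLastCommit,
        # Hypothetical opt-in switch for the expensive properties.
        [switch] $Detailed
    )

    $info = [ordered]@{ Name = $Name }
    if ($Detailed) {
        # The expensive lookup is invoked only inside this guard.
        $commit = & $GetLastCommit $Name
        $info['Updated at'] = $commit.Date
        $info['Updated by'] = $commit.Author
    }
    [PSCustomObject]$info
}

# Stub lookup that counts how often it is called (placeholder values).
$script:calls = 0
$stub = {
    param($app)
    $script:calls++
    [PSCustomObject]@{ Author = 'someone'; Date = '2022-01-01' }
}

$fast = Get-AppInfoSketch -Name 'mc' -GetLastCommit $stub            # lookup skipped
$slow = Get-AppInfoSketch -Name 'mc' -GetLastCommit $stub -Detailed  # lookup runs once
```

Whether the default should be the fast or the detailed path is a design choice for the maintainers; the sketch only shows that the slow call can be made opt-in without changing the shape of the other properties.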
From here on out, what to do (create an optarg, create a static json, etc.) is in the realm of your decision space.