YouTube-operational-API
Adding `executable = '/usr/bin/bash'` to `subprocess.check_output` may be necessary to support `curl` `--data-raw $'...'`
https://github.com/Benjamin-Loison/YouTube-operational-API/blob/d61488fbe0becf6d2a6ebc97761e5b87a8facd3f/tools/minimizeCURL.py#L43
See the Unix Stack Exchange answer 115614.
http://wiki.bash-hackers.org/syntax/quoting is a nearly empty page (even when looking at the page source, and using https:// does not help).
https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
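A minimal sketch of what the title suggests, assuming `bash` is installed and reachable through `PATH`. `BASH_PATH` is a name introduced here for illustration; `executeCommand` mirrors the function of the same name in `minimizeCURL.py`. `$'...'` is Bash ANSI-C quoting, so a `/bin/sh` that is not Bash may pass the escapes through literally.

```python
import shutil
import subprocess

# Assumption: bash is found via PATH; shutil.which returns None otherwise,
# in which case subprocess falls back to the default shell (/bin/sh).
BASH_PATH = shutil.which('bash')

def executeCommand(command):
    # With shell = True, `executable` selects the shell that interprets
    # `command`, so $'...' is expanded by Bash.
    return subprocess.check_output(
        command,
        shell = True,
        stderr = subprocess.DEVNULL,
        executable = BASH_PATH,
    )

if BASH_PATH is not None:
    # Bash turns \r\n inside $'...' into real CR LF bytes.
    print(executeCommand("printf '%s' $'a\\r\\nb'"))
```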
It seems that the algorithm currently rebuilds the command as --data-raw '$...', which is an issue in the context of Benjamin_Loison/OneDrive/issues/6.
The code before blob/main/tools/minimizeCURL.py#L151-L228 also has to be considered, as it rebuilds the command incorrectly.
A shameful fix, which probably introduces a security flaw, is:
command = command.replace(" --data-raw '$", " --data-raw $'")
But where to put it? I guess in isCommandStillFine.
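As a sketch only, the replacement above could be wrapped in a helper and applied to every rebuilt command before it is executed. `restoreAnsiCQuoting` is a hypothetical name, not something present in `minimizeCURL.py`. It would also rewrite a payload that legitimately starts with a literal `$`, which is one reason the fix is fragile.

```python
import shlex

def restoreAnsiCQuoting(command):
    # Turn the `'$...'` produced by shlex.join back into `$'...'`.
    return command.replace(" --data-raw '$", " --data-raw $'")

original = "curl https://example.com --data-raw $'a\\r\\nb'"
# shlex.split drops the quotes but keeps the `$`, so shlex.join
# quotes the whole argument as '$a\r\nb'.
rebuilt = shlex.join(shlex.split(original))
print(rebuilt)
print(restoreAnsiCQuoting(rebuilt))
```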
Related to #171.
Alternatively, it seems that --data-raw $'...' can be replaced with --data-raw "...".
Maybe plain '...' does not interpret some characters like \r and \n. It seems that "..." does not either, even when escaping them and ".
Even escaping \ does not make "..." work for:
URL:
-----BEGIN PGP MESSAGE-----
hF4DTQa9Wom5MBgSAQdA1OEw/bqOs0qI8sDf/mCyaHXumnfef2o9xpB9zMvZKHIw
adXxpGpfchCvld/9+2gr9w+T2mvKcv3IRt6sJEOPSC4lsnxIDxKXEByRu5jBn+FP
0qQBWg3M5tUM1m1LT7G8SW+x7nG5Rl0ksfRfzUoQXY/MShLuoOTheSR3Nw33217Y
FOVIbAybZ8uY5dPJVka+aOZ0LNSw4i6QVEn5rmbju3qANxE5LTw0146HzGjaaVCz
89RkG7i3Fum9FfYw/AaBPYSekj8RvDXJP4lmXnQcudKtp8pGIvfvstytxLHvU7Gd
84qi3eiVWyqoiw1oy7ghg4/+3n5Tag==
=H5Cv
-----END PGP MESSAGE-----
So I exceptionally minimized it by hand.
echo $'-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX\r\nContent-Disposition: form-data; name="no_individu"\r\n\r\nXXXXXX\r\n-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX\r\nContent-Disposition: form-data; name="acti"\r\n\r\nXX\r\n-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX--\r\n'
Output:
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Disposition: form-data; name="no_individu"
XXXXXX
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Disposition: form-data; name="acti"
XX
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX--
I am unable to reproduce these new lines without $.
Otherwise, maybe I could rely on a command line converter.
The Unix Stack Exchange answer 48122 may help, as well as its comments.
Related to Webscrap_any_website/issues/29.
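One possible way to get these line breaks without `$` is to decode the escapes in Python and let `shlex.quote` emit a plain single-quoted argument containing literal CR LF characters, which a shell accepts inside `'...'`. The sketch below handles only a small subset of the escapes listed in the Bash manual page linked above; `SIMPLE_ESCAPES` and `decodeAnsiCQuoted` are hypothetical names.

```python
import shlex

# Assumption: only these escapes occur; anything else is rejected.
SIMPLE_ESCAPES = {'n': '\n', 'r': '\r', 't': '\t', '\\': '\\', "'": "'", '"': '"'}

def decodeAnsiCQuoted(token):
    if len(token) < 3 or not token.startswith("$'") or not token.endswith("'"):
        raise ValueError(f'Not a $\'...\' token: {token!r}')
    body = token[2:-1]
    decoded = []
    index = 0
    while index < len(body):
        character = body[index]
        if character == '\\' and index + 1 < len(body):
            nextCharacter = body[index + 1]
            if nextCharacter not in SIMPLE_ESCAPES:
                raise ValueError(f'Unsupported escape: \\{nextCharacter}')
            decoded.append(SIMPLE_ESCAPES[nextCharacter])
            index += 2
        else:
            decoded.append(character)
            index += 1
    return ''.join(decoded)

payload = decodeAnsiCQuoted("$'a\\r\\nb'")
# The quoted form contains real CR LF characters between single quotes.
print(repr(shlex.quote(payload)))
```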
import shlex
command = "curl --data-raw $'\''"
# command = "curl --data-raw $'a'"
# works fine
shlex.split(command)
ValueError: No closing quotation:
Traceback (most recent call last):
File "<tmp 6>", line 6, in <module>
shlex.split(command)
File "/usr/lib/python3.12/shlex.py", line 313, in split
return list(lex)
File "/usr/lib/python3.12/shlex.py", line 300, in __next__
token = self.get_token()
File "/usr/lib/python3.12/shlex.py", line 109, in get_token
raw = self.read_token()
File "/usr/lib/python3.12/shlex.py", line 191, in read_token
raise ValueError("No closing quotation")
ValueError: No closing quotation
help(shlex.split)
Output:
Help on function split in module shlex:
split(s, comments=False, posix=True)
Split the string *s* using shell-like syntax.
https://docs.python.org/3.12/library/shlex.html#shlex.split
Removing $ does not help.
import shlex
command = "curl --data-raw $'\''"
print(shlex.split(command, posix = False))
['curl', '--data-raw', "$'''"]
help(shlex.join)
Output:
Help on function join in module shlex:
join(split_command)
Return a shell-escaped string from *split_command*.
https://docs.python.org/3.12/library/shlex.html#shlex.join
import shlex
command = "curl --data-raw $'\''"
commandSplitted = shlex.split(command, posix = False)
print(shlex.join(commandSplitted))
curl --data-raw '$'"'"''"'"''"'"''
print(' '.join(commandSplitted))
curl --data-raw $'''
command = "curl --data-raw $'\''"
print(command)
curl --data-raw $'''
Python script:
import shlex
command = "curl --data-raw $'\\''"
# Equivalent to above `command`.
with open('curl.sh') as f:
command = f.read()
print(command)
commandSplitted = shlex.split(command, posix = False)
print(shlex.join(commandSplitted))
print(' '.join(commandSplitted))
Output:
curl --data-raw $'\''
curl --data-raw '$'"'"'\'"'"''"'"''
curl --data-raw $'\''
Using ' '.join(...) seems to require managing the quoting of arguments on our own.
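A minimal sketch of what managing the quoting ourselves could look like: tokens coming from `shlex.split(..., posix = False)` keep their original quotes, so they are joined verbatim, and only a value we generate ourselves is passed through `shlex.quote`. `replaceArgument` is a hypothetical helper and the URL is a placeholder.

```python
import shlex

def splitCommand(command):
    # Non-POSIX mode keeps the quotes as part of each token.
    return shlex.split(command, posix = False)

def joinSplittedCommand(splittedCommand):
    # Tokens are already quoted, so they are joined as is.
    return ' '.join(splittedCommand)

def replaceArgument(arguments, index, value):
    # `value` is unquoted, so we have to quote it ourselves.
    argumentsCopy = arguments[:]
    argumentsCopy[index] = shlex.quote(value)
    return argumentsCopy

command = "curl 'https://example.com/?a=1&b=2' --data-raw $'a\\r\\nb'"
arguments = splitCommand(command)
roundTripped = joinSplittedCommand(arguments)
shortened = joinSplittedCommand(replaceArgument(arguments, 1, 'https://example.com/?a=1'))
print(roundTripped)
print(shortened)
```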
Diff:
diff --git a/tools/minimizeCURL.py b/tools/minimizeCURL.py
index 2b5a721..65dac18 100755
--- a/tools/minimizeCURL.py
+++ b/tools/minimizeCURL.py
@@ -31,7 +31,7 @@ wantedOutput = sys.argv[2].encode('utf-8')
removeHeaders = True
removeUrlParameters = True
removeCookies = True
-removeRawData = True
+removeRawData = False
# Pay attention to provide a command giving plaintext output, so might required to remove `Accept-Encoding` HTTPS header.
with open(curlCommandFilePath) as f:
@@ -40,7 +40,7 @@ with open(curlCommandFilePath) as f:
def executeCommand(command):
# `stderr = subprocess.DEVNULL` is used to get rid of curl progress.
# Could also add `-s` curl argument.
- result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL)
+ result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL, executable = '/usr/bin/bash')
return result
def isCommandStillFine(command):
@@ -62,41 +62,50 @@ if not isCommandStillFine(command):
print('The wanted output isn\'t contained in the result of the original curl command!')
exit(1)
+def splitCommand(command):
+ return shlex.split(command, posix = False)
+
+def joinSplittedCommand(spittedCommand):
+ return ' '.join(spittedCommand)
+ return shlex.join(spittedCommand)
+
if removeHeaders:
print('Removing headers')
# Should try to minimize the number of requests done, by testing half of parameters at each request.
while True:
changedSomething = False
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex in range(len(arguments) - 1):
argument, nextArgument = arguments[argumentsIndex : argumentsIndex + 2]
if argument == '-H':
previousCommand = command
del arguments[argumentsIndex : argumentsIndex + 2]
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
break
else:
command = previousCommand
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
if not changedSomething:
break
if removeUrlParameters:
print('Removing URL parameters')
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
- if argument.startswith('http'):
+ if argument.startswith("'http"):
urlIndex = argumentsIndex
+ #arguments[urlIndex] = arguments[urlIndex][1:-1]
break
url = arguments[urlIndex]
while True:
changedSomething = False
+ url = url[1:-1]
urlParsed = urlparse(url)
query = parse_qs(urlParsed.query)
for key in list(query):
@@ -104,8 +113,8 @@ if removeUrlParameters:
del query[key]
# Make a function with below code.
url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
- arguments[urlIndex] = url
- command = shlex.join(arguments)
+ arguments[urlIndex] = shlex.quote(url)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -113,8 +122,8 @@ if removeUrlParameters:
else:
query = previousQuery
url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
- arguments[urlIndex] = url
- command = shlex.join(arguments)
+ arguments[urlIndex] = shlex.quote(url)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
@@ -125,7 +134,7 @@ if removeCookies:
COOKIES_PREFIX_LEN = len(COOKIES_PREFIX)
cookiesIndex = None
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
# For Chromium support:
if argument[:COOKIES_PREFIX_LEN].title() == COOKIES_PREFIX:
@@ -142,7 +151,7 @@ if removeCookies:
cookiesParsedCopy = cookiesParsed[:]
del cookiesParsedCopy[cookiesParsedIndex]
arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsedCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -150,7 +159,7 @@ if removeCookies:
break
else:
arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsed)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
@@ -159,7 +168,7 @@ if removeRawData:
rawDataIndex = None
isJson = False
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
if argumentsIndex > 0 and arguments[argumentsIndex - 1] == '--data-raw':
rawDataIndex = argumentsIndex
@@ -182,7 +191,7 @@ if removeRawData:
rawDataPartsCopy = copy.deepcopy(rawDataParts)
del rawDataPartsCopy[rawDataPartsIndex]
arguments[rawDataIndex] = '&'.join(rawDataPartsCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -190,7 +199,7 @@ if removeRawData:
break
else:
arguments[rawDataIndex] = '&'.join(rawDataParts)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
# JSON recursive case.
@@ -229,7 +238,7 @@ if removeRawData:
del entry[lastPathPart]
# Test if the removed entry was necessary.
arguments[rawDataIndex] = json.dumps(rawDataParsedCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
# (1) If it was unnecessary, then reconsider paths excluding possible children paths of this unnecessary entry, ensuring optimized complexity it seems.
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
@@ -239,7 +248,7 @@ if removeRawData:
# If it was necessary, we consider possible children paths of this necessary entry and other paths.
else:
arguments[rawDataIndex] = json.dumps(rawDataParsed)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
# If a loop iteration considering all paths, does not change anything, then the request cannot be minimized further.
if not changedSomething:
break
For a big file, I modified the above diff as follows:
Output:
diff --git a/tools/minimizeCURL.py b/tools/minimizeCURL.py
index 2b5a721..6e281ea 100755
--- a/tools/minimizeCURL.py
+++ b/tools/minimizeCURL.py
@@ -31,7 +31,7 @@ wantedOutput = sys.argv[2].encode('utf-8')
removeHeaders = True
removeUrlParameters = True
removeCookies = True
-removeRawData = True
+removeRawData = False
# Pay attention to provide a command giving plaintext output, so might required to remove `Accept-Encoding` HTTPS header.
with open(curlCommandFilePath) as f:
@@ -40,7 +40,12 @@ with open(curlCommandFilePath) as f:
def executeCommand(command):
# `stderr = subprocess.DEVNULL` is used to get rid of curl progress.
# Could also add `-s` curl argument.
- result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL)
+ INTERMEDIARY_CURL_FILE_PATH = 'intermediary_curl.sh'
+ with open(INTERMEDIARY_CURL_FILE_PATH, 'w') as f:
+ f.write(command)
+ result = subprocess.check_output(f'bash {INTERMEDIARY_CURL_FILE_PATH}', shell = True, stderr = subprocess.DEVNULL)
+ #print(result)
+ #exit(1)
return result
def isCommandStillFine(command):
@@ -56,6 +61,13 @@ def printThatCommandIsStillFine(command):
# For Chromium support:
command = command.replace(' \\\n ', '')
+def splitCommand(command):
+ return shlex.split(command, posix = False)
+
+def joinSplittedCommand(spittedCommand):
+ return ' '.join(spittedCommand)
+ return shlex.join(spittedCommand)
+
print(f'Initial command length: {getCommandLengthFormatted(command)}.')
# To verify that the user provided the correct `wantedOutput` to keep during the minimization.
if not isCommandStillFine(command):
@@ -68,35 +80,37 @@ if removeHeaders:
# Should try to minimize the number of requests done, by testing half of parameters at each request.
while True:
changedSomething = False
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex in range(len(arguments) - 1):
argument, nextArgument = arguments[argumentsIndex : argumentsIndex + 2]
if argument == '-H':
previousCommand = command
del arguments[argumentsIndex : argumentsIndex + 2]
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
break
else:
command = previousCommand
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
if not changedSomething:
break
if removeUrlParameters:
print('Removing URL parameters')
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
- if argument.startswith('http'):
+ if argument.startswith("'http"):
urlIndex = argumentsIndex
+ #arguments[urlIndex] = arguments[urlIndex][1:-1]
break
url = arguments[urlIndex]
while True:
changedSomething = False
+ url = url[1:-1]
urlParsed = urlparse(url)
query = parse_qs(urlParsed.query)
for key in list(query):
@@ -104,8 +118,8 @@ if removeUrlParameters:
del query[key]
# Make a function with below code.
url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
- arguments[urlIndex] = url
- command = shlex.join(arguments)
+ arguments[urlIndex] = shlex.quote(url)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -113,8 +127,8 @@ if removeUrlParameters:
else:
query = previousQuery
url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
- arguments[urlIndex] = url
- command = shlex.join(arguments)
+ arguments[urlIndex] = shlex.quote(url)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
@@ -125,7 +139,7 @@ if removeCookies:
COOKIES_PREFIX_LEN = len(COOKIES_PREFIX)
cookiesIndex = None
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
# For Chromium support:
if argument[:COOKIES_PREFIX_LEN].title() == COOKIES_PREFIX:
@@ -142,7 +156,7 @@ if removeCookies:
cookiesParsedCopy = cookiesParsed[:]
del cookiesParsedCopy[cookiesParsedIndex]
arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsedCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -150,7 +164,7 @@ if removeCookies:
break
else:
arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsed)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
@@ -159,7 +173,7 @@ if removeRawData:
rawDataIndex = None
isJson = False
- arguments = shlex.split(command)
+ arguments = splitCommand(command)
for argumentsIndex, argument in enumerate(arguments):
if argumentsIndex > 0 and arguments[argumentsIndex - 1] == '--data-raw':
rawDataIndex = argumentsIndex
@@ -182,7 +196,7 @@ if removeRawData:
rawDataPartsCopy = copy.deepcopy(rawDataParts)
del rawDataPartsCopy[rawDataPartsIndex]
arguments[rawDataIndex] = '&'.join(rawDataPartsCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
changedSomething = True
@@ -190,7 +204,7 @@ if removeRawData:
break
else:
arguments[rawDataIndex] = '&'.join(rawDataParts)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
if not changedSomething:
break
# JSON recursive case.
@@ -229,7 +243,7 @@ if removeRawData:
del entry[lastPathPart]
# Test if the removed entry was necessary.
arguments[rawDataIndex] = json.dumps(rawDataParsedCopy)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
# (1) If it was unnecessary, then reconsider paths excluding possible children paths of this unnecessary entry, ensuring optimized complexity it seems.
if isCommandStillFine(command):
printThatCommandIsStillFine(command)
@@ -239,7 +253,7 @@ if removeRawData:
# If it was necessary, we consider possible children paths of this necessary entry and other paths.
else:
arguments[rawDataIndex] = json.dumps(rawDataParsed)
- command = shlex.join(arguments)
+ command = joinSplittedCommand(arguments)
# If a loop iteration considering all paths, does not change anything, then the request cannot be minimized further.
if not changedSomething:
break
Copy as PowerShell and Copy as fetch do not seem to help.
On Windows:
Copy as cURL (Windows) does not use $'. However, it does not work as wanted on Linux, according to echoing it to a file and diffing with Copy as cURL (POSIX).
Copy as cURL (POSIX) uses $'.
What about Chromium on Linux and Windows? See Online_authentication_API/issues/78#issuecomment-2405882.
DuckDuckGo and Google searches for Command POSIX to Windows converter do not seem to return relevant results.
Note that echo $"a\nb" does not work while echo $'a\nb' does.
Maybe the latest Python version of shlex supports this feature.
In fact, splitting echo $'a\nb' and echo 'a\nb' should return something like ['echo', 'a\nb']. It is unclear how to keep the $'...' information.
Maybe $'...' can be avoided by expanding newlines (but this does not seem to apply to other usages of $'...') or by using a given file for --data-raw, which is possible as far as I remember.
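A sketch of the file-based idea, under the assumption that the request can use `--data-binary @file` instead: per the curl manual, `--data-raw` does not interpret `@`, while `--data-binary @file` posts the file content as is, and `-d @file` would strip carriage returns and newlines. `buildCommandWithDataFile` is a hypothetical helper and the URL is a placeholder.

```python
import os
import shlex
import tempfile

def buildCommandWithDataFile(url, payload):
    # Write the exact bytes to a temporary file, so no shell quoting
    # of \r\n is needed in the command itself.
    fileDescriptor, path = tempfile.mkstemp(suffix = '.bin')
    with os.fdopen(fileDescriptor, 'wb') as f:
        f.write(payload)
    command = shlex.join(['curl', url, '--data-binary', f'@{path}'])
    return command, path

payload = b'--boundary\r\nContent-Disposition: form-data; name="acti"\r\n\r\nXX\r\n--boundary--\r\n'
command, path = buildCommandWithDataFile('https://example.com/', payload)
print(command)
```

The caller is responsible for deleting the temporary file once minimization is done, and headers such as `Content-Type` still have to be kept in the command.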
COMMAND = "echo $'a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND))
echo $'a\nb'
['echo', '$a\\nb']
COMMAND = "echo '$a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND))
echo '$a\nb'
['echo', '$a\\nb']
echo '$a\nb'
$a\nb
COMMAND = "echo $'a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND, posix = False))
echo $'a\nb'
['echo', "$'a\\nb'"]
COMMAND = "echo '$a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND, posix = False))
echo '$a\nb'
['echo', "'$a\\nb'"]
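One caveat with `posix = False`, as far as I understand `shlex`: a quote only opens a quoted span at the start of a token, so in `$'...'` the quote after `$` is treated as an ordinary character and a space inside the payload splits it into several tokens. Joining with a single space restores the command only when the payload contains single spaces. The commands below are illustrative only.

```python
import shlex

command = "echo $'a b'"
tokens = shlex.split(command, posix = False)
# The payload is split at the space because the quote follows `$`.
print(tokens)
print(' '.join(tokens) == command)

# Consecutive spaces inside the payload are collapsed by the round trip.
commandWithTwoSpaces = "echo $'a  b'"
print(' '.join(shlex.split(commandWithTwoSpaces, posix = False)))
```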
In the case of Online authentication API, this issue may be due to my complex passwords; using passwords without special characters, if possible, may help. Otherwise, use special characters that do not seem to require the kind of escaping that $'...' seems to provide.
For reference: dollar.