
Adding `executable = '/usr/bin/bash'` to `subprocess.check_output` may be necessary to support `curl` `--data-raw $'...'`

Open Benjamin-Loison opened this issue 1 year ago • 27 comments

https://github.com/Benjamin-Loison/YouTube-operational-API/blob/d61488fbe0becf6d2a6ebc97761e5b87a8facd3f/tools/minimizeCURL.py#L43

See the Unix Stack Exchange answer 115614.

http://wiki.bash-hackers.org/syntax/quoting is an almost empty page (even when looking at its source, and using https:// does not help). See instead https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
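
A minimal sketch of the idea in the title, under the assumption that the default `/bin/sh` used by `shell = True` may be a shell (such as dash) that does not implement `$'...'`. `BASH_PATH` is a name introduced here for illustration; it looks bash up instead of hardcoding `/usr/bin/bash`.

```python
import shutil
import subprocess

# Assumption: bash is installed somewhere in PATH. If it is not, `BASH_PATH` is
# `None` and `subprocess` falls back to its default shell (`/bin/sh`).
BASH_PATH = shutil.which('bash')

def executeCommand(command):
    # `executable` replaces the default `/bin/sh` used when `shell = True`,
    # so bash-only syntax such as ANSI-C quoting `$'...'` gets interpreted.
    return subprocess.check_output(
        command,
        shell = True,
        stderr = subprocess.DEVNULL,
        executable = BASH_PATH,
    )

if __name__ == '__main__' and BASH_PATH is not None:
    # Under bash, `$'a\nb'` expands to `a`, a newline, then `b`.
    print(executeCommand("printf '%s' $'a\\nb'"))
```

Only the shell is changed here; the command string itself is passed untouched.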


Benjamin-Loison avatar Aug 02 '24 12:08 Benjamin-Loison

It seems that the algorithm currently rebuilds the command as --data-raw '$...', which is an issue in the context of Benjamin_Loison/OneDrive/issues/6.

Benjamin-Loison avatar Aug 02 '24 14:08 Benjamin-Loison

The code before blob/main/tools/minimizeCURL.py#L151-L228 also has to be considered, as it rebuilds the command incorrectly.

Benjamin-Loison avatar Aug 02 '24 15:08 Benjamin-Loison

A shameful fix, which probably introduces a security flaw, is:

command = command.replace(" --data-raw '$", " --data-raw $'")

But where to put it? I guess in isCommandStillFine.
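
A minimal sketch of why that replacement works on a simple case. `restoreAnsiCQuoting` is a hypothetical helper name, not something present in minimizeCURL.py, and the URL is a placeholder.

```python
import shlex

def restoreAnsiCQuoting(command):
    # Hypothetical helper: undo what `shlex.join` does to a `$'...'` argument
    # that was previously split in POSIX mode.
    return command.replace(" --data-raw '$", " --data-raw $'")

original = "curl https://example.com --data-raw $'a\\r\\nb'"
# POSIX-mode `shlex.split` drops the quotes but keeps the `$`, then
# `shlex.join` single-quotes the whole token, `$` included.
rebuilt = shlex.join(shlex.split(original))
print(rebuilt)
print(restoreAnsiCQuoting(rebuilt))
```

This only holds while the data contains no single quote: `shlex.join` would then emit `'"'"'` sequences, which do not have the same meaning inside `$'...'`.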

Benjamin-Loison avatar Aug 02 '24 15:08 Benjamin-Loison

Related to #171.

Benjamin-Loison avatar Sep 10 '24 16:09 Benjamin-Loison

Alternatively, it seems that --data-raw $'...' can be replaced with --data-raw "...".

Maybe plain '...' does not interpret some escape sequences such as \r and \n. It seems that "..." does not either, even when escaping them and ".

Even escaping \ does not make "..." work for:

URL:
-----BEGIN PGP MESSAGE-----

hF4DTQa9Wom5MBgSAQdA1OEw/bqOs0qI8sDf/mCyaHXumnfef2o9xpB9zMvZKHIw
adXxpGpfchCvld/9+2gr9w+T2mvKcv3IRt6sJEOPSC4lsnxIDxKXEByRu5jBn+FP
0qQBWg3M5tUM1m1LT7G8SW+x7nG5Rl0ksfRfzUoQXY/MShLuoOTheSR3Nw33217Y
FOVIbAybZ8uY5dPJVka+aOZ0LNSw4i6QVEn5rmbju3qANxE5LTw0146HzGjaaVCz
89RkG7i3Fum9FfYw/AaBPYSekj8RvDXJP4lmXnQcudKtp8pGIvfvstytxLHvU7Gd
84qi3eiVWyqoiw1oy7ghg4/+3n5Tag==
=H5Cv
-----END PGP MESSAGE-----

So, exceptionally, I minimized it by hand.

Benjamin-Loison avatar Sep 10 '24 16:09 Benjamin-Loison

echo $'-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX\r\nContent-Disposition: form-data; name="no_individu"\r\n\r\nXXXXXX\r\n-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX\r\nContent-Disposition: form-data; name="acti"\r\n\r\nXX\r\n-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX--\r\n'
Output:
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Disposition: form-data; name="no_individu"

XXXXXX
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Disposition: form-data; name="acti"

XX
-----------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXX--

I am unable to reproduce these newlines without $.

Otherwise, maybe I could rely on a command line converter.

The Unix Stack Exchange answer 48122 may help as well as its comments.

Benjamin-Loison avatar Sep 16 '24 14:09 Benjamin-Loison

import shlex

command = "curl --data-raw $'\''"
# command = "curl --data-raw $'a'"
# works fine

shlex.split(command)
ValueError: No closing quotation:
Traceback (most recent call last):
  File "<tmp 6>", line 6, in <module>
    shlex.split(command)
  File "/usr/lib/python3.12/shlex.py", line 313, in split
    return list(lex)
  File "/usr/lib/python3.12/shlex.py", line 300, in __next__
    token = self.get_token()
  File "/usr/lib/python3.12/shlex.py", line 109, in get_token
    raw = self.read_token()
  File "/usr/lib/python3.12/shlex.py", line 191, in read_token
    raise ValueError("No closing quotation")
ValueError: No closing quotation

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

help(shlex.split)
Output:
Help on function split in module shlex:

split(s, comments=False, posix=True)
    Split the string *s* using shell-like syntax.

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

https://docs.python.org/3.12/library/shlex.html#shlex.split

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

Removing $ does not help.

import shlex

command = "curl --data-raw $'\''"

print(shlex.split(command, posix = False))
['curl', '--data-raw', "$'''"]

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

help(shlex.join)
Output:
Help on function join in module shlex:

join(split_command)
    Return a shell-escaped string from *split_command*.

https://docs.python.org/3.12/library/shlex.html#shlex.join

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

import shlex

command = "curl --data-raw $'\''"

commandSplitted = shlex.split(command, posix = False)
print(shlex.join(commandSplitted))
curl --data-raw '$'"'"''"'"''"'"''

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

print(' '.join(commandSplitted))
curl --data-raw $'''

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

command = "curl --data-raw $'\''"
print(command)
curl --data-raw $'''
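
A short sketch of what happens in the snippets above: in a regular Python string literal `\'` collapses to `'`, so the command really contains three quotes in a row. A raw string keeps the backslash, but POSIX-mode `shlex` still does not treat `\'` as an escape inside single quotes. `trySplit` is a helper introduced here only for illustration.

```python
import shlex

def trySplit(command):
    # Return the split command, or `None` if `shlex` reports unbalanced quotes.
    try:
        return shlex.split(command)
    except ValueError:
        return None

# `\'` in a regular string literal is just `'`.
print("$'\''" == "$'''")
# A raw string (or a doubled backslash) keeps the backslash.
print(r"$'\''" == "$'\\''")

print(trySplit("curl --data-raw $'''"))
print(trySplit(r"curl --data-raw $'\''"))
print(trySplit("curl --data-raw $'a'"))
```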

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

Python script:
import shlex

command = "curl --data-raw $'\\''"
# Equivalent to above `command`.
with open('curl.sh') as f:
    command = f.read()
print(command)

commandSplitted = shlex.split(command, posix = False)
print(shlex.join(commandSplitted))
print(' '.join(commandSplitted))
Output:
curl --data-raw $'\''

curl --data-raw '$'"'"'\'"'"''"'"''
curl --data-raw $'\''

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

Using ' '.join(...) seems to require managing the quoting of arguments on our own.
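
A minimal sketch of what managing the quoting ourselves could look like: tokens coming from `shlex.split(..., posix = False)` keep their quotes and are joined untouched, while any argument we rebuild is passed through `shlex.quote` first. The command and URL are placeholders.

```python
import shlex

def splitCommand(command):
    # Non-POSIX mode keeps the quotes around each token.
    return shlex.split(command, posix = False)

def joinSplittedCommand(splittedCommand):
    # Tokens are already quoted, so a plain join is enough.
    return ' '.join(splittedCommand)

command = "curl 'https://example.com/?a=1&b=2' -H 'Accept: */*' --data-raw $'a\\r\\nb'"
arguments = splitCommand(command)
print(arguments)

# A rebuilt argument has lost its quotes, so it has to be quoted again by hand.
arguments[1] = shlex.quote('https://example.com/?a=1')
print(joinSplittedCommand(arguments))
```

From reading `shlex`, a non-POSIX split only keeps a quoted token whole when the quote is its first character, so a `$'...'` body containing spaces appears to be cut at each space; the plain join then puts single spaces back.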

Benjamin-Loison avatar Sep 23 '24 17:09 Benjamin-Loison

Diff:
diff --git a/tools/minimizeCURL.py b/tools/minimizeCURL.py
index 2b5a721..65dac18 100755
--- a/tools/minimizeCURL.py
+++ b/tools/minimizeCURL.py
@@ -31,7 +31,7 @@ wantedOutput = sys.argv[2].encode('utf-8')
 removeHeaders = True
 removeUrlParameters = True
 removeCookies = True
-removeRawData = True
+removeRawData = False
 
 # Pay attention to provide a command giving plaintext output, so might required to remove `Accept-Encoding` HTTPS header.
 with open(curlCommandFilePath) as f:
@@ -40,7 +40,7 @@ with open(curlCommandFilePath) as f:
 def executeCommand(command):
     # `stderr = subprocess.DEVNULL` is used to get rid of curl progress.
     # Could also add `-s` curl argument.
-    result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL)
+    result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL, executable = '/usr/bin/bash')
     return result
 
 def isCommandStillFine(command):
@@ -62,41 +62,50 @@ if not isCommandStillFine(command):
     print('The wanted output isn\'t contained in the result of the original curl command!')
     exit(1)
 
+def splitCommand(command):
+    return shlex.split(command, posix = False)
+
+def joinSplittedCommand(spittedCommand):
+    return ' '.join(spittedCommand)
+    return shlex.join(spittedCommand)
+
 if removeHeaders:
     print('Removing headers')
 
     # Should try to minimize the number of requests done, by testing half of parameters at each request.
     while True:
         changedSomething = False
-        arguments = shlex.split(command)
+        arguments = splitCommand(command)
         for argumentsIndex in range(len(arguments) - 1):
             argument, nextArgument = arguments[argumentsIndex : argumentsIndex + 2]
             if argument == '-H':
                 previousCommand = command
                 del arguments[argumentsIndex : argumentsIndex + 2]
-                command = shlex.join(arguments)
+                command = joinSplittedCommand(arguments)
                 if isCommandStillFine(command):
                     printThatCommandIsStillFine(command)
                     changedSomething = True
                     break
                 else:
                     command = previousCommand
-                    arguments = shlex.split(command)
+                    arguments = splitCommand(command)
         if not changedSomething:
             break
 
 if removeUrlParameters:
     print('Removing URL parameters')
 
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
-        if argument.startswith('http'):
+        if argument.startswith("'http"):
             urlIndex = argumentsIndex
+            #arguments[urlIndex] = arguments[urlIndex][1:-1]
             break
 
     url = arguments[urlIndex]
     while True:
         changedSomething = False
+        url = url[1:-1]
         urlParsed = urlparse(url)
         query = parse_qs(urlParsed.query)
         for key in list(query):
@@ -104,8 +113,8 @@ if removeUrlParameters:
             del query[key]
             # Make a function with below code.
             url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
-            arguments[urlIndex] = url
-            command = shlex.join(arguments)
+            arguments[urlIndex] = shlex.quote(url)
+            command = joinSplittedCommand(arguments)
             if isCommandStillFine(command):
                 printThatCommandIsStillFine(command)
                 changedSomething = True
@@ -113,8 +122,8 @@ if removeUrlParameters:
             else:
                 query = previousQuery
                 url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
-                arguments[urlIndex] = url
-                command = shlex.join(arguments)
+                arguments[urlIndex] = shlex.quote(url)
+                command = joinSplittedCommand(arguments)
         if not changedSomething:
             break
 
@@ -125,7 +134,7 @@ if removeCookies:
     COOKIES_PREFIX_LEN = len(COOKIES_PREFIX)
 
     cookiesIndex = None
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
         # For Chromium support:
         if argument[:COOKIES_PREFIX_LEN].title() == COOKIES_PREFIX:
@@ -142,7 +151,7 @@ if removeCookies:
                 cookiesParsedCopy = cookiesParsed[:]
                 del cookiesParsedCopy[cookiesParsedIndex]
                 arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsedCopy)
-                command = shlex.join(arguments)
+                command = joinSplittedCommand(arguments)
                 if isCommandStillFine(command):
                     printThatCommandIsStillFine(command)
                     changedSomething = True
@@ -150,7 +159,7 @@ if removeCookies:
                     break
                 else:
                     arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsed)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
             if not changedSomething:
                 break
 
@@ -159,7 +168,7 @@ if removeRawData:
 
     rawDataIndex = None
     isJson = False
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
         if argumentsIndex > 0 and arguments[argumentsIndex - 1] == '--data-raw':
             rawDataIndex = argumentsIndex
@@ -182,7 +191,7 @@ if removeRawData:
                     rawDataPartsCopy = copy.deepcopy(rawDataParts)
                     del rawDataPartsCopy[rawDataPartsIndex]
                     arguments[rawDataIndex] = '&'.join(rawDataPartsCopy)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
                     if isCommandStillFine(command):
                         printThatCommandIsStillFine(command)
                         changedSomething = True
@@ -190,7 +199,7 @@ if removeRawData:
                         break
                     else:
                         arguments[rawDataIndex] = '&'.join(rawDataParts)
-                        command = shlex.join(arguments)
+                        command = joinSplittedCommand(arguments)
                 if not changedSomething:
                     break
         # JSON recursive case.
@@ -229,7 +238,7 @@ if removeRawData:
                     del entry[lastPathPart]
                     # Test if the removed entry was necessary.
                     arguments[rawDataIndex] = json.dumps(rawDataParsedCopy)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
                     # (1) If it was unnecessary, then reconsider paths excluding possible children paths of this unnecessary entry, ensuring optimized complexity it seems.
                     if isCommandStillFine(command):
                         printThatCommandIsStillFine(command)
@@ -239,7 +248,7 @@ if removeRawData:
                     # If it was necessary, we consider possible children paths of this necessary entry and other paths.
                     else:
                         arguments[rawDataIndex] = json.dumps(rawDataParsed)
-                        command = shlex.join(arguments)
+                        command = joinSplittedCommand(arguments)
                 # If a loop iteration considering all paths, does not change anything, then the request cannot be minimized further.
                 if not changedSomething:
                     break

Benjamin-Loison avatar Sep 23 '24 19:09 Benjamin-Loison

For a big file, I modified the above diff as follows:

Diff:
diff --git a/tools/minimizeCURL.py b/tools/minimizeCURL.py
index 2b5a721..6e281ea 100755
--- a/tools/minimizeCURL.py
+++ b/tools/minimizeCURL.py
@@ -31,7 +31,7 @@ wantedOutput = sys.argv[2].encode('utf-8')
 removeHeaders = True
 removeUrlParameters = True
 removeCookies = True
-removeRawData = True
+removeRawData = False
 
 # Pay attention to provide a command giving plaintext output, so might required to remove `Accept-Encoding` HTTPS header.
 with open(curlCommandFilePath) as f:
@@ -40,7 +40,12 @@ with open(curlCommandFilePath) as f:
 def executeCommand(command):
     # `stderr = subprocess.DEVNULL` is used to get rid of curl progress.
     # Could also add `-s` curl argument.
-    result = subprocess.check_output(command, shell = True, stderr = subprocess.DEVNULL)
+    INTERMEDIARY_CURL_FILE_PATH = 'intermediary_curl.sh'
+    with open(INTERMEDIARY_CURL_FILE_PATH, 'w') as f:
+        f.write(command)
+    result = subprocess.check_output(f'bash {INTERMEDIARY_CURL_FILE_PATH}', shell = True, stderr = subprocess.DEVNULL)
+    #print(result)
+    #exit(1)
     return result
 
 def isCommandStillFine(command):
@@ -56,6 +61,13 @@ def printThatCommandIsStillFine(command):
 # For Chromium support:
 command = command.replace(' \\\n ', '')
 
+def splitCommand(command):
+    return shlex.split(command, posix = False)
+
+def joinSplittedCommand(spittedCommand):
+    return ' '.join(spittedCommand)
+    return shlex.join(spittedCommand)
+
 print(f'Initial command length: {getCommandLengthFormatted(command)}.')
 # To verify that the user provided the correct `wantedOutput` to keep during the minimization.
 if not isCommandStillFine(command):
@@ -68,35 +80,37 @@ if removeHeaders:
     # Should try to minimize the number of requests done, by testing half of parameters at each request.
     while True:
         changedSomething = False
-        arguments = shlex.split(command)
+        arguments = splitCommand(command)
         for argumentsIndex in range(len(arguments) - 1):
             argument, nextArgument = arguments[argumentsIndex : argumentsIndex + 2]
             if argument == '-H':
                 previousCommand = command
                 del arguments[argumentsIndex : argumentsIndex + 2]
-                command = shlex.join(arguments)
+                command = joinSplittedCommand(arguments)
                 if isCommandStillFine(command):
                     printThatCommandIsStillFine(command)
                     changedSomething = True
                     break
                 else:
                     command = previousCommand
-                    arguments = shlex.split(command)
+                    arguments = splitCommand(command)
         if not changedSomething:
             break
 
 if removeUrlParameters:
     print('Removing URL parameters')
 
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
-        if argument.startswith('http'):
+        if argument.startswith("'http"):
             urlIndex = argumentsIndex
+            #arguments[urlIndex] = arguments[urlIndex][1:-1]
             break
 
     url = arguments[urlIndex]
     while True:
         changedSomething = False
+        url = url[1:-1]
         urlParsed = urlparse(url)
         query = parse_qs(urlParsed.query)
         for key in list(query):
@@ -104,8 +118,8 @@ if removeUrlParameters:
             del query[key]
             # Make a function with below code.
             url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
-            arguments[urlIndex] = url
-            command = shlex.join(arguments)
+            arguments[urlIndex] = shlex.quote(url)
+            command = joinSplittedCommand(arguments)
             if isCommandStillFine(command):
                 printThatCommandIsStillFine(command)
                 changedSomething = True
@@ -113,8 +127,8 @@ if removeUrlParameters:
             else:
                 query = previousQuery
                 url = urlParsed._replace(query = '&'.join([f'{quote_plus(parameter)}={quote_plus(query[parameter][0])}' for parameter in query])).geturl()
-                arguments[urlIndex] = url
-                command = shlex.join(arguments)
+                arguments[urlIndex] = shlex.quote(url)
+                command = joinSplittedCommand(arguments)
         if not changedSomething:
             break
 
@@ -125,7 +139,7 @@ if removeCookies:
     COOKIES_PREFIX_LEN = len(COOKIES_PREFIX)
 
     cookiesIndex = None
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
         # For Chromium support:
         if argument[:COOKIES_PREFIX_LEN].title() == COOKIES_PREFIX:
@@ -142,7 +156,7 @@ if removeCookies:
                 cookiesParsedCopy = cookiesParsed[:]
                 del cookiesParsedCopy[cookiesParsedIndex]
                 arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsedCopy)
-                command = shlex.join(arguments)
+                command = joinSplittedCommand(arguments)
                 if isCommandStillFine(command):
                     printThatCommandIsStillFine(command)
                     changedSomething = True
@@ -150,7 +164,7 @@ if removeCookies:
                     break
                 else:
                     arguments[cookiesIndex] = COOKIES_PREFIX + '; '.join(cookiesParsed)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
             if not changedSomething:
                 break
 
@@ -159,7 +173,7 @@ if removeRawData:
 
     rawDataIndex = None
     isJson = False
-    arguments = shlex.split(command)
+    arguments = splitCommand(command)
     for argumentsIndex, argument in enumerate(arguments):
         if argumentsIndex > 0 and arguments[argumentsIndex - 1] == '--data-raw':
             rawDataIndex = argumentsIndex
@@ -182,7 +196,7 @@ if removeRawData:
                     rawDataPartsCopy = copy.deepcopy(rawDataParts)
                     del rawDataPartsCopy[rawDataPartsIndex]
                     arguments[rawDataIndex] = '&'.join(rawDataPartsCopy)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
                     if isCommandStillFine(command):
                         printThatCommandIsStillFine(command)
                         changedSomething = True
@@ -190,7 +204,7 @@ if removeRawData:
                         break
                     else:
                         arguments[rawDataIndex] = '&'.join(rawDataParts)
-                        command = shlex.join(arguments)
+                        command = joinSplittedCommand(arguments)
                 if not changedSomething:
                     break
         # JSON recursive case.
@@ -229,7 +243,7 @@ if removeRawData:
                     del entry[lastPathPart]
                     # Test if the removed entry was necessary.
                     arguments[rawDataIndex] = json.dumps(rawDataParsedCopy)
-                    command = shlex.join(arguments)
+                    command = joinSplittedCommand(arguments)
                     # (1) If it was unnecessary, then reconsider paths excluding possible children paths of this unnecessary entry, ensuring optimized complexity it seems.
                     if isCommandStillFine(command):
                         printThatCommandIsStillFine(command)
@@ -239,7 +253,7 @@ if removeRawData:
                     # If it was necessary, we consider possible children paths of this necessary entry and other paths.
                     else:
                         arguments[rawDataIndex] = json.dumps(rawDataParsed)
-                        command = shlex.join(arguments)
+                        command = joinSplittedCommand(arguments)
                 # If a loop iteration considering all paths, does not change anything, then the request cannot be minimized further.
                 if not changedSomething:
                     break
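
A hedged variant of the `executeCommand` change in this diff, presumably useful when the command is too long to be passed on a command line: the command goes to a temporary file that bash then executes. `executeCommandFromFile` and `BASH_PATH` are names introduced for this sketch; the diff itself uses a fixed `intermediary_curl.sh`.

```python
import os
import shutil
import subprocess
import tempfile

# Assumption: bash is installed somewhere in PATH.
BASH_PATH = shutil.which('bash')

def executeCommandFromFile(command):
    # Write the command to a temporary script instead of a fixed file name.
    with tempfile.NamedTemporaryFile('w', suffix = '.sh', delete = False) as f:
        f.write(command)
        scriptPath = f.name
    try:
        # An argument list avoids a second layer of shell quoting.
        return subprocess.check_output([BASH_PATH, scriptPath], stderr = subprocess.DEVNULL)
    finally:
        os.remove(scriptPath)

if __name__ == '__main__' and BASH_PATH is not None:
    print(executeCommandFromFile("printf '%s' $'a\\nb'"))
```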

Benjamin-Loison avatar Sep 30 '24 17:09 Benjamin-Loison

Copy as {PowerShell,Fetch} does not seem to help.

On Windows:

Copy as cURL (Windows) does not use $', but it does not work as wanted on Linux, according to echoing both to a file and diffing against Copy as cURL (POSIX), which does use $'.

What about Chromium on Linux and Windows? See Online_authentication_API/issues/78#issuecomment-2405882.

Benjamin-Loison avatar Oct 28 '24 10:10 Benjamin-Loison

Searching DuckDuckGo and Google for "Command POSIX to Windows converter" does not seem to return relevant results.

Benjamin-Loison avatar Oct 28 '24 12:10 Benjamin-Loison

Note that echo $"a\nb" does not work while echo $'a\nb' does.

Benjamin-Loison avatar Oct 29 '24 16:10 Benjamin-Loison

Maybe the latest Python version of shlex supports this feature.

In fact, splitting echo $'a\nb' and echo 'a\nb' should both return something like ['echo', 'a\nb']. It is unclear how to keep the $'...' information.
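
One possible way to keep that information, as a sketch only: split in non-POSIX mode so that quotes survive, then record for each token which quoting it used. `Token`, `parseToken`, `renderToken` and `splitCommandKeepingQuoting` are all hypothetical names.

```python
import shlex
from typing import NamedTuple

class Token(NamedTuple):
    text: str     # token content without its surrounding quotes
    quoting: str  # 'ansi-c' for $'...', 'single' for '...', 'none' otherwise

def parseToken(rawToken):
    if len(rawToken) >= 3 and rawToken.startswith("$'") and rawToken.endswith("'"):
        return Token(rawToken[2:-1], 'ansi-c')
    if len(rawToken) >= 2 and rawToken.startswith("'") and rawToken.endswith("'"):
        return Token(rawToken[1:-1], 'single')
    return Token(rawToken, 'none')

def renderToken(token):
    # Put back the quoting style that was recorded at parse time.
    if token.quoting == 'ansi-c':
        return f"$'{token.text}'"
    if token.quoting == 'single':
        return f"'{token.text}'"
    return token.text

def splitCommandKeepingQuoting(command):
    return [parseToken(rawToken) for rawToken in shlex.split(command, posix = False)]

tokens = splitCommandKeepingQuoting("echo $'a\\nb'")
print(tokens)
print(' '.join(renderToken(token) for token in tokens))
```

The token text is kept unexpanded here (`a\nb` with a literal backslash), so it can be edited and rendered back without touching the escapes.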

Benjamin-Loison avatar Oct 29 '24 16:10 Benjamin-Loison

Maybe $'...' can be avoided by expanding newlines (though this does not seem to apply to other usages of $'...'), or by passing the --data-raw content from a file, which is possible as far as I remember.
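
A sketch of the newline-expanding idea, assuming only a small subset of ANSI-C escapes needs support (no octal, `\xHH` or `\uHHHH`): the `$'...'` body is decoded in Python and re-quoted with `shlex.quote`, which gives a word that a plain POSIX shell accepts. `expandAnsiCQuoted` and `SIMPLE_ESCAPES` are hypothetical names.

```python
import re
import shlex

# Assumption: only these escapes occur in the data.
SIMPLE_ESCAPES = {'n': '\n', 'r': '\r', 't': '\t', '\\': '\\', "'": "'", '"': '"'}

def expandAnsiCQuoted(token):
    # Leave anything that is not of the form `$'...'` untouched.
    if not (len(token) >= 3 and token.startswith("$'") and token.endswith("'")):
        return token
    body = token[2:-1]
    # Replace each known backslash escape with the character it stands for;
    # unknown escapes are kept as they are.
    expanded = re.sub(
        r'\\(.)',
        lambda match: SIMPLE_ESCAPES.get(match.group(1), match.group(0)),
        body,
        flags = re.DOTALL,
    )
    # `shlex.quote` single-quotes the result, real newlines included.
    return shlex.quote(expanded)

print(repr(expandAnsiCQuoted("$'a\\r\\nb'")))
print(expandAnsiCQuoted("$'it\\'s'"))
```

For the file route, curl's `--data-binary @file` reads the body from a file without stripping newlines, whereas `--data-raw` treats `@` literally.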

Benjamin-Loison avatar Oct 29 '24 23:10 Benjamin-Loison

COMMAND = "echo $'a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND))
echo $'a\nb'
['echo', '$a\\nb']
COMMAND = "echo '$a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND))
echo '$a\nb'
['echo', '$a\\nb']
echo '$a\nb'
$a\nb
COMMAND = "echo $'a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND, posix = False))
echo $'a\nb'
['echo', "$'a\\nb'"]
COMMAND = "echo '$a\\nb'"
print(COMMAND)
print(shlex.split(COMMAND, posix = False))
echo '$a\nb'
['echo', "'$a\\nb'"]

Benjamin-Loison avatar Oct 29 '24 23:10 Benjamin-Loison

In the case of Online authentication API, this issue may be due to my complex passwords; using passwords without special characters, if possible, may help. Otherwise, use special characters that do not seem to require the kind of escaping that $'...' provides.

Benjamin-Loison avatar Oct 30 '24 00:10 Benjamin-Loison

For reference: dollar.

Benjamin-Loison avatar Sep 14 '25 11:09 Benjamin-Loison