Question: multiline regex allowed in --replace-text?

Open marscher opened this issue 9 years ago • 16 comments

Background: I want to strip the output from an IPython notebook, which stores its data as JSON. To do that I would need to match the closing bracket of an "outputs" list, which sits several lines below the opening one.

marscher avatar Oct 15 '14 18:10 marscher

Could you give a truncated example, before and after?

rtyley avatar Oct 15 '14 18:10 rtyley

before:

 "outputs": [
  {
   "output_type": "stream",
   "stream": "stdout",
   "text": [
    "Populating the interactive namespace from numpy and matplotlib\n"
   ]
  }
 ],

after:

 "outputs": []

marscher avatar Oct 15 '14 18:10 marscher

Maybe it works out of the box, but I have not tested it yet: http://www.mkyong.com/regular-expressions/regular-expression-matches-multiple-line-example-java/

marscher avatar Oct 15 '14 18:10 marscher

I should think adding the (?s) prefix would work. Remember to prefix the entire expression with regex: too; I think the BFG otherwise assumes the text is just a literal.

rtyley avatar Oct 15 '14 21:10 rtyley
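
As an illustration (not from the thread itself): the (?s) prefix turns on "dotall" mode, in which `.` also matches line breaks. The sketch below uses Python's `re` module as a stand-in for the Java regex engine that the BFG runs on; both accept the inline (?s) flag. The sample text is shortened from the notebook excerpt above, and the pattern is a simplified assumption, not the one tried later in this thread.

```python
import re

# Shortened from the notebook excerpt in this thread.
NOTEBOOK = (
    ' "outputs": [\n'
    '  {\n'
    '   "output_type": "stream"\n'
    '  }\n'
    ' ],\n'
)

# Hypothetical pattern: everything from the opening bracket of "outputs"
# up to the first line that holds only the closing bracket.
BODY = r'"outputs": \[.*?\n \]'


def strip_outputs(text: str, dotall: bool) -> str:
    """Replace the outputs list with an empty one; (?s) is optional."""
    pattern = ("(?s)" if dotall else "") + BODY
    return re.sub(pattern, '"outputs": []', text)


# Without (?s) the dot cannot cross a newline, so the text is unchanged.
print(strip_outputs(NOTEBOOK, dotall=False) == NOTEBOOK)
# With (?s) the whole multi-line list collapses to an empty one.
print(repr(strip_outputs(NOTEBOOK, dotall=True)))
```

This only shows what the flag does when the regex sees the whole text at once; it says nothing about how the BFG feeds text to the regex.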

My regex matches when I use it in Java, but the same regex (prefixed with regex:) leads to "bfg aborting : no refs to update". I tried both with escaped backslashes and without.

 regex:(?s)(\\s+)\"outputs\":\\s+\\[[^\\]](.+)\\1\\] 

marscher avatar Oct 16 '14 17:10 marscher

works well at last. Thank you

marscher avatar Oct 16 '14 17:10 marscher

> works well at last. Thank you

Great! What was the problem just before that (with https://github.com/rtyley/bfg-repo-cleaner/issues/58#issuecomment-59400581)? How did you fix it?

rtyley avatar Oct 16 '14 17:10 rtyley

You do not need to escape backslashes, and I found one typo. Best.

marscher avatar Oct 16 '14 17:10 marscher
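
To illustrate the point about backslashes (a sketch, not from the thread itself): in Java source code a regex sits inside a string literal, so `\s` has to be typed as `\\s`. A line in the replacement file is read as plain text, so the same doubling there produces a pattern that looks for a literal backslash. Python's `re` stands in for the Java regex engine, and the pattern is a simplified assumption.

```python
import re

LINE = ' "outputs": [\n'

# Pattern text as it would sit in a plain text file: single backslashes.
single = r'\s+"outputs":\s+'
# The same pattern with every backslash doubled, as a Java string literal needs.
doubled = r'\\s+"outputs":\\s+'


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# The single-backslash form treats \s as "whitespace" and matches.
print(matches(single, LINE))
# The doubled form demands a literal backslash followed by 's', so it fails.
print(matches(doubled, LINE))
```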

Cool - thanks.

rtyley avatar Oct 16 '14 17:10 rtyley

Unfortunately, I ran the wrong test before closing this. Multi-line regex does not seem to work. Test example: https://github.com/marscher/bfg_multiline_regex

marscher avatar Oct 20 '14 18:10 marscher

Just stumbled over this as well.

We kept RSA private keys in a YAML file like this (not a real private key, obviously):

private_keys:
  foo_key1.pem:
    content: |
      -----BEGIN RSA PRIVATE KEY-----
      MIIEogIBAAKKAQEAzWXZ7ZdzGe5aez+vZKsaHI4e0hRF57BoewZTOKlmF2ijVqDK
      QveAW42R1KENm4t3/ikMV0wzMjA2WZX6wpb94brw1VeTiTs7y3I6/7OgMEVrmZ/T
      eKk2JGahHqdqA3+BEsjK9OjlYgjXGtho0qnKdt5kZjv3kA2R9dwZJzghiTrqrKKI
      BN2fatZtI+MzvAv5+i91AthSzaqmO31SbZZ/ZK0vb0ehlt6oZs1Z+KZW4yo206lZ
      1lK4B4nIZF3Rn2mmD6jRs/BAIMFm/AeFOzndsxqAyAxZKKHqK5l+ZDld6J3xiKtZ
      sHQK9ijXR1iTBA6Sd4HO3QOx/+BzbbsNQnei5QIDAQABAoIBAFGSGMg1rFZxBXgK
      ZABtd/KNxBm0dNM2bqQ1GWM/k+15iZ5miZkPElRRN9/sK1K4AVbPxeGpZf1XFp7J
      fol0OW159kJnmXvNVK30ZQKBgD7BMnJoD3GHPzmyyzov4yx/GK97bH6Sa7tIK1/V
      oBQoNGye+93VZ+2E6KN/oZOKRKH7rlgf6vtKtKM00fMLA2yb52s4G7pj9MDvY0k3
      Wdo1VWgj51rPgjb3X5h4wvmoo61IZBYtmw5/iT/DZVyMs7l1vOaGF6pATKQsZybv
      KbQRAoGARhpaOngoK0qG2rtl34z9TXvZT3XMbLmDaJ+jDjjtEazOt5jS7EP1ppSK
      rRSiqZ5Sv5NqzisN6OZHLdt1JNwarNZnNItGfs/PmpZb7SSfJqarGmNx25OK7JPe
      7geK4x9I71G9HE6aMtVK4S05KpBeA4zT7gEZ51yf4hDTf1KZSL1=
      -----END RSA PRIVATE KEY-----

We have now moved those keys to another backend and would like to clean up the repo so that the YAML looks something like this:

private_keys:
  foo_key1.pem:
    content: |
      -----BEGIN RSA PRIVATE KEY-----
      REMOVED
      -----END RSA PRIVATE KEY-----

We could just nuke the YAML files that contained private keys but I'd prefer keeping them and their history around.

A regex that should match the characters between the BEGIN and END markers (according to regex101.com) is this:

(?s)-----BEGIN RSA PRIVATE KEY-----(.+)-----END RSA PRIVATE KEY-----

Unfortunately running bfg with this replace.txt doesn't work:

regex:(?s)-----BEGIN RSA PRIVATE KEY-----(.+)-----END RSA PRIVATE KEY-----==>REMOVED

java -jar ~/Downloads/bfg-1.11.8.jar --replace-text replace.txt -fi private_keys.yaml repo.git

...

Cleaning
--------

Found 1294 commits
Cleaning commits:       100% (1294/1294)
Cleaning commits completed in 618 ms.

BFG aborting: No refs to update - no dirty commits found?

What to do?

antaflos avatar Oct 30 '14 23:10 antaflos
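
A side note on the pattern itself, separate from the multi-line question: replacing the whole match with REMOVED would also drop the BEGIN and END lines, and a greedy `.+` would run from the first BEGIN to the last END if a file holds several keys. Below is a minimal sketch of one alternative, using Python's `re` on the whole file content as a stand-in for the Java regex engine. The sample data is made up, and whether the BFG's replacement text accepts group references like these is an assumption not confirmed in this thread.

```python
import re

BEGIN = "-----BEGIN RSA PRIVATE KEY-----"
END = "-----END RSA PRIVATE KEY-----"

# Groups 1 and 2 capture the marker lines (plus indentation) so they can be
# put back; the lazy .+? stops at the first END marker instead of the last.
KEY_BODY = re.compile(
    r"(?s)(" + re.escape(BEGIN) + r"\n\s*).+?(\n\s*" + re.escape(END) + r")"
)


def redact_keys(text: str) -> str:
    """Replace each key body with REMOVED, keeping the marker lines."""
    return KEY_BODY.sub(r"\1REMOVED\2", text)


# Made-up sample with two keys in one file.
SAMPLE = (
    "content: |\n"
    f"  {BEGIN}\n"
    "  AAAA\n"
    "  BBBB\n"
    f"  {END}\n"
    "other: |\n"
    f"  {BEGIN}\n"
    "  CCCC\n"
    f"  {END}\n"
)

print(redact_keys(SAMPLE))
```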

FWIW, this is still an issue in bfg-1.12.3.

antaflos avatar May 08 '15 15:05 antaflos

I have raised a PR https://github.com/rtyley/bfg-repo-cleaner/pull/168 with a fix for this.

It adds multi-line support behind an optional command-line parameter (the default behaviour is unchanged), because it changes the processing to load the entire blob instead of one line at a time (which could cause problems if you have giant commits!).

franekrichardson avatar Aug 05 '16 15:08 franekrichardson
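
The trade-off described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the BFG's or the PR's actual code: Python's `re` stands in for the Java regex engine, the two function names are hypothetical, and the key body is a placeholder. Applied one line at a time, a pattern that needs both markers never sees them together, so nothing changes, which is consistent with the "no dirty commits found" message reported earlier.

```python
import re

# The pattern posted earlier in this thread, unchanged.
PATTERN = re.compile(
    r"(?s)-----BEGIN RSA PRIVATE KEY-----(.+)-----END RSA PRIVATE KEY-----"
)

# Placeholder content; not a real key.
BLOB = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "AAAA\n"
    "-----END RSA PRIVATE KEY-----\n"
)


def replace_per_line(text: str) -> str:
    """Apply the pattern to each line separately (assumed default mode)."""
    lines = text.splitlines(keepends=True)
    return "".join(PATTERN.sub("REMOVED", line) for line in lines)


def replace_whole_blob(text: str) -> str:
    """Apply the pattern once to the full content (assumed opt-in mode)."""
    return PATTERN.sub("REMOVED", text)


# No single line contains both markers, so per-line processing is a no-op.
print(replace_per_line(BLOB) == BLOB)
# With the whole blob in memory, the match spans all three lines.
print(repr(replace_whole_blob(BLOB)))
```

The cost of the second approach is the one the comment names: the full content has to be held in memory at once.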

Would love this too in order to support jsonpath or xpath search&replacements: https://github.com/rtyley/bfg-repo-cleaner/issues/265

jessehouwing avatar Mar 13 '18 14:03 jessehouwing

This is still not working. Are you planning to follow it up?

caglarsayin avatar Sep 05 '19 13:09 caglarsayin

> I have raised a PR #168 with a fix for this.
>
> It adds multi-line support behind an optional command-line parameter (the default behaviour is unchanged), because it changes the processing to load the entire blob instead of one line at a time (which could cause problems if you have giant commits!).

It works! I compiled your code and used this command:

java -jar bfg.jar --multi-line-regex --replace-text replace.txt

My replace.txt is:

regex:(?s)if __name__ == '__main__':[.\s\S]*==>print("DELETED!!")

I successfully replaced all code after

if __name__ == '__main__':

thx~

ESWZY avatar May 03 '20 13:05 ESWZY
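
A small observation on the pattern in the last comment, sketched with Python's `re` as a stand-in for the Java regex engine (the sample source is made up): inside square brackets the dot is a literal `.`, and `\s\S` already covers every character, so under (?s) the class behaves the same as a plain `.*`.

```python
import re

# Made-up sample file content.
SOURCE = "x = 1\nif __name__ == '__main__':\n    main()\n    print('a.b')\n"

# As posted: the class [.\s\S] matches any character, newlines included.
posted = r"(?s)if __name__ == '__main__':[.\s\S]*"
# Shorter form: under (?s) a bare dot also matches newlines.
shorter = r"(?s)if __name__ == '__main__':.*"


def truncate(pattern: str, text: str) -> str:
    """Replace the main guard and everything after it with a marker line."""
    return re.sub(pattern, 'print("DELETED!!")', text)


# Both patterns cut the file at the same place.
print(truncate(posted, SOURCE) == truncate(shorter, SOURCE))
print(repr(truncate(posted, SOURCE)))
```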