Consider stripping out only IMAGES as option

Open rsvp opened this issue 7 years ago • 13 comments

Frequently the output is useful as informal "testing" of results, and there is very little overhead in keeping non-graphical results. Images, however, add considerable unwanted bulk to a commit for any version control system (unless those images are very expensive to reproduce, or are historical for some reason).

Proposal: provide an option to strip out only output cells with images.

rsvp avatar Jun 01 '17 18:06 rsvp

Interesting, that sounds like a useful option. How do you see the "UI" of this option?

I suppose this could even be generalized easily to keep or strip out any output data type.

kynan avatar Jun 09 '17 21:06 kynan

As an example, how about along these lines?

$ nbstripout --only=images [notebook(s)]
#  where images is just an alias for *.png, *.jpg, etc.

$ nbstripout --only='tmp-*.png' [notebook(s)]
#  _wildcard support_, say, for inconsequential images labeled as such by user.

rsvp avatar Jun 10 '17 03:06 rsvp

I thought about this as I was implementing --keep-count and --keep-output and considered an options structure:

--keep-count, -c
--keep-output, -o
    --keep-text-output, -t
    --keep-image-output, -i
--keep-metadata, -m

Thus nbstripout -o would be equivalent to nbstripout -ti. Stripping only images would use nbstripout -ctm. For more fine-grained control at the cell level, keep_output-type metadata would suffice. That said, this doesn't scale well, since there may be more output types we'd want to consider in the future. (Widgets?)

jpeacock29 avatar Jun 10 '17 05:06 jpeacock29
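
For illustration, the text/image split proposed above could be sketched roughly as follows. This is not nbstripout's implementation: the function name `strip_outputs` and its `keep_text` / `keep_image` parameters are hypothetical, the notebook is assumed to be an nbformat v4 dict, and every MIME type that does not start with `image/` is treated as "text" for simplicity.

```python
import copy


def strip_outputs(nb, keep_text=False, keep_image=False):
    """Return a copy of a v4 notebook dict with selected output kinds removed.

    Hypothetical helper: keep_text / keep_image mirror the proposed
    --keep-text-output / --keep-image-output flags.
    """
    nb = copy.deepcopy(nb)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        kept = []
        for out in cell.get("outputs", []):
            data = out.get("data")
            if data is None:
                # stream and error outputs have no MIME bundle; treat them as text
                if keep_text:
                    kept.append(out)
                continue
            # Assumption: anything that is not image/* counts as text
            filtered = {
                mime: payload
                for mime, payload in data.items()
                if (keep_image if mime.startswith("image/") else keep_text)
            }
            if filtered:
                out["data"] = filtered
                kept.append(out)
        cell["outputs"] = kept
    return nb


# Made-up notebook fragment with one text output and one figure
nb = {"cells": [{"cell_type": "code", "outputs": [
    {"output_type": "stream", "name": "stdout", "text": "ok\n"},
    {"output_type": "display_data", "metadata": {},
     "data": {"image/png": "iVBORw0KGgo", "text/plain": "<Figure>"}},
]}]}

text_only = strip_outputs(nb, keep_text=True)    # strips only the image payload
image_only = strip_outputs(nb, keep_image=True)  # strips stream and text/plain
```

Filtering per MIME entry rather than per output means a figure's `text/plain` fallback survives when only images are stripped, which fits the "keep the cheap text, drop the bulky image" use case.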

@jpeacock29 Re: more output types. Consider a flag like --regex, -r where a user can implement their own custom stripout by giving a regular expression. This is what I did with sed, until I realized that an edited notebook must be left in a trusted state. Going from regex to images would then be a one-liner.

rsvp avatar Jun 10 '17 15:06 rsvp

Sounds quite sensible! Let's start with images since this seems to be the most important case.

One of you happy to have a go at this @rsvp @jpeacock29 ?

kynan avatar Jun 10 '17 19:06 kynan

What would the regex flag be matched against? Every key in the ipynb?

jpeacock29 avatar Jun 11 '17 16:06 jpeacock29

@jpeacock29 here's a mock example for PNG images: $ nbstripout --regex='png": "i'

That should get rid of the super long lines encoding PNG images for now. I question the permanence of that regular expression since it is subject to the formatting whims upstream. Historically, both "image/png": and "png": have been used as keys, so my example would fortunately work for both cases.

But --regex would be a handy tool in any case.

rsvp avatar Jun 11 '17 19:06 rsvp
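
One way to make the regex idea less sensitive to upstream formatting is to match against the keys of the parsed notebook rather than the raw file text. A minimal sketch, assuming an nbformat v4 layout where image payloads live under each output's `data` dict; the function name `drop_matching_keys` is hypothetical and not part of nbstripout:

```python
import re


def drop_matching_keys(nb, pattern):
    """Delete output data entries whose key matches `pattern` (in place).

    Returns the number of entries removed. Hypothetical helper.
    """
    regex = re.compile(pattern)
    removed = 0
    for cell in nb.get("cells", []):
        for out in cell.get("outputs", []):
            data = out.get("data", {})
            # Collect keys first so the dict is not mutated while iterating
            for key in [k for k in data if regex.search(k)]:
                del data[key]
                removed += 1
    return removed


# Made-up notebook fragment
nb = {"cells": [{"cell_type": "code", "outputs": [
    {"output_type": "display_data", "metadata": {},
     "data": {"image/png": "iVBORw0KGgo", "text/plain": "<Figure>"}},
]}]}

# Matches both the "png" and "image/png" spellings of the key
count = drop_matching_keys(nb, r"(^|/)png$")
```

Because the pattern is applied to parsed keys, it does not depend on how the JSON happens to be quoted or indented on disk.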

Feel free to have a go at this, I'm not sure when I might find time to work on it myself.

kynan avatar Jun 13 '17 10:06 kynan

One of you interested in working on this @rsvp @jpeacock29 ?

kynan avatar Jul 30 '17 11:07 kynan

Are you still interested in this @rsvp @jpeacock29 ?

kynan avatar Jul 09 '18 18:07 kynan

hi @kynan my spare cycles are going to refactoring https://git.io/fecon235 so realistically maybe later this year.

Interestingly, one of the reasons leading to the spin-off of the source code to another repository https://git.io/fecon236 was to leave behind all the archival bulky images preserved in the .git for notebooks.

So this issue is still pertinent.

rsvp avatar Jul 10 '18 15:07 rsvp

@rsvp do you still have this use case?

kynan avatar Jun 28 '20 21:06 kynan

There's an in-flight pull request (#135) that's somewhat related: it only strips outputs that are larger than a certain size. Would that fit the bill?

kynan avatar Apr 11 '21 23:04 kynan
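
To show how a size threshold would cover the image case, here is a rough sketch of the idea. It is not the code from #135: the function name `strip_large_outputs` is hypothetical, and measuring an output by the length of its serialized JSON is an assumption made for illustration.

```python
import json


def strip_large_outputs(nb, max_size):
    """Drop outputs whose serialized JSON exceeds `max_size` characters (in place).

    Hypothetical helper; the size measure is an assumption.
    """
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = [
                out for out in cell["outputs"]
                if len(json.dumps(out)) <= max_size
            ]
    return nb


# Made-up notebook fragment: a short text output and a bulky encoded image
nb = {"cells": [{"cell_type": "code", "outputs": [
    {"output_type": "stream", "name": "stdout", "text": "mean = 0.42\n"},
    {"output_type": "display_data", "metadata": {},
     "data": {"image/png": "A" * 50000}},
]}]}

strip_large_outputs(nb, max_size=1000)
remaining = [out["output_type"] for out in nb["cells"][0]["outputs"]]
```

Encoded images are usually far larger than short textual results, so a threshold like this tends to remove them while leaving the informal "testing" output in place.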

Given #135 has been released in nbstripout 0.5.0 and is arguably even more flexible than what's requested here I'll close this as fixed.

kynan avatar Sep 24 '22 11:09 kynan