feature: delete a slide
In order to modify an existing presentation to suit a new purpose
As a developer using python-pptx
I need the ability to delete a slide
API perhaps:
slides.remove(slide)
# OR
slides.remove(slide_a, slide_b, ...)
# alternately
slides.remove(*slides[2:4])
# OR
slide.delete()
It is very useful! Looking forward to this.
I need this functionality for a project I'm currently working on. Any suggestions on where to get started for implementing it on my own?
I believe if you remove the slide reference from the presentation element and remove the relationship connecting the slide to the presentation, the slide will get dropped on save. Basically you undo what the _Slides.add_slide() function does here: https://github.com/scanny/python-pptx/blob/master/pptx/parts/presentation.py#L121
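To make those two steps concrete, here is a minimal sketch at the XML level that uses only the standard library rather than python-pptx itself. The sample XML, the rId values, and the `remove_slide` helper are made up for illustration. The only assumption is the usual PresentationML layout, where each `p:sldId` in `presentation.xml` points at a relationship in `presentation.xml.rels`.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical, heavily trimmed stand-in for ppt/presentation.xml
PRESENTATION_XML = f"""
<p:presentation xmlns:p="{P_NS}" xmlns:r="{R_NS}">
  <p:sldIdLst>
    <p:sldId id="256" r:id="rId2"/>
    <p:sldId id="257" r:id="rId3"/>
  </p:sldIdLst>
</p:presentation>
""".strip()

# Hypothetical stand-in for ppt/_rels/presentation.xml.rels
RELS_XML = f"""
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId2" Type="{R_NS}/slide" Target="slides/slide1.xml"/>
  <Relationship Id="rId3" Type="{R_NS}/slide" Target="slides/slide2.xml"/>
</Relationships>
""".strip()


def remove_slide(presentation, rels, index):
    """Remove the slide at `index`: its sldId entry and its relationship."""
    sld_id_lst = presentation.find(f"{{{P_NS}}}sldIdLst")
    sld_id = list(sld_id_lst)[index]
    r_id = sld_id.get(f"{{{R_NS}}}id")
    # Step 1: remove the slide reference from the slide list.
    sld_id_lst.remove(sld_id)
    # Step 2: remove the relationship that connected the slide to the presentation.
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == r_id:
            rels.remove(rel)
    return r_id


presentation = ET.fromstring(PRESENTATION_XML)
rels = ET.fromstring(RELS_XML)
removed_r_id = remove_slide(presentation, rels, 0)
remaining_ids = [e.get("id") for e in presentation.find(f"{{{P_NS}}}sldIdLst")]
remaining_rels = [r.get("Id") for r in rels]
```

Doing only one of the two steps leaves the package half-updated, which seems to match the partial results people report further down.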
Cool! I'll give that a go
@waveofbabies Did you have any luck with what @scanny proposed? Would you be willing to share what worked or didn't work? Thanks!
@scanny What you proposed to @waveofbabies sounds like it would be pretty straightforward, but I'm not sure where to get started. Would you be willing to write out some example code? Thanks!
I created the following in the _Slides() class. While looping through a presentation's slides, I passed in, one at a time, a defined range of slide numbers I wanted to remove:
def remove_slide(self, idx):
    rId = self._sldIdLst[idx].rId
    self._prs.drop_rel(rId)
This ended up removing the content of these slides, but left the slides themselves behind. So close!
In #68 @blaze33 described a technique they used for removing slides:
def delete_slide(self, presentation, index):
    xml_slides = presentation.slides._sldIdLst  # pylint: disable=W0212
    slides = list(xml_slides)
    xml_slides.remove(slides[index])
I tried passing in slide numbers I wanted to remove using a loop like the above, but I was only successful in removing one slide -- the first slide. The rest of the slides I want to remove are stubbornly still present. Any help would be much appreciated!
@jdgodchaux I think your question might be a good one for Stack Overflow.
Thank you @jdgodchaux! This successfully removed slides #22 to #118 from the pptx file:
for i in range(118, 22, -1):
    rId = prs.slides._sldIdLst[i].rId
    prs.part.drop_rel(rId)
    del prs.slides._sldIdLst[i]
@krzys-andrei Great! Glad I could be of assistance!
Thank you for the info, as I also needed to delete some slides. Based on @krzys-andrei's work, I made this function that deletes a slide object from a prs object.
def delete_slide(prs, slide):
    # Make dictionary with necessary information
    id_dict = {slide.id: [i, slide.rId] for i, slide in enumerate(prs.slides._sldIdLst)}
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]
(I'm newish to coding) I can't seem to get this function to actually delete anything. I nested it inside a for loop like so: for n in range(5,1,-1): delete_slide(prs,prs.slides[n]) where prs is the presentation. It compiles without error but makes no changes to the prs. Can anyone help spot what I did wrong?
It's been a while since I used the function, and I may not have tested it with multiple deletes. My worry is that prs.slides changes each time you call delete_slide. Does prs.slides[n] still correspond to the slide you want by the third pass through the loop?
How about
for slide in prs.slides[1:5]:
    delete_slide(prs, slide)
@EBjerrum Thanks for responding! I made a couple rookie mistakes (saved the doc before deleting slides) but the eventual code that worked was:
for i in range(0, 6, 1):
    delete_slides(prs, 0)

def delete_slides(presentation, index):
    xml_slides = presentation.slides._sldIdLst
    slides = list(xml_slides)
    xml_slides.remove(slides[index])
I was having trouble because every time I tried to delete the first slide (slide 0) it would reindex. I finally used that to my advantage deleting slide 0 six times. Not the most elegant solution, but it works
If the Slides object has python list semantics, isn't it easier to implement the delete as:
del prs.slides[:]
?
@will133 I looked into implementing removal of an object from a collection like that a while back. Can't remember the object off the top of my head, but it was like this, a member of a collection. I don't remember the details without repeating the research, but I came to the clear conclusion, as I recall, that using the del statement was going to be a bad idea for something like this.
The del statement is really designed for removing names from the namespace and it's surprisingly complex to implement it in a robust way, things like reference counts and so on making it a crap shoot on whether you ever receive a __del__() call and where.
But I do like using a slice for elegant specification; the method could use a variable-length argument list to allow specifying a single slide, a number of individual slides, or a list (sequence) of slides to be deleted. Something like:
def delete_slides(*slides):
    ...
so you can call it like:
delete_slides(slide_x)
or
delete_slides(slide_x, slide_y)
or
delete_slides(*prs.slides[1:3])
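A rough sketch of how that calling convention could behave, using a toy collection rather than anything in python-pptx. The `Slide` and `SlideList` classes here are hypothetical stand-ins, and the real job of dropping relationships is deliberately left out. The sketch only shows the variable-length argument handling.

```python
class Slide:
    """Hypothetical placeholder for a slide object."""

    def __init__(self, name):
        self.name = name


class SlideList:
    """Hypothetical stand-in for a slide collection (not python-pptx's Slides)."""

    def __init__(self, slides):
        self._slides = list(slides)

    def __getitem__(self, idx):
        # Supports both integer indexes and slices, like a list.
        return self._slides[idx]

    def __len__(self):
        return len(self._slides)

    def delete_slides(self, *slides):
        """Delete every slide passed in; slides are matched by identity."""
        targets = {id(s) for s in slides}
        present = {id(s) for s in self._slides}
        if not targets <= present:
            raise ValueError("slide is not in this collection")
        # Rebuild in one pass so earlier deletions cannot shift later ones.
        self._slides = [s for s in self._slides if id(s) not in targets]


deck = SlideList(Slide(n) for n in "abcde")
deck.delete_slides(deck[0])        # a single slide
deck.delete_slides(*deck[1:3])     # a slice, unpacked into arguments
names = [s.name for s in deck]
```

Because the slice is evaluated before any deletion happens, the objects to delete are fixed up front and reindexing does not come into play.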
I deleted some slides using the code example suggested here. But now I get an error when opening the resulting presentation in PowerPoint 2016. I can click 'Repair' and then everything looks fine, but of course I would prefer the presentation to open without an error. The presentation works fine before the slides are deleted. Any idea what goes wrong and how to fix it?
def __delete_slide(self, index):
    presentation = self.prs
    xml_slides = presentation.slides._sldIdLst
    slides = list(xml_slides)
    xml_slides.remove(slides[index])
@maribet same for me, I can't delete slides right now
The python code works but it seems to somehow corrupt the pptx file. Is this an Office 2016 issue? Did it work with earlier Office versions?
I would expect you have some dangling relationship(s) still referring to the old slide. Doing this job cleanly in the general case involves coming to understand the full graph of objects the slide is embedded in and taking appropriate care to remove all the required links.
That complexity is one reason it hasn't been implemented in the API yet.
I solved my problem by hiding the slides. This way they are also not exported to PDF, so the solution works for me. But a working DELETE for slides would be great!
A few notes on this for possible use later
The basic work of deleting a slide is to remove the relationship to that slide from the presentation part and remove its reference from the slide list (sldIdLst).
I don't believe there are any other "inbound" relationships referring to a slide that need to be dealt with. A Notes Slide is related to its slide, but the only way to get to it is from the slide itself (the relationship is "two-way").
All the other relationships I can think of can simply be ignored and they would disappear when the presentation is saved. A slide's relationship to its slide-layout is an example of that; when there is no relationship to the slide, the slide doesn't get written. When there's no slide, its relationships are not traversed and they are not written. The only thing that would be problematic is a relationship target that needed to be explicitly deleted lest it appear in the saved presentation somehow.
Images, charts, and hyperlinks are the three other common relationships. I'm inclined to think they would all be self-resolving. The best next step is probably to experiment a little and see what the behavior is when just deleting the slide from the presentation part and its relationships. It could be deleting a slide is substantially easier than deleting a shape (in the general case).
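The "self-resolving" idea in these notes can be sketched as a reachability walk. This is a toy model, not python-pptx's actual save code: the part names and the `reachable_parts` helper are invented, and it assumes that saving writes only the parts reachable from the presentation part by following relationships.

```python
from collections import deque


def reachable_parts(rels, root):
    """Return the set of parts reachable from `root` by following relationships."""
    seen = {root}
    queue = deque([root])
    while queue:
        part = queue.popleft()
        for target in rels.get(part, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical relationship graph: source part -> related parts
rels = {
    "presentation": ["slide1", "slide2", "slideMaster1"],
    "slide1": ["slideLayout1", "image1", "notesSlide1"],
    "notesSlide1": ["slide1"],          # the two-way notes relationship
    "slide2": ["slideLayout1", "chart1"],
    "slideMaster1": ["slideLayout1"],
    "slideLayout1": ["slideMaster1"],
}

before = reachable_parts(rels, "presentation")
rels["presentation"].remove("slide1")   # drop the only inbound relationship
after = reachable_parts(rels, "presentation")
dropped = before - after
```

In this model the image and notes slide that only slide1 used fall away with it, while the shared layout survives because slide2 and the master still refer to it.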
Thanks for the hard work! Please let it happen, it'd be extremely useful if it gets implemented.
@Krossfire9:
I was having trouble because every time I tried to delete the first slide (slide 0) it would reindex. I finally used that to my advantage deleting slide 0 six times. Not the most elegant solution, but it works
Old one, but you could simply iterate backwards so reindexing does not change any indices ;)
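The reindexing effect is easy to see with a plain Python list standing in for the slide list. This is only an illustration of the index arithmetic; the helper names are made up and nothing here touches python-pptx.

```python
def delete_ascending(items, start, count):
    """Naive: delete indexes start, start+1, ... and get bitten by reindexing."""
    items = list(items)
    for i in range(start, start + count):
        if i < len(items):
            del items[i]
    return items


def delete_same_index(items, start, count):
    """Delete at `start` repeatedly; each deletion shifts the next item into place."""
    items = list(items)
    for _ in range(count):
        del items[start]
    return items


def delete_descending(items, start, count):
    """Delete from the highest index down so remaining indexes stay valid."""
    items = list(items)
    for i in range(start + count - 1, start - 1, -1):
        del items[i]
    return items


slides = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
skipped = delete_ascending(slides, 0, 3)    # removes s0, s2, s4
repeated = delete_same_index(slides, 0, 3)  # removes s0, s1, s2
backward = delete_descending(slides, 0, 3)  # removes s2, s1, s0
```

Deleting index 0 repeatedly and deleting in descending order end up with the same result. Ascending order skips every other slide.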
I did experiment a bit further and found that while deleting slides works using a couple of different proposals from this issue, they are not complete (enough). In the example below, any of the methods creates a corrupt .pptx from scratch. Note that I start from an empty presentation, create two slides, remove the first and add a third. For both methods 1 and 2, I get this error:
/usr/lib64/python3.6/zipfile.py:1355: UserWarning: Duplicate name: 'ppt/slides/slide2.xml'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/usr/lib64/python3.6/zipfile.py:1355: UserWarning: Duplicate name: 'ppt/slides/_rels/slide2.xml.rels'
return self._open_to_write(zinfo, force_zip64=force_zip64)
This is the code:
import pptx

def remove_slide(prs, idx):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-165708190
    rId = prs.slides._sldIdLst[idx].rId
    prs.part.drop_rel(rId)

def delete_slide(presentation, index):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-165708190
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-320792864
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-382660749
    xml_slides = presentation.slides._sldIdLst  # pylint: disable=W0212
    slides = list(xml_slides)
    xml_slides.remove(slides[index])

def delete_slide_2(prs, slide):
    # https://github.com/scanny/python-pptx/issues/67#issuecomment-296135015
    id_dict = {slide.id: [i, slide.rId] for i, slide in enumerate(prs.slides._sldIdLst)}
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]

method = 2
prs = pptx.Presentation()
slide0 = prs.slides.add_slide(prs.slide_layouts[0])
slide1 = prs.slides.add_slide(prs.slide_layouts[1])
if method == 0:
    remove_slide(prs, 0)
elif method == 1:
    delete_slide(prs, 0)
elif method == 2:
    delete_slide_2(prs, slide0)
slide2 = prs.slides.add_slide(prs.slide_layouts[2])
prs.save('bug.pptx')
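For context on the `Duplicate name` warning above: it is raised by Python's own `zipfile` module whenever two entries with the same name are written to one archive. A small standard-library reproduction, with no python-pptx involved (the `write_parts` helper and the placeholder content are made up for illustration):

```python
import io
import warnings
import zipfile


def write_parts(partnames):
    """Write one zip entry per partname; return the warning messages raised."""
    buf = io.BytesIO()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with zipfile.ZipFile(buf, "w") as zf:
            for name in partnames:
                zf.writestr(name, b"<placeholder/>")
    return [str(w.message) for w in caught]


clean = write_parts(["ppt/slides/slide1.xml", "ppt/slides/slide2.xml"])
clashing = write_parts(["ppt/slides/slide2.xml", "ppt/slides/slide2.xml"])
```

So the warning suggests that two distinct slide parts ended up with the same partname at save time, which would explain why PowerPoint considers the file damaged.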
Thank you for the info as I also needed to delete some slides. based on @krzys-andrei work I made this function that deletes a slide object from a prs object.
def delete_slide(prs, slide):
    # Make dictionary with necessary information
    id_dict = {slide.id: [i, slide.rId] for i, slide in enumerate(prs.slides._sldIdLst)}
    slide_id = slide.slide_id
    prs.part.drop_rel(id_dict[slide_id][1])
    del prs.slides._sldIdLst[id_dict[slide_id][0]]
This one works without triggering the repair error, unlike the other one.
Thank you very much.
@bersbersbers is right, the method from @krzys-andrei still has a problem when adding a slide after deleting. In his example using delete_slide_2, the newly created slide has
part.partname=='/ppt/slides/slide2.xml'
which seems correct, as it is the second slide. But since the deleting did not change the partname of the remaining slides, this partname is already used by the now first slide in prs. So I guess a deleting method has to somehow change the partnames as well (or move all remaining slides following the deleted slide up, if this feature becomes available).
Or maybe the problem is with add_slide:
@property
def _next_slide_partname(self):
    """
    Return |PackURI| instance containing the partname for a slide to be
    appended to this slide collection, e.g. ``/ppt/slides/slide9.xml``
    for a slide collection containing 8 slides.
    """
    sldIdLst = self._element.get_or_add_sldIdLst()
    partname_str = "/ppt/slides/slide%d.xml" % (len(sldIdLst) + 1)
    return PackURI(partname_str)
Does PowerPoint expect the partname (partname_str) to be similar to the slide index?
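To illustrate the collision described above, here is a tiny model of the count-based naming rule from `_next_slide_partname`. It uses a plain list of partname strings in place of real slide parts, and `next_slide_partname` is a simplified stand-in written for this example.

```python
def next_slide_partname(slide_count):
    """Mirror the rule quoted above: the number is the current slide count + 1."""
    return "/ppt/slides/slide%d.xml" % (slide_count + 1)


partnames = []
partnames.append(next_slide_partname(len(partnames)))  # first slide
partnames.append(next_slide_partname(len(partnames)))  # second slide

# Delete the first slide; the surviving slide keeps its original partname.
del partnames[0]

# Adding a slide now computes a name from the count (1 + 1 = 2).
new_name = next_slide_partname(len(partnames))
collision = new_name in partnames
```

The new slide gets the same partname as the surviving slide, which lines up with the duplicate-name warning reported earlier.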
The slide parts in a presentation can be renamed using PresentationPart.rename_slide_parts() here: https://github.com/scanny/python-pptx/blob/master/pptx/parts/presentation.py#L99
This is called the first time the prs.slides attribute is accessed for a given presentation, such that whatever they were named on disk, they now have consecutive and normalized names. https://github.com/scanny/python-pptx/blob/master/pptx/presentation.py#L111
You can call this whenever you want with:
prs.part.rename_slide_parts()
You could also just save and reload the deck after one or more slide-delete operations before trying to add a new one. No need to save to disk of course, just saving to BytesIO and then loading back in from that should get it done (and would be pretty quick).
Note that part names are arbitrary. They can't be allowed to collide, but other than that, neither the naming "template" nor the "directory" location is prescribed by the PowerPoint spec. The reason for naming them consistent with their order in the deck is just to make the Zip structure more readable by humans.
I think @natter1 has correctly identified the proximate problem, that the PresentationPart._next_slide_partname() method is naive about finding partnames and does not assure uniqueness. I think the "production" fix would just be to call .rename_slide_parts() after each delete as that avoids any vagaries of what they might have been named before. Then the naive approach continues to work just fine and is efficient.
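A sketch of why renaming after each delete lets the count-based approach keep working. As before, this is a toy model with partname strings in a list. The `renumber` helper only imitates the effect described for `rename_slide_parts()` and is not the library method.

```python
def next_slide_partname(partnames):
    """Count-based naming: number = current slide count + 1."""
    return "/ppt/slides/slide%d.xml" % (len(partnames) + 1)


def renumber(partnames):
    """Give the surviving slides consecutive names in deck order."""
    return ["/ppt/slides/slide%d.xml" % n for n in range(1, len(partnames) + 1)]


partnames = [
    "/ppt/slides/slide1.xml",
    "/ppt/slides/slide2.xml",
    "/ppt/slides/slide3.xml",
]

del partnames[0]                  # delete the first slide
partnames = renumber(partnames)   # survivors become slide1.xml, slide2.xml
new_name = next_slide_partname(partnames)
partnames.append(new_name)        # slide3.xml, no clash
```

After renumbering, the names always run from 1 to the slide count, so the next count-based name is guaranteed to be unused.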
Is this still in development then? That's too bad; this feature would be awesome to have.