Knowledge base KB0239

PowerPoint file size is rapidly increasing with Microsoft 365

Problem

The most directly observable symptom is that PowerPoint presentations are unusually large, potentially many times larger than expected. In some cases, affected files then become very slow to open, edit, or save, or cannot be opened at all.

This issue may arise under the following conditions:

  • Files are (or were) saved in a SharePoint location
  • Files are (or were) encrypted via, e.g.
    • Azure Information Protection
    • Microsoft Information Protection
    • Microsoft Purview Information Protection
    • Access Restrictions
  • Slides or content on slides contain Tags or CustomerData objects (think-cell and many other add-ins use Tags objects as prescribed by Microsoft)

Copying affected slides or objects to other presentations can introduce the issue to additional files, even if they do not meet the above criteria.

Solution

Microsoft has released a fix for this issue, which at present is available in the latest publicly known updates for Microsoft 365. Older versions available under the Monthly Enterprise Channel and Semi-Annual Enterprise Channel may still be affected for the time being.

A table of affected and fixed versions can be found below:

Version   First affected build   Fix available in
2208      15601.20578            **
2301      16026.20002            **
2302      *                      16130.20714
2303      *                      **
2304      *                      **
2305      *                      **
2306      *                      16529.20182
2307      *                      16626.20000

* - issue existed in initial release
** - No fix expected
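
As a rough aid for checking an installation against the table above, the lookup can be sketched in Python. The function name `is_affected` and the dictionary layout are illustrative assumptions, not part of any Microsoft or think-cell tool. The sketch assumes that a build counts as affected when it is at or after the first affected build and before the fixed build, and that versions missing from the table are simply unknown.

```python
from typing import Dict, Optional, Tuple

# Data copied from the table above.
# First value: first affected build, or None if the issue existed in the initial release (*).
# Second value: first fixed build, or None if no fix is expected (**).
BUILD_TABLE: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "2208": ("15601.20578", None),
    "2301": ("16026.20002", None),
    "2302": (None, "16130.20714"),
    "2303": (None, None),
    "2304": (None, None),
    "2305": (None, None),
    "2306": (None, "16529.20182"),
    "2307": (None, "16626.20000"),
}


def parse_build(text: str) -> Tuple[int, int]:
    """Turn a build string such as '16130.20714' into a comparable tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_affected(version: str, build: str) -> Optional[bool]:
    """Return True/False for versions in the table, None for unlisted versions."""
    if version not in BUILD_TABLE:
        return None  # the table makes no statement about this version
    first_affected, first_fixed = BUILD_TABLE[version]
    current = parse_build(build)
    if first_affected is not None and current < parse_build(first_affected):
        return False  # build predates the introduction of the issue
    if first_fixed is not None and current >= parse_build(first_fixed):
        return False  # build already contains the fix
    return True
```

For example, `is_affected("2302", "16130.20714")` returns `False` because that build is the one listed as containing the fix.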

Note: while Microsoft's fix prevents the issue from arising, it does not repair files that are already affected. There are, however, a few options for cleaning affected files, which, in combination with Microsoft's fix, should resolve the issue:

PowerPoint's built-in Document Inspector can be used to remove customXML data from affected files. If using this tool, please ensure that only the option Custom XML data is used. Removing additional content types may have unintended consequences, including removing think-cell functionality from the document.

If you are using think-cell 12, the Clean up and sanitize tool (section 21.4 of the user manual) offers a more convenient way to access this same functionality.

Removing customXML data in one of these two ways will not prevent it from accumulating again if you are using an unpatched version of PowerPoint, but can at least temporarily "reset" the issue.

In the event that you have files that are so severely damaged that they cannot reliably be opened with PowerPoint, and thus neither Document Inspector nor think-cell's Clean up tool can be used, or if the number of affected documents requires an automated solution, please contact our support team. We may be able to help.

Analysis

Our developers have analyzed the problem in detail. In cases that meet the conditions described above, opening a file that is stored on SharePoint and is encrypted may cause PowerPoint to call this function in its code:

PPT::FileIO::CustomerDataXmlReader::DeSerializeCustomerDataFromEncryptedStorage

This function was newly added starting in the listed versions of PowerPoint. Files stored on SharePoint contain certain customXML data files used for SharePoint's File Management. When this function is called, all customXML files in the entire document are duplicated for every slide or object that contains a Tag or CustomerData. Continuing to work with affected files may also lead to further accumulation of this junk data in the course of normal editing.

This issue can be reproduced without think-cell. Detailed reproduction steps follow.

Reproduction without think-cell:

  1. With think-cell inactive or temporarily removed, open a new blank PowerPoint presentation with no placeholder textboxes
  2. Insert one PowerPoint rectangle, e.g. via Insert > Illustrations > Shapes
  3. Open the VBA window (press Alt+F11)
  4. In the Immediate Window (you can activate it by pressing Ctrl+G if not present) type in the following, and then press Enter:

    ActivePresentation.Slides(1).Shapes(1).Tags.Add "Test", "Tag"

  5. Close the VBA window
  6. In PowerPoint, select and duplicate the rectangle via Ctrl+D 30 times, so that 31 total rectangles exist (the reproduction will work with fewer shapes, though the file sizes will then vary accordingly)
  7. Save the presentation to a SharePoint location

    The document should be approximately 54 KB in size once fully synced to SharePoint.

  8. Now, go to the File tab of the PowerPoint ribbon, and select Info > Protect Presentation > Restrict Access > Restricted Access
  9. In the Permission dialog that appears, check the box Restrict permission to this presentation and click OK
  10. Save the document (e.g., by pressing Ctrl+S) and then close it completely
  11. Open the document again
  12. Remove the access restriction by again going to File > Info > Protect Presentation > Restrict Access, and this time select Unrestricted Access
  13. Save the document again, and then close it.

    The document should still be approximately 54 KB, but it has instead grown to approximately 195 KB.

  14. Reopen the document and repeat steps 8 through 13

    The document should still be approximately 54 KB, but it has instead grown to approximately 5 MB.

Examining the file structure (by e.g. copying the .pptx file and changing the file extension of the copy to .zip, and then extracting the archive) one would see that the number of item*.xml and itemProps*.xml files contained within the customXml subfolder will go from 3 of each file type after syncing to SharePoint, to 96 of each after the first repetition, and 2979 of each after the second repetition.
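
The manual inspection described above can also be scripted. The following is a minimal Python sketch that counts the item*.xml and itemProps*.xml entries in the customXml folder without extracting the archive. The function name `count_custom_xml_parts` is an illustrative assumption. The sketch only reads the file and does not modify or repair it.

```python
import re
import zipfile
from typing import Dict

# A .pptx file is a ZIP archive; these patterns match entries directly inside
# the customXml folder, and skip relationship files under customXml/_rels.
ITEM_PATTERN = re.compile(r"^customXml/item\d+\.xml$")
ITEM_PROPS_PATTERN = re.compile(r"^customXml/itemProps\d+\.xml$")


def count_custom_xml_parts(pptx) -> Dict[str, int]:
    """Count customXml item and itemProps entries in a .pptx path or file object."""
    with zipfile.ZipFile(pptx) as archive:
        names = archive.namelist()
    return {
        "item": sum(1 for name in names if ITEM_PATTERN.match(name)),
        "itemProps": sum(1 for name in names if ITEM_PROPS_PATTERN.match(name)),
    }
```

Calling `count_custom_xml_parts("presentation.pptx")` on a copy of a file (the file name here is a placeholder) returns the two counts as a dictionary. Counts far above the handful expected for a SharePoint-synced file suggest the presentation may be affected.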

There are additional steps and workflows that may lead to duplication of customXML data as well, but this is the most straightforward to demonstrate and has the most pronounced effect.

Why think-cell is affected

think-cell does not use customXML data, but it does use Tags in each shape or object created via think-cell. As such, slides created with think-cell may contain a particularly high number of tagged shapes, resulting in extremely fast duplication of customXML data due to this issue.
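
To illustrate how quickly the number of tagged shapes drives this growth, the short Python sketch below projects the customXML file count per repetition. The recurrence it uses (each cycle multiplies the current count by the number of tagged shapes and adds the original files once) is an assumption inferred from the counts reported in the reproduction above (3, 96, 2979 with 31 tagged rectangles). It is not a confirmed description of PowerPoint's internal behavior, and the function name is hypothetical.

```python
from typing import List


def project_custom_xml_count(initial_parts: int, tagged_shapes: int, cycles: int) -> List[int]:
    """Project customXML file counts over repeated restrict/unrestrict cycles.

    Assumed model (inferred, not confirmed): next = current * tagged_shapes + initial_parts.
    """
    counts = [initial_parts]
    for _ in range(cycles):
        counts.append(counts[-1] * tagged_shapes + initial_parts)
    return counts


# Values from the reproduction: 3 initial files, 31 tagged rectangles, 2 cycles.
print(project_custom_xml_count(3, 31, 2))  # prints [3, 96, 2979]
```

Under the same assumed model, a slide deck with several hundred tagged shapes would reach a comparable file count after a single cycle, which is consistent with think-cell presentations being hit particularly hard.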
