olefile API Reference

Summary

olefile.isOleFile(filename) – Test if a file is an OLE container (according to the magic bytes in its header).
olefile.OleFileIO([filename, raise_defects, ...]) – OLE container object.
olefile.OleMetadata() – Class to parse and store metadata from standard properties of OLE files.
olefile.enable_logging() – Enable logging for this module (disabled by default).

olefile module

olefile (formerly OleFileIO_PL)

Module to read/write Microsoft OLE2 files (also called Structured Storage or Microsoft Compound Document File Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, and others. This version is compatible with Python 2.6+ and 3.x.

Project website: https://www.decalage.info/olefile

olefile is copyright (c) 2005-2017 Philippe Lagadec (https://www.decalage.info)

olefile is based on the OleFileIO module from the PIL library v1.1.7. See: http://www.pythonware.com/products/pil/index.htm and http://svn.effbot.org/public/tags/pil-1.1.7/PIL/OleFileIO.py

The Python Imaging Library (PIL) is Copyright (c) 1997-2009 by Secret Labs AB and Copyright (c) 1995-2009 by Fredrik Lundh.

See source code and LICENSE.txt for information on usage and redistribution.

olefile.isOleFile(filename)

Test if a file is an OLE container (according to the magic bytes in its header).

Note

This function only checks the first 8 bytes of the file, not the rest of the OLE structure.

New in version 0.16.

Parameters: filename (bytes or str or unicode or file) – file name, contents, or file-like object of the OLE file (string-like or file-like object):

  • if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
  • if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
  • if filename is a file-like object (with read and seek methods), it is parsed as-is.
Returns: True if OLE, False otherwise.
Return type: bool
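
The header test described above can be approximated with the standard library alone. The sketch below is not olefile's implementation: it only compares the first 8 bytes against the OLE2 signature D0 CF 11 E0 A1 B1 1A E1, and the names OLE_MAGIC and looks_like_ole are hypothetical.

```python
# OLE2 / Compound File Binary signature: the first 8 bytes of the header.
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def looks_like_ole(data):
    """Return True if the given bytes start with the OLE2 signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC
```

As with isOleFile, a positive result says nothing about whether the rest of the OLE structure is valid.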
class olefile.OleFileIO(filename=None, raise_defects=40, write_mode=False, debug=False, path_encoding='utf-8')

OLE container object

This class encapsulates the interface to an OLE 2 structured storage file. Use the listdir and openstream methods to access the contents of this file.

Object names are given as a list of strings, one for each subentry level. The root entry should be omitted. For example, the following code extracts all image streams from a Microsoft Image Composer file:

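from olefile import OleFileIO

ole = OleFileIO("fan.mic")

for entry in ole.listdir():
    # entry is a list of path components, one per subentry level
    if entry[1:2] == ["Image"]:
        fin = ole.openstream(entry)
        # name the output file after the enclosing storage
        with open(entry[0], "wb") as fout:
            while True:
                s = fin.read(8192)
                if not s:
                    break
                fout.write(s)

ole.close()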

You can use the viewer application provided with the Python Imaging Library to view the resulting files (which happen to be standard TIFF files).

Constructor for the OleFileIO class.

Parameters:
  • filename

    file to open.

    • if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
    • if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
    • if filename is a file-like object (with read, seek and tell methods), it is parsed as-is.
  • raise_defects – minimal level for defects to be raised as exceptions. (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a security-oriented application, see source code for details)
  • write_mode – bool, if True the file is opened in read/write mode instead of read-only by default.
  • debug – bool, set debug mode (deprecated, not used anymore)
  • path_encoding – None or str, name of the codec to use for path names (streams and storages), or None for Unicode. Unicode by default on Python 3+, UTF-8 on Python 2.x. (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
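
The three cases for the filename argument can be summarised in a small sketch. This is an illustration of the documented rule, not olefile's own code; the function classify_source and the constant MINIMAL_OLEFILE_SIZE are names chosen here for the example.

```python
# Threshold from the documented rule: 1536 bytes or more means "content".
MINIMAL_OLEFILE_SIZE = 1536


def classify_source(source):
    """Return 'file-like', 'content' or 'path' for a filename argument."""
    # File-like objects are recognised by the methods the parser needs.
    if all(hasattr(source, name) for name in ("read", "seek", "tell")):
        return "file-like"
    # Only bytes of at least 1536 bytes are treated as in-memory content.
    if isinstance(source, bytes) and len(source) >= MINIMAL_OLEFILE_SIZE:
        return "content"
    # Everything else (short bytes or unicode string) is a path.
    return "path"
```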
close()

Close the OLE file and release the file object.

dumpdirectory()

Dump directory (for debugging only)

dumpfat(fat, firstindex=0)

Display a part of FAT in human-readable form for debugging purposes

dumpsect(sector, firstindex=0)

Display a sector in a human-readable form, for debugging purposes

exists(filename)

Test if given filename exists as a stream or a storage in the OLE container. Note: filename is case-insensitive.

Parameters: filename – path of stream in storage tree. (see openstream for syntax)
Returns: True if the object exists, else False.
get_metadata()

Parse the standard property streams and return an OleMetadata object containing all the available metadata (also stored in the metadata attribute of the OleFileIO object).

New in version 0.25.

get_rootentry_name()

Return the root entry name, which is ‘Root Entry’ or ‘R’ in most implementations.

get_size(filename)

Return size of a stream in the OLE container, in bytes.

Parameters:

filename – path of stream in storage tree (see openstream for syntax)

Returns:

size in bytes (long integer)

Raises:
  • IOError – if file not found
  • TypeError – if this is not a stream.
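
As a minimal sketch of how listdir and get_size combine, the helper below builds a size report for every stream. The name stream_sizes is hypothetical, and ole is assumed to be an already opened OleFileIO object.

```python
def stream_sizes(ole):
    """Map each stream path (components joined with '/') to its size in bytes."""
    sizes = {}
    for path in ole.listdir():  # streams only, by default
        sizes["/".join(path)] = ole.get_size(path)
    return sizes
```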
get_type(filename)

Test if given filename exists as a stream or a storage in the OLE container, and return its type.

Parameters: filename – path of stream in storage tree. (see openstream for syntax)
Returns: False if object does not exist, its entry type (>0) otherwise:
  • STGTY_STREAM: a stream
  • STGTY_STORAGE: a storage
  • STGTY_ROOT: the root entry
getclsid(filename)

Return the CLSID of a stream/storage.

Parameters: filename – path of stream/storage in storage tree. (see openstream for syntax)
Returns: Empty string if the CLSID is null, a printable representation of the CLSID otherwise

New in version 0.44.

getctime(filename)

Return creation time of a stream/storage.

Parameters: filename – path of stream/storage in storage tree. (see openstream for syntax)
Returns: None if creation time is null, a Python datetime object otherwise (UTC timezone)

New in version 0.26.

getmtime(filename)

Return modification time of a stream/storage.

Parameters: filename – path of stream/storage in storage tree. (see openstream for syntax)
Returns: None if modification time is null, a Python datetime object otherwise (UTC timezone)

New in version 0.26.
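
Because both getctime and getmtime may return None, callers need to handle the missing case explicitly. The following sketch shows one way to do so; describe_times is a hypothetical helper and ole is assumed to be an opened OleFileIO object.

```python
def describe_times(ole, path):
    """Return creation and modification times as strings, or 'not set'."""

    def fmt(timestamp):
        # A null timestamp in the directory entry is reported as None.
        return "not set" if timestamp is None else timestamp.isoformat()

    return {
        "created": fmt(ole.getctime(path)),
        "modified": fmt(ole.getmtime(path)),
    }
```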

getproperties(filename, convert_time=False, no_conversion=None)

Return the properties stored in the given property stream.

Parameters:
  • filename – path of stream in storage tree (see openstream for syntax)
  • convert_time – bool, if True timestamps will be converted to Python datetime
  • no_conversion – None or list of int, timestamps not to be converted (for example total editing time is not a real timestamp)
Returns:

a dictionary of values indexed by id (integer)
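
A minimal sketch of reading the SummaryInformation stream with timestamp conversion is given below. The helper read_summary and the two constants are names introduced for this example; property id 10 is the total editing time in SummaryInformation, which is a duration rather than a date, so it is excluded from conversion.

```python
SUMMARY_STREAM = "\x05SummaryInformation"
EDIT_TIME_ID = 10  # total editing time: a duration, not a real timestamp


def read_summary(ole):
    """Return the SummaryInformation properties, or {} if the stream is absent."""
    if not ole.exists(SUMMARY_STREAM):
        return {}
    return ole.getproperties(
        SUMMARY_STREAM,
        convert_time=True,             # timestamps become datetime objects
        no_conversion=[EDIT_TIME_ID],  # leave the editing time untouched
    )
```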

getsect(sect)

Read given sector from file on disk.

Parameters: sect – int, sector index
Returns: a string containing the sector data.
listdir(streams=True, storages=False)

Return a list of streams and/or storages stored in this file

Parameters:
  • streams – bool, include streams if True (True by default) - new in v0.26
  • storages – bool, include storages if True (False by default) - new in v0.26 (note: the root storage is never included)
Returns:

list of stream and/or storage paths
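
Since each returned path is a list of name components, a flat overview of the container can be produced by joining them. The sketch below assumes ole is an opened OleFileIO object; list_entries is a hypothetical helper name.

```python
def list_entries(ole):
    """Return sorted 'a/b/c' style paths of all streams and storages."""
    entries = ole.listdir(streams=True, storages=True)
    return sorted("/".join(parts) for parts in entries)
```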

loaddirectory(sect)

Load the directory.

Parameters: sect – sector index of directory stream.
loadfat(header)

Load the FAT table.

loadfat_sect(sect)

Adds the indexes of the given sector to the FAT

Parameters: sect – string containing the first FAT sector, or array of long integers
Returns: index of last FAT sector.
loadminifat()

Load the MiniFAT table.

open(filename, write_mode=False)

Open an OLE2 file in read-only or read/write mode. Read and parse the header, FAT and directory.

Parameters:
  • filename

    string-like or file-like object, OLE file to parse

    • if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
    • if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
    • if filename is a file-like object (with read, seek and tell methods), it is parsed as-is.
  • write_mode – bool, if True the file is opened in read/write mode instead of read-only by default. (ignored if filename is not a path)
openstream(filename)

Open a stream as a read-only file object (BytesIO). Note: filename is case-insensitive.

Parameters: filename – path of stream in storage tree (except root entry), either:

  • a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’
  • or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]
Returns: file object (read-only)
Raises: IOError – if filename not found, or if this is not a stream.
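
The following sketch wraps openstream so that a missing entry, or an entry that is not a stream, yields None instead of an exception. The name read_stream is hypothetical, and ole is assumed to be an opened OleFileIO object; the path may be given in either of the two forms described above.

```python
def read_stream(ole, path):
    """Return the full content of a stream as bytes, or None if unavailable."""
    try:
        stream = ole.openstream(path)
    except IOError:
        # Raised when the path is not found or does not refer to a stream.
        return None
    return stream.read()
```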
sect2array(sect)

Convert a sector to an array of 32-bit unsigned integers, swapping bytes on big-endian CPUs such as PowerPC (old Macs).

write_sect(sect, data, padding='\x00')

Write given sector to file on disk.

Parameters:
  • sect – int, sector index
  • data – bytes, sector data
  • padding – single byte, used to pad data that is shorter than the sector size
write_stream(stream_name, data)

Write a stream to disk. For now, it is only possible to replace an existing stream by data of the same size.

Parameters:
  • stream_name

    path of stream in storage tree (except root entry), either:

    • a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’
    • or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]
  • data – bytes, data to be written, must be the same size as the original stream.
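
Given the same-size restriction, it can help to check the length before writing. The sketch below is one possible guard, assuming ole is an OleFileIO object opened with write_mode=True; replace_stream is a hypothetical helper name.

```python
def replace_stream(ole, stream_name, data):
    """Overwrite an existing stream, refusing data of a different size."""
    expected = ole.get_size(stream_name)
    if len(data) != expected:
        raise ValueError(
            "stream is %d bytes, new data is %d bytes" % (expected, len(data))
        )
    ole.write_stream(stream_name, data)
```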
class olefile.OleMetadata

Class to parse and store metadata from standard properties of OLE files.

Available attributes: codepage, title, subject, author, keywords, comments, template, last_saved_by, revision_number, total_edit_time, last_printed, create_time, last_saved_time, num_pages, num_words, num_chars, thumbnail, creating_application, security, codepage_doc, category, presentation_target, bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips, scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty, chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed, version, dig_sig, content_type, content_status, language, doc_version

Note: an attribute is set to None when not present in the properties of the OLE file.

References for SummaryInformation stream:

References for DocumentSummaryInformation stream:

New in version 0.25.

Constructor for OleMetadata. All attributes are set to None by default.

DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', 'slides', 'notes', 'hidden_slides', 'mm_clips', 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', 'link_base', 'hlinks', 'hlinks_changed', 'version', 'dig_sig', 'content_type', 'content_status', 'language', 'doc_version']
SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', 'template', 'last_saved_by', 'revision_number', 'total_edit_time', 'last_printed', 'create_time', 'last_saved_time', 'num_pages', 'num_words', 'num_chars', 'thumbnail', 'creating_application', 'security']
dump()

Dump all metadata, for debugging purposes.

parse_properties(olefile)

Parse standard properties of an OLE file, from the streams \x05SummaryInformation and \x05DocumentSummaryInformation, if present. Properties are converted to strings, integers or python datetime objects. If a property is not present, its value is set to None.
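
Because absent properties are stored as None, a caller usually wants only the attributes that were actually found. The sketch below filters them; present_metadata is a hypothetical helper, and meta is assumed to be an OleMetadata object (for example the result of get_metadata).

```python
def present_metadata(meta, names):
    """Return {name: value} for the listed attributes that are not None."""
    found = {}
    for name in names:
        value = getattr(meta, name, None)
        if value is not None:
            found[name] = value
    return found
```

With a real object, names could be meta.SUMMARY_ATTRIBS + meta.DOCSUM_ATTRIBS to cover both property streams.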

olefile.enable_logging()

Enable logging for this module (disabled by default). This will set the module-specific logger level to NOTSET, which means the main application controls the actual logging level.
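
The effect of the NOTSET level can be illustrated with the standard logging module alone. This sketch does not call olefile; it uses a stand-in logger name to show that a logger set to NOTSET inherits its effective level from the application's root logger. The function effective_level_name is hypothetical.

```python
import logging


def effective_level_name(logger_name):
    """Set a logger to NOTSET and report the level it actually inherits."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.NOTSET)  # defer to the parent (root) logger
    return logging.getLevelName(logger.getEffectiveLevel())
```

In an application, this means that calling logging.basicConfig(level=logging.DEBUG) followed by olefile.enable_logging() would be expected to make the module's debug messages visible.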