olefile API Reference

Summary

olefile.isOleFile([filename, data])

Test if a file is an OLE container (according to the magic bytes in its header).

olefile.OleFileIO([filename, raise_defects, ...])

OLE container object

olefile.OleMetadata()

Class to parse and store metadata from standard properties of OLE files.

olefile.enable_logging()

Enable logging for this module (disabled by default).

olefile module

olefile (formerly OleFileIO_PL)

Module to read/write Microsoft OLE2 files (also called Structured Storage or Microsoft Compound Document File Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, etc. This version is compatible with Python 2.7 and 3.5+.

Project website: https://www.decalage.info/olefile

olefile is copyright (c) 2005-2023 Philippe Lagadec (https://www.decalage.info)

olefile is based on the OleFileIO module from the PIL library v1.1.7. See: http://www.pythonware.com/products/pil/index.htm and http://svn.effbot.org/public/tags/pil-1.1.7/PIL/OleFileIO.py

The Python Imaging Library (PIL) is Copyright (c) 1997-2009 by Secret Labs AB and Copyright (c) 1995-2009 by Fredrik Lundh.

See source code and LICENSE.txt for information on usage and redistribution.

olefile.DIFSECT = 4294967292

(-4) denotes a DIFAT sector in a FAT

olefile.ENDOFCHAIN = 4294967294

(-2) end of a virtual stream chain

olefile.FATSECT = 4294967293

(-3) denotes a FAT sector in a FAT

olefile.FREESECT = 4294967295

(-1) unallocated sector

olefile.MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

Magic bytes that should be at the beginning of every OLE file.

olefile.MAXREGSECT = 4294967290

(-6) maximum SECT

olefile.MAXREGSID = 4294967290

(-6) maximum directory entry ID

olefile.NOSTREAM = 4294967295

(-1) unallocated directory entry
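The negative numbers shown in parentheses are the same constants read as signed 32-bit integers. A minimal sketch of that relationship, using only the standard library; the `SENTINELS` dictionary and the `as_signed32` helper are illustrative names and not part of olefile:

```python
import struct

# Values copied from the constants documented above.
SENTINELS = {
    "MAXREGSECT": 4294967290,
    "DIFSECT": 4294967292,
    "FATSECT": 4294967293,
    "ENDOFCHAIN": 4294967294,
    "FREESECT": 4294967295,
}


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


for name, value in SENTINELS.items():
    # Prints the name, the raw value in hexadecimal and the signed reading.
    print("%-10s 0x%08X %d" % (name, value, as_signed32(value)))
```

Any sector index at or below MAXREGSECT is a regular sector; the larger values are reserved markers.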

class olefile.OleFileIO(filename=None, raise_defects=40, write_mode=False, debug=False, path_encoding=None)

Bases: object

OLE container object

This class encapsulates the interface to an OLE 2 structured storage file. Use the listdir and openstream methods to access the contents of this file.

Object names are given as a list of strings, one for each subentry level. The root entry should be omitted. For example, the following code extracts all image streams from a Microsoft Image Composer file:

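from olefile import OleFileIO

with OleFileIO("fan.mic") as ole:
    for entry in ole.listdir():
        # Each entry is a list of path components, one per subentry level.
        if entry[1:2] == ["Image"]:
            fin = ole.openstream(entry)
            # Name the output file after the parent storage.
            with open(entry[0], "wb") as fout:
                while True:
                    s = fin.read(8192)
                    if not s:
                        break
                    fout.write(s)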

You can use the viewer application provided with the Python Imaging Library to view the resulting files (which happen to be standard TIFF files).

Constructor for the OleFileIO class.

Parameters:
  • filename

    file to open.

    • if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)

    • if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)

    • if filename is a file-like object (with read, seek and tell methods), it is parsed as-is. The caller is responsible for closing it when done.

  • raise_defects – minimal level for defects to be raised as exceptions. (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a security-oriented application, see source code for details)

  • write_mode – bool, if True the file is opened in read/write mode instead of read-only by default.

  • debug – bool, set debug mode (deprecated, not used anymore)

  • path_encoding – None or str, name of the codec to use for path names (streams and storages), or None for Unicode. Unicode by default on Python 3+, UTF-8 on Python 2.x. (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
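The three cases for the filename parameter can be summarised with a small stand-alone function. This is only a sketch of the rules as described above, not the library's own code; `classify_source` and the local `MINIMAL_OLEFILE_SIZE` name are assumptions made for illustration, and "report.doc" is a hypothetical path:

```python
import io

MINIMAL_OLEFILE_SIZE = 1536  # threshold quoted in the parameter description


def classify_source(filename):
    """Describe how the filename argument would be interpreted."""
    if all(hasattr(filename, name) for name in ("read", "seek", "tell")):
        return "file-like object"
    if isinstance(filename, bytes) and len(filename) >= MINIMAL_OLEFILE_SIZE:
        return "OLE content in memory"
    # Assumption: short bytes strings and all unicode strings are paths.
    return "path on disk"


print(classify_source("report.doc"))
print(classify_source(b"\x00" * 1536))
print(classify_source(io.BytesIO(b"")))
```

Because a long bytes string is taken as file content, a very long path should be passed as a unicode string rather than as bytes.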

close()

Close the OLE file and release the file object if we created it ourselves.

Leaves the file handle open if it was provided by the caller.

dumpdirectory()

Dump directory (for debugging only)

dumpfat(fat, firstindex=0)

Display a part of FAT in human-readable form for debugging purposes

dumpsect(sector, firstindex=0)

Display a sector in a human-readable form, for debugging purposes

exists(filename)

Test if given filename exists as a stream or a storage in the OLE container. Note: filename is case-insensitive.

Parameters:

filename – path of stream in storage tree. (see openstream for syntax)

Returns:

True if the object exists, else False.

get_metadata()

Parse standard properties streams, return an OleMetadata object containing all the available metadata. (also stored in the metadata attribute of the OleFileIO object)

new in version 0.25

get_rootentry_name()

Return root entry name. Should usually be ‘Root Entry’ or ‘R’ in most implementations.

get_size(filename)

Return size of a stream in the OLE container, in bytes.

Parameters:

filename – path of stream in storage tree (see openstream for syntax)

Returns:

size in bytes (long integer)

Raises:
  • IOError – if file not found

  • TypeError – if this is not a stream.

get_type(filename)

Test if given filename exists as a stream or a storage in the OLE container, and return its type.

Parameters:

filename – path of stream in storage tree. (see openstream for syntax)

Returns:

False if object does not exist, its entry type (>0) otherwise:

  • STGTY_STREAM: a stream

  • STGTY_STORAGE: a storage

  • STGTY_ROOT: the root entry
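Since get_type returns either False or a positive entry type, the result can be turned into a readable label. A minimal sketch, assuming `ole` is an open OleFileIO object; `describe_entry` and `TYPE_NAMES` are hypothetical names, and the constants are redefined locally with the values listed in this reference:

```python
# Values copied from the STGTY_* constants documented in this reference.
STGTY_STORAGE = 1
STGTY_STREAM = 2
STGTY_ROOT = 5

TYPE_NAMES = {
    STGTY_STORAGE: "storage",
    STGTY_STREAM: "stream",
    STGTY_ROOT: "root entry",
}


def describe_entry(ole, path):
    """Return a label for path, or 'missing' when get_type() returns False."""
    entry_type = ole.get_type(path)
    if not entry_type:
        return "missing"
    return TYPE_NAMES.get(entry_type, "other (%d)" % entry_type)
```

In real code the constants would normally be taken from the module itself (olefile.STGTY_STREAM and so on) rather than redefined.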

get_userdefined_properties(filename, convert_time=False, no_conversion=None)

Return properties described in substream.

Parameters:
  • filename – path of stream in storage tree (see openstream for syntax)

  • convert_time – bool, if True timestamps will be converted to Python datetime

  • no_conversion – None or list of int, timestamps not to be converted (for example total editing time is not a real timestamp)

Returns:

a dictionary of values indexed by id (integer)

getclsid(filename)

Return clsid of a stream/storage.

Parameters:

filename – path of stream/storage in storage tree. (see openstream for syntax)

Returns:

Empty string if clsid is null, a printable representation of the clsid otherwise

new in version 0.44

getctime(filename)

Return creation time of a stream/storage.

Parameters:

filename – path of stream/storage in storage tree. (see openstream for syntax)

Returns:

None if creation time is null, a python datetime object otherwise (UTC timezone)

new in version 0.26

getmtime(filename)

Return modification time of a stream/storage.

Parameters:

filename – path of stream/storage in storage tree. (see openstream for syntax)

Returns:

None if modification time is null, a python datetime object otherwise (UTC timezone)

new in version 0.26
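Both getctime and getmtime may return None, so callers usually need a guard before formatting the result. The following Python 3 sketch shows one way to do that; `to_utc_iso` is a hypothetical helper, and treating a datetime without timezone information as UTC is an assumption based on the "UTC timezone" note above:

```python
from datetime import datetime, timezone


def to_utc_iso(timestamp):
    """Format a getctime()/getmtime() result as ISO 8601, or return None."""
    if timestamp is None:
        return None  # the time field is null for this entry
    if timestamp.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return timestamp.astimezone(timezone.utc).isoformat()


print(to_utc_iso(None))
print(to_utc_iso(datetime(2020, 1, 2, 3, 4, 5)))
```

With an open container this would be called as `to_utc_iso(ole.getmtime(path))`.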

getproperties(filename, convert_time=False, no_conversion=None)

Return properties described in substream.

Parameters:
  • filename – path of stream in storage tree (see openstream for syntax)

  • convert_time – bool, if True timestamps will be converted to Python datetime

  • no_conversion – None or list of int, timestamps not to be converted (for example total editing time is not a real timestamp)

Returns:

a dictionary of values indexed by id (integer)
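The returned dictionary is keyed by numeric property id, which is hard to read on its own. For the \x05SummaryInformation stream, the sketch below maps ids to the attribute names listed in OleMetadata.SUMMARY_ATTRIBS. It assumes that property id N corresponds to the N-th name in that list (matching the standard SummaryInformation identifiers); `name_properties` is a hypothetical helper, not an olefile function:

```python
# Attribute names copied from OleMetadata.SUMMARY_ATTRIBS, in list order.
SUMMARY_ATTRIBS = [
    "codepage", "title", "subject", "author", "keywords", "comments",
    "template", "last_saved_by", "revision_number", "total_edit_time",
    "last_printed", "create_time", "last_saved_time", "num_pages",
    "num_words", "num_chars", "thumbnail", "creating_application",
    "security",
]


def name_properties(props, names=SUMMARY_ATTRIBS):
    """Map a {property id: value} dictionary to {attribute name: value}."""
    named = {}
    for prop_id, value in props.items():
        # Assumption: property id N corresponds to names[N - 1].
        if 1 <= prop_id <= len(names):
            named[names[prop_id - 1]] = value
    return named


print(name_properties({2: "Quarterly report", 4: "Alice"}))
```

With an open container, `props` would come from `ole.getproperties("\x05SummaryInformation")`. Ids outside the list are silently skipped in this sketch.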

getsect(sect)

Read given sector from file on disk.

Parameters:

sect – int, sector index

Returns:

a string containing the sector data.

listdir(streams=True, storages=False)

Return a list of streams and/or storages stored in this file

Parameters:
  • streams – bool, include streams if True (True by default) - new in v0.26

  • storages – bool, include storages if True (False by default) - new in v0.26 (note: the root storage is never included)

Returns:

list of stream and/or storage paths
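Each returned path is a list of name components, which can be joined into the Unix-style string syntax accepted by openstream. A minimal sketch, assuming `ole` is an open OleFileIO object; `list_paths` is a hypothetical helper name:

```python
def list_paths(ole, include_storages=False):
    """Return every entry of an open container as an 'a/b/c' path string."""
    entries = ole.listdir(streams=True, storages=include_storages)
    return sorted("/".join(entry) for entry in entries)
```

A typical call would be `list_paths(ole)` inside a `with olefile.OleFileIO(path) as ole:` block. Note that joining with "/" is ambiguous if a stream name itself contains a slash, in which case the list form is the safer one to keep.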

loaddirectory(sect)

Load the directory.

Parameters:

sect – sector index of directory stream.

loadfat(header)

Load the FAT table.

loadfat_sect(sect)

Adds the indexes of the given sector to the FAT

Parameters:

sect – string containing the first FAT sector, or array of long integers

Returns:

index of last FAT sector.

loadminifat()

Load the MiniFAT table.

open(filename, write_mode=False)

Open an OLE2 file in read-only or read/write mode. Read and parse the header, FAT and directory.

Parameters:
  • filename

    string-like or file-like object, OLE file to parse

    • if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)

    • if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)

    • if filename is a file-like object (with read, seek and tell methods), it is parsed as-is. The caller is responsible for closing it when done

  • write_mode – bool, if True the file is opened in read/write mode instead of read-only by default. (ignored if filename is not a path)

openstream(filename)

Open a stream as a read-only file object (BytesIO). Note: filename is case-insensitive.

Parameters:

filename

path of stream in storage tree (except root entry), either:

  • a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’

  • or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]

Returns:

file object (read-only)

Raises:

IOError – if filename not found, or if this is not a stream.
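Because openstream raises IOError for a missing path, it is often convenient to check with exists first. The sketch below reads a whole stream into memory, assuming `ole` is an open OleFileIO object; `read_stream` is a hypothetical helper name:

```python
def read_stream(ole, path, default=None):
    """Return the full content of a stream as bytes, or default if absent."""
    if not ole.exists(path):
        return default
    stream = ole.openstream(path)
    try:
        return stream.read()
    finally:
        stream.close()
```

The path can be given either as a string such as "storage_1/storage_1.2/stream" or as the equivalent list of names. This sketch does not distinguish a storage from a stream, so passing the path of a storage would still raise IOError from openstream.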

parsing_issues

List of defects/issues not raised as exceptions, stored as tuples of (exception type, message).
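A short sketch of how this list might be reported after opening a file, assuming `ole` is an open OleFileIO object and that each item is an (exception type, message) tuple as described above; `format_issues` is a hypothetical helper name:

```python
def format_issues(ole):
    """Return one 'ExceptionName: message' line per recorded parsing issue."""
    lines = []
    for exc_type, message in ole.parsing_issues:
        lines.append("%s: %s" % (exc_type.__name__, message))
    return lines
```

An empty result means that no defect below the raise_defects level was recorded while parsing.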

sect2array(sect)

Convert a sector to an array of 32-bit unsigned integers, swapping bytes on big-endian CPUs such as PowerPC (old Macs).

write_sect(sect, data, padding=b'\x00')

Write given sector to file on disk.

Parameters:
  • sect – int, sector index

  • data – bytes, sector data

  • padding – single byte, padding character if data < sector size

write_stream(stream_name, data)

Write a stream to disk. For now, it is only possible to replace an existing stream by data of the same size.

Parameters:
  • stream_name

    path of stream in storage tree (except root entry), either:

    • a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’

    • or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]

  • data – bytes, data to be written, must be the same size as the original stream.
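Since the replacement data must have the same size as the existing stream, it can help to check the size before writing. A minimal sketch, assuming `ole` is an OleFileIO object opened with write_mode=True; `replace_stream` is a hypothetical helper name:

```python
def replace_stream(ole, stream_name, data):
    """Overwrite a stream in place, refusing data of a different size."""
    current_size = ole.get_size(stream_name)
    if len(data) != current_size:
        raise ValueError(
            "new data is %d bytes, but the stream is %d bytes"
            % (len(data), current_size)
        )
    ole.write_stream(stream_name, data)
```

The explicit check gives an error message that names both sizes before any write is attempted.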

exception olefile.OleFileIONotClosed(stack_of_open=None)

Bases: RuntimeWarning

Warning type used when OleFileIO is destructed but has open file handle.

class olefile.OleMetadata

Bases: object

Class to parse and store metadata from standard properties of OLE files.

Available attributes: codepage, title, subject, author, keywords, comments, template, last_saved_by, revision_number, total_edit_time, last_printed, create_time, last_saved_time, num_pages, num_words, num_chars, thumbnail, creating_application, security, codepage_doc, category, presentation_target, bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips, scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty, chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed, version, dig_sig, content_type, content_status, language, doc_version

Note: an attribute is set to None when not present in the properties of the OLE file.

References for SummaryInformation stream:

References for DocumentSummaryInformation stream:

New in version 0.25

Constructor for OleMetadata. All attributes are set to None by default.

DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', 'slides', 'notes', 'hidden_slides', 'mm_clips', 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', 'link_base', 'hlinks', 'hlinks_changed', 'version', 'dig_sig', 'content_type', 'content_status', 'language', 'doc_version']

SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', 'template', 'last_saved_by', 'revision_number', 'total_edit_time', 'last_printed', 'create_time', 'last_saved_time', 'num_pages', 'num_words', 'num_chars', 'thumbnail', 'creating_application', 'security']

dump()

Dump all metadata, for debugging purposes.

parse_properties(ole_file)

Parse standard properties of an OLE file, from the streams \x05SummaryInformation and \x05DocumentSummaryInformation, if present. Properties are converted to strings, integers or python datetime objects. If a property is not present, its value is set to None.

Parameters:

ole_file – OleFileIO object from which to parse properties
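Because absent properties are left as None, a common need is to collect only the attributes that were actually found. A minimal sketch, assuming `meta` is an OleMetadata object (for example the result of `ole.get_metadata()`) exposing the SUMMARY_ATTRIBS and DOCSUM_ATTRIBS lists documented above; `present_metadata` is a hypothetical helper name:

```python
def present_metadata(meta):
    """Return {attribute name: value} for every attribute that is not None."""
    names = list(meta.SUMMARY_ATTRIBS) + list(meta.DOCSUM_ATTRIBS)
    found = {}
    for name in names:
        value = getattr(meta, name, None)
        if value is not None:
            found[name] = value
    return found
```

Unlike dump(), which is described as a debugging aid, this returns a plain dictionary that can be processed further.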

olefile.STGTY_EMPTY = 0

empty directory entry

olefile.STGTY_LOCKBYTES = 3

element is an ILockBytes object

olefile.STGTY_PROPERTY = 4

element is an IPropertyStorage object

olefile.STGTY_ROOT = 5

element is a root storage

olefile.STGTY_STORAGE = 1

element is a storage object

olefile.STGTY_STREAM = 2

element is a stream object

olefile.enable_logging()

Enable logging for this module (disabled by default). This will set the module-specific logger level to NOTSET, which means the main application controls the actual logging level.

olefile.isOleFile(filename=None, data=None)

Test if a file is an OLE container (according to the magic bytes in its header).

Note

This function only checks the first 8 bytes of the file, not the rest of the OLE structure. If data is provided, it also checks if the file size is above the minimal size of an OLE file (1536 bytes). If filename is provided with the path of the file on disk, the file is opened only to read the first 8 bytes, then closed.

New in version 0.16.

Parameters:
  • filename (bytes, str, unicode or file-like object) –

    filename, contents or file-like object of the OLE file (string-like or file-like object)

    • if data is provided, filename is ignored.

    • if filename is a unicode string, it is used as path of the file to open on disk.

    • if filename is a bytes string smaller than 1536 bytes, it is used as path of the file to open on disk.

    • [deprecated] if filename is a bytes string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only) Note that this use case is deprecated and should be replaced by the new data parameter

    • if filename is a file-like object (with read and seek methods), it is parsed as-is.

  • data (bytes) – bytes string with the contents of the file to be checked, when the file is in memory (added in olefile 0.47)

Returns:

True if OLE, False otherwise.

Return type:

bool
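The two checks described in the note (magic bytes and minimal size) can be illustrated without the library. This is a stand-alone sketch of the documented behaviour for in-memory data, not olefile's implementation; `looks_like_ole` and `MINIMAL_SIZE` are illustrative names, and reading "above the minimal size" as "at least 1536 bytes" is an assumption:

```python
MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # same value as olefile.MAGIC
MINIMAL_SIZE = 1536  # minimal size of an OLE file, per the note above


def looks_like_ole(data):
    """Return True if in-memory bytes pass the two documented checks."""
    # Assumption: a buffer of exactly 1536 bytes is large enough.
    return len(data) >= MINIMAL_SIZE and data.startswith(MAGIC)


sample = MAGIC + b"\x00" * (MINIMAL_SIZE - len(MAGIC))
print(looks_like_ole(sample))  # correct header, large enough
print(looks_like_ole(MAGIC))  # correct header, but too short
print(looks_like_ole(b"PK\x03\x04" + b"\x00" * 2000))  # wrong header
```

With olefile 0.47 or later, the corresponding call for in-memory content is `olefile.isOleFile(data=sample)`. Passing this check does not guarantee that the rest of the file is a valid OLE structure.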