olefile API Reference
Summary
olefile.isOleFile(filename) | Test if a file is an OLE container (according to the magic bytes in its header).
olefile.OleFileIO([filename, raise_defects, …]) | OLE container object
olefile.OleMetadata() | Class to parse and store metadata from standard properties of OLE files.
olefile.enable_logging() | Enable logging for this module (disabled by default).
olefile module
olefile (formerly OleFileIO_PL)
Module to read/write Microsoft OLE2 files (also called Structured Storage or Microsoft Compound Document File Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, …
This version is compatible with Python 2.7 and 3.5+.
Project website: https://www.decalage.info/olefile
olefile is copyright (c) 2005-2020 Philippe Lagadec (https://www.decalage.info)
olefile is based on the OleFileIO module from the PIL library v1.1.7. See: http://www.pythonware.com/products/pil/index.htm and http://svn.effbot.org/public/tags/pil-1.1.7/PIL/OleFileIO.py
The Python Imaging Library (PIL) is Copyright (c) 1997-2009 by Secret Labs AB and Copyright (c) 1995-2009 by Fredrik Lundh.
See source code and LICENSE.txt for information on usage and redistribution.
-
olefile.isOleFile(filename)
Test if a file is an OLE container (according to the magic bytes in its header).
Note
This function only checks the first 8 bytes of the file, not the rest of the OLE structure.
New in version 0.16.
Parameters: filename (bytes or str or unicode or file) – filename, contents or file-like object of the OLE file (string-like or file-like object)
- if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
- if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
- if filename is a file-like object (with read and seek methods), it is parsed as-is.
Returns: True if OLE, False otherwise.
Return type: bool
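To make the three input modes concrete, here is a minimal stdlib-only sketch of the same header check. It is not olefile's implementation: the function name looks_like_ole is hypothetical, and restoring the previous file position for file-like objects is an assumption of this sketch.

```python
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # value documented as olefile.MAGIC
MINIMAL_OLEFILE_SIZE = 1536  # shorter strings are treated as paths


def looks_like_ole(source):
    """Hypothetical sketch of the isOleFile check: compare the first 8 bytes."""
    if hasattr(source, "read") and hasattr(source, "seek"):
        # File-like object: read the header, then rewind to where we started.
        start = source.tell() if hasattr(source, "tell") else 0
        header = source.read(len(OLE_MAGIC))
        source.seek(start)
    elif isinstance(source, bytes) and len(source) >= MINIMAL_OLEFILE_SIZE:
        # Long bytes object: it is the file content itself.
        header = source[:len(OLE_MAGIC)]
    else:
        # Short str or bytes: it is a path on disk.
        with open(source, "rb") as f:
            header = f.read(len(OLE_MAGIC))
    return header == OLE_MAGIC
```

In real code, call olefile.isOleFile directly; the sketch only spells out how each kind of argument is interpreted.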
-
class olefile.OleFileIO(filename=None, raise_defects=40, write_mode=False, debug=False, path_encoding='utf-8')
OLE container object.
This class encapsulates the interface to an OLE 2 structured storage file. Use the listdir and openstream methods to access the contents of this file.
Object names are given as a list of strings, one for each subentry level. The root entry should be omitted. For example, the following code extracts all image streams from a Microsoft Image Composer file:
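    import olefile

    with olefile.OleFileIO("fan.mic") as ole:
        for entry in ole.listdir():
            # entry is a list of names; keep paths whose second element is "Image"
            if entry[1:2] == ["Image"]:
                fin = ole.openstream(entry)
                # write the stream to a file named after its top-level storage
                with open(entry[0], "wb") as fout:
                    while True:
                        s = fin.read(8192)
                        if not s:
                            break
                        fout.write(s)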
You can use the viewer application provided with the Python Imaging Library to view the resulting files (which happen to be standard TIFF files).
Constructor for the OleFileIO class.
Parameters:
- filename – file to open:
  - if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
  - if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
  - if filename is a file-like object (with read, seek and tell methods), it is parsed as-is. The caller is responsible for closing it when done.
- raise_defects – minimum level for defects to be raised as exceptions (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a security-oriented application; see source code for details).
- write_mode – bool, if True the file is opened in read/write mode instead of read-only by default.
- debug – bool, set debug mode (deprecated, not used anymore)
- path_encoding – None or str, name of the codec to use for path names (streams and storages), or None for Unicode. Unicode by default on Python 3+, UTF-8 on Python 2.x. (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
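The interplay between raise_defects and the parsing_issues attribute can be pictured with the following sketch. The class DefectPolicy is hypothetical, IOError stands in for the exception type olefile actually raises, and the numeric levels are, to our knowledge, those of olefile's DEFECT_* constants (the default raise_defects=40 corresponds to DEFECT_FATAL).

```python
# Assumed values of olefile's DEFECT_* levels.
DEFECT_UNSURE = 10     # looks odd, but may not be a defect
DEFECT_POTENTIAL = 20  # a potential defect
DEFECT_INCORRECT = 30  # violates the specification, parsing can continue
DEFECT_FATAL = 40      # parsing cannot continue


class DefectPolicy:
    """Hypothetical helper: raise serious defects, record the others."""

    def __init__(self, raise_defects=DEFECT_FATAL):
        self.raise_defects = raise_defects
        self.parsing_issues = []  # (exception type, message) tuples

    def report(self, level, message, exception_type=IOError):
        if level >= self.raise_defects:
            # Serious enough for this policy: abort parsing.
            raise exception_type(message)
        # Otherwise remember the issue and keep parsing.
        self.parsing_issues.append((exception_type, message))
```

A security-oriented caller would pass DEFECT_INCORRECT, so that specification violations stop parsing instead of being merely recorded.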
-
close()
Close the OLE file, and release the file object if it was created by OleFileIO itself. Leaves the file handle open if it was provided by the caller.
-
dumpdirectory()
Dump directory (for debugging only).
-
dumpfat(fat, firstindex=0)
Display a part of the FAT in human-readable form, for debugging purposes.
-
dumpsect(sector, firstindex=0)
Display a sector in human-readable form, for debugging purposes.
-
exists(filename)
Test if the given filename exists as a stream or a storage in the OLE container. Note: filename is case-insensitive.
Parameters: filename – path of stream in storage tree (see openstream for syntax).
Returns: True if the object exists, else False.
-
get_document_variables()
Extract the document variables from Microsoft Word documents.
Returns: a list of dictionaries, each of which contains the keys var_name and value.
-
get_metadata()
Parse standard properties streams, and return an OleMetadata object containing all the available metadata (also stored in the metadata attribute of the OleFileIO object).
New in version 0.25.
-
get_rootentry_name()
Return the root entry name. This is usually ‘Root Entry’, or ‘R’ in some implementations.
-
get_size(filename)
Return the size of a stream in the OLE container, in bytes.
Parameters: filename – path of stream in storage tree (see openstream for syntax)
Returns: size in bytes (long integer)
Raises:
- IOError – if file not found
- TypeError – if this is not a stream.
-
get_type(filename)
Test if the given filename exists as a stream or a storage in the OLE container, and return its type.
Parameters: filename – path of stream in storage tree (see openstream for syntax).
Returns: False if the object does not exist, its entry type (>0) otherwise:
- STGTY_STREAM: a stream
- STGTY_STORAGE: a storage
- STGTY_ROOT: the root entry
-
get_userdefined_properties(filename, convert_time=False, no_conversion=None)
Return properties described in substream.
Parameters:
- filename – path of stream in storage tree (see openstream for syntax)
- convert_time – bool, if True timestamps will be converted to Python datetime
- no_conversion – None or list of int, timestamps not to be converted (for example total editing time is not a real timestamp)
Returns: a dictionary of values indexed by id (integer)
-
getclsid(filename)
Return the CLSID of a stream/storage.
Parameters: filename – path of stream/storage in storage tree (see openstream for syntax).
Returns: empty string if the CLSID is null, a printable representation of the CLSID otherwise.
New in version 0.44.
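A CLSID is stored as 16 raw bytes. The sketch below shows one way to build a printable form in the usual GUID layout; the function name clsid_to_str is hypothetical, and the little-endian decoding of the first three fields follows the common GUID convention rather than anything stated above.

```python
import struct


def clsid_to_str(clsid):
    """Hypothetical formatter: 16 raw bytes -> 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'."""
    if len(clsid) != 16:
        raise ValueError("a CLSID is exactly 16 bytes")
    if not clsid.strip(b"\x00"):
        return ""  # null CLSID, as getclsid documents
    # The first three fields are little-endian integers; the last 8 bytes are raw.
    d1, d2, d3 = struct.unpack("<IHH", clsid[:8])
    tail = clsid[8:]
    return "%08X-%04X-%04X-%s-%s" % (
        d1, d2, d3, tail[:2].hex().upper(), tail[2:].hex().upper())
```

The result matches str(uuid.UUID(bytes_le=clsid)).upper() for any non-null CLSID.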
-
getctime(filename)
Return the creation time of a stream/storage.
Parameters: filename – path of stream/storage in storage tree (see openstream for syntax).
Returns: None if the creation time is null, a Python datetime object otherwise (UTC timezone).
New in version 0.26.
-
getmtime(filename)
Return the modification time of a stream/storage.
Parameters: filename – path of stream/storage in storage tree (see openstream for syntax).
Returns: None if the modification time is null, a Python datetime object otherwise (UTC timezone).
New in version 0.26.
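Directory entry timestamps are Windows FILETIME values: 64-bit little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. A minimal conversion sketch follows; the helper names are hypothetical, and returning a naive datetime that represents UTC is an assumption.

```python
import datetime
import struct

_FILETIME_EPOCH = datetime.datetime(1601, 1, 1)  # FILETIME origin, UTC


def filetime_from_bytes(raw):
    """Decode the 8-byte little-endian FILETIME stored in a directory entry."""
    return struct.unpack("<Q", raw)[0]


def filetime_to_datetime(filetime):
    """Convert a FILETIME to a naive datetime in UTC, or None if it is null."""
    if filetime == 0:
        return None  # null timestamp
    # 10 ticks of 100 ns make one microsecond.
    return _FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)
```

For example, the FILETIME 116444736000000000 corresponds to the Unix epoch, 1970-01-01 00:00 UTC.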
-
getproperties(filename, convert_time=False, no_conversion=None)
Return properties described in substream.
Parameters:
- filename – path of stream in storage tree (see openstream for syntax)
- convert_time – bool, if True timestamps will be converted to Python datetime
- no_conversion – None or list of int, timestamps not to be converted (for example total editing time is not a real timestamp)
Returns: a dictionary of values indexed by id (integer)
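The effect of convert_time and no_conversion can be sketched as a post-processing step. Here the raw input (a dict mapping property id to a (type code, value) pair) and the function name are hypothetical; VT_FILETIME = 64 is the standard property type code for FILETIME values, and id 10 is the total editing time in the SummaryInformation stream, which is a duration rather than a date.

```python
import datetime

VT_FILETIME = 64  # property type code 0x0040
_FILETIME_EPOCH = datetime.datetime(1601, 1, 1)


def convert_properties(raw, convert_time=False, no_conversion=None):
    """Hypothetical sketch: raw maps id -> (type code, value); returns id -> value."""
    no_conversion = no_conversion or []
    result = {}
    for pid, (vt, value) in raw.items():
        if vt == VT_FILETIME and convert_time and pid not in no_conversion:
            # Real timestamp: 100-ns ticks since 1601 -> datetime.
            value = _FILETIME_EPOCH + datetime.timedelta(microseconds=value // 10)
        result[pid] = value
    return result
```

Passing no_conversion=[10] keeps the editing time as a raw tick count while still converting the real dates.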
-
getsect(sect)
Read the given sector from the file on disk.
Parameters: sect – int, sector index
Returns: a string containing the sector data.
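Sector indexes map to file offsets by simple arithmetic: the header fills the space of one sector (it is padded to 4096 bytes when 4096-byte sectors are used), so sector N starts at (N + 1) * sector_size. A sketch, with hypothetical function names:

```python
def sector_offset(sect, sector_size=512):
    """Byte offset of sector `sect`; sector 0 starts right after the header."""
    return (sect + 1) * sector_size


def read_sector(fp, sect, sector_size=512):
    """Read one whole sector from a binary file object."""
    fp.seek(sector_offset(sect, sector_size))
    return fp.read(sector_size)
```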
-
listdir(streams=True, storages=False)
Return a list of streams and/or storages stored in this file.
Parameters:
- streams – bool, include streams if True (True by default) - new in v0.26
- storages – bool, include storages if True (False by default) - new in v0.26 (note: the root storage is never included)
Returns: list of stream and/or storage paths
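The way the streams and storages flags select entries can be illustrated on an in-memory tree. In this hypothetical model, storages are dicts and streams are bytes; paths are returned as lists of names, and the root itself is never listed.

```python
def list_entries(storage, streams=True, storages=False, prefix=None):
    """Walk a hypothetical tree of dicts (storages) and bytes (streams)."""
    prefix = prefix or []
    result = []
    for name, child in storage.items():
        path = prefix + [name]
        if isinstance(child, dict):
            if storages:
                result.append(path)
            # Recurse into the storage to find nested entries.
            result.extend(list_entries(child, streams, storages, path))
        elif streams:
            result.append(path)
    return result
```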
-
loaddirectory(sect)
Load the directory.
Parameters: sect – sector index of directory stream.
-
loadfat(header)
Load the FAT table.
-
loadfat_sect(sect)
Add the indexes of the given sector to the FAT.
Parameters: sect – string containing the first FAT sector, or array of long integers.
Returns: index of last FAT sector.
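Once loaded, the FAT is an array in which entry N holds the index of the sector that follows sector N in a stream, and ENDOFCHAIN marks the last one. A minimal chain-following sketch, using the ENDOFCHAIN and MAXREGSECT values listed in this reference; the function name and the loop check are illustrative assumptions.

```python
ENDOFCHAIN = 0xFFFFFFFE  # 4294967294
MAXREGSECT = 0xFFFFFFFA  # 4294967290, largest regular sector index


def follow_chain(fat, start):
    """Return the sector indexes of a stream starting at sector `start`."""
    chain = []
    seen = set()
    sect = start
    while sect != ENDOFCHAIN:
        if sect > MAXREGSECT or sect >= len(fat):
            raise IOError("invalid sector index %d in chain" % sect)
        if sect in seen:
            raise IOError("loop detected in FAT chain at sector %d" % sect)
        seen.add(sect)
        chain.append(sect)
        sect = fat[sect]  # each entry points to the next sector
    return chain
```

With fat = [3, ENDOFCHAIN, 1, 2], a stream starting at sector 0 occupies sectors 0, 3, 2 and 1, in that order.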
-
loadminifat()
Load the MiniFAT table.
-
open(filename, write_mode=False)
Open an OLE2 file in read-only or read/write mode. Read and parse the header, FAT and directory.
Parameters:
- filename – string-like or file-like object, OLE file to parse:
  - if filename is a string smaller than 1536 bytes, it is the path of the file to open. (bytes or unicode string)
  - if filename is a string longer than 1535 bytes, it is parsed as the content of an OLE file in memory. (bytes type only)
  - if filename is a file-like object (with read, seek and tell methods), it is parsed as-is. The caller is responsible for closing it when done.
- write_mode – bool, if True the file is opened in read/write mode instead of read-only by default. (ignored if filename is not a path)
-
openstream(filename)
Open a stream as a read-only file object (BytesIO). Note: filename is case-insensitive.
Parameters: filename – path of stream in storage tree (except root entry), either:
- a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’
- or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]
Returns: file object (read-only)
Raises: IOError – if filename not found, or if this is not a stream.
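Both path syntaxes and the case-insensitive lookup can be sketched on the same kind of in-memory tree (dicts for storages, bytes for streams). All names here are hypothetical, and the plain str.lower() comparison is an assumption; the library may normalise names differently.

```python
import io


def normalize_path(filename):
    """Accept 'a/b/c' or ['a', 'b', 'c'] and return a list of names."""
    if isinstance(filename, str):
        filename = filename.split("/")
    return list(filename)


def find_entry(root, filename):
    """Case-insensitive lookup; raises IOError if a name is missing."""
    node = root
    for name in normalize_path(filename):
        if not isinstance(node, dict):
            raise IOError("not a storage: %r" % name)
        matches = [key for key in node if key.lower() == name.lower()]
        if not matches:
            raise IOError("file not found: %r" % name)
        node = node[matches[0]]
    return node


def open_stream(root, filename):
    """Return a read-only-style BytesIO for a stream, like openstream."""
    entry = find_entry(root, filename)
    if isinstance(entry, dict):
        raise IOError("this file is not a stream")
    return io.BytesIO(entry)
```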
-
parsing_issues = None
List of defects/issues not raised as exceptions: tuples of (exception type, message).
-
sect2array(sect)
Convert a sector to an array of 32-bit unsigned integers, swapping bytes on big-endian CPUs such as PowerPC (old Macs).
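OLE files store these integers in little-endian order. The sketch below swaps in a different technique from the one described above: the struct module with an explicit '<' format decodes little-endian values identically on every CPU, so no separate byte-swapping step is needed. The function name is hypothetical.

```python
import struct


def sect_to_uint32_list(sect):
    """Decode a sector into unsigned 32-bit little-endian integers."""
    if len(sect) % 4:
        raise ValueError("sector length must be a multiple of 4")
    return list(struct.unpack("<%dI" % (len(sect) // 4), sect))
```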
-
write_sect(sect, data, padding='\x00')
Write the given sector to the file on disk.
Parameters: - sect – int, sector index
- data – bytes, sector data
- padding – single byte, padding character if data < sector size
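The padding rule can be sketched as follows: shorter data is filled up to the sector size with the padding byte. Rejecting data longer than one sector is an assumption of this sketch, and the function name is hypothetical.

```python
def pad_sector(data, sector_size=512, padding=b"\x00"):
    """Return `data` padded to exactly one sector."""
    if not isinstance(padding, bytes) or len(padding) != 1:
        raise ValueError("padding must be a single byte")
    if len(data) > sector_size:
        raise ValueError("data is larger than a sector")
    return data + padding * (sector_size - len(data))
```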
-
write_stream(stream_name, data)
Write a stream to disk. For now, it is only possible to replace an existing stream by data of the same size.
Parameters:
- stream_name – path of stream in storage tree (except root entry), either:
  - a string using Unix path syntax, for example: ‘storage_1/storage_1.2/stream’
  - or a list of storage filenames, path to the desired stream/storage. Example: [‘storage_1’, ‘storage_1.2’, ‘stream’]
- data – bytes, data to be written, must be the same size as the original stream.
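Because the new data must have the same size as the original stream, the stream's existing sector chain can be reused unchanged. The hypothetical planner below pairs each sector of that chain with its new content; it only models streams stored in regular sectors, not small streams kept in the MiniFAT.

```python
def plan_stream_write(data, original_size, chain, sector_size=512):
    """Hypothetical planner: map same-size data onto an existing sector chain."""
    if len(data) != original_size:
        raise ValueError("data must be the same size as the original stream")
    writes = []
    for i, sect in enumerate(chain):
        chunk = data[i * sector_size:(i + 1) * sector_size]
        # The last sector is usually only partly used: pad it with zeros.
        writes.append((sect, chunk + b"\x00" * (sector_size - len(chunk))))
    return writes
```

Each (sector, bytes) pair could then be written with write_sect.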
-
class olefile.OleMetadata
Class to parse and store metadata from standard properties of OLE files.
Available attributes: codepage, title, subject, author, keywords, comments, template, last_saved_by, revision_number, total_edit_time, last_printed, create_time, last_saved_time, num_pages, num_words, num_chars, thumbnail, creating_application, security, codepage_doc, category, presentation_target, bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips, scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty, chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed, version, dig_sig, content_type, content_status, language, doc_version
Note: an attribute is set to None when not present in the properties of the OLE file.
References for SummaryInformation stream:
- https://msdn.microsoft.com/en-us/library/dd942545.aspx
- https://msdn.microsoft.com/en-us/library/dd925819%28v=office.12%29.aspx
- https://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx
- https://msdn.microsoft.com/en-us/library/aa372045.aspx
- http://sedna-soft.de/articles/summary-information-stream/
- https://poi.apache.org/apidocs/org/apache/poi/hpsf/SummaryInformation.html
References for DocumentSummaryInformation stream:
- https://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
- https://msdn.microsoft.com/en-us/library/windows/desktop/aa380374%28v=vs.85%29.aspx
- https://poi.apache.org/apidocs/org/apache/poi/hpsf/DocumentSummaryInformation.html
New in version 0.25.
Constructor for OleMetadata. All attributes are set to None by default.
-
DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', 'slides', 'notes', 'hidden_slides', 'mm_clips', 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', 'link_base', 'hlinks', 'hlinks_changed', 'version', 'dig_sig', 'content_type', 'content_status', 'language', 'doc_version']
-
SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', 'template', 'last_saved_by', 'revision_number', 'total_edit_time', 'last_printed', 'create_time', 'last_saved_time', 'num_pages', 'num_words', 'num_chars', 'thumbnail', 'creating_application', 'security']
-
dump()
Dump all metadata, for debugging purposes.
-
parse_properties(ole_file)
Parse standard properties of an OLE file, from the streams \x05SummaryInformation and \x05DocumentSummaryInformation, if present. Properties are converted to strings, integers or Python datetime objects. If a property is not present, its value is set to None.
Parameters: ole_file – OleFileIO object from which to parse properties
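One plausible way to fill the attributes from a parsed property dictionary is to use the position in SUMMARY_ATTRIBS: the sketch assumes that property id N maps to SUMMARY_ATTRIBS[N - 1] (id 1 = codepage, id 2 = title, and so on), which matches the SummaryInformation id order as far as we know. The class SummaryMetadata is hypothetical.

```python
SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords',
                   'comments', 'template', 'last_saved_by', 'revision_number',
                   'total_edit_time', 'last_printed', 'create_time',
                   'last_saved_time', 'num_pages', 'num_words', 'num_chars',
                   'thumbnail', 'creating_application', 'security']


class SummaryMetadata:
    """Hypothetical stand-in for OleMetadata's SummaryInformation handling."""

    def __init__(self):
        for attrib in SUMMARY_ATTRIBS:
            setattr(self, attrib, None)  # absent properties stay None

    def load(self, props):
        # Assumption: property id N corresponds to SUMMARY_ATTRIBS[N - 1].
        for index, attrib in enumerate(SUMMARY_ATTRIBS):
            setattr(self, attrib, props.get(index + 1))
```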
-
olefile.enable_logging()
Enable logging for this module (disabled by default). This sets the module-specific logger level to NOTSET, which means the main application controls the actual logging level.
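The pattern described above can be sketched with the standard logging module. The logger name "mylib" is hypothetical, and silencing the logger by default with a level above CRITICAL is an assumption about how "disabled by default" might be achieved.

```python
import logging

log = logging.getLogger("mylib")  # hypothetical module logger
log.addHandler(logging.NullHandler())
log.setLevel(logging.CRITICAL + 1)  # above every standard level: silent


def enable_logging():
    """Hand control of the effective level to the application's configuration."""
    log.setLevel(logging.NOTSET)
```

After enable_logging(), the logger inherits its effective level from its ancestors, so a call such as logging.basicConfig(level=logging.DEBUG) in the application decides what is shown.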
-
olefile.MAGIC = '\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
Magic bytes that should be at the beginning of every OLE file.
-
olefile.STGTY_EMPTY = 0
Empty directory entry.
-
olefile.STGTY_STREAM = 2
Element is a stream object.
-
olefile.STGTY_STORAGE = 1
Element is a storage object.
-
olefile.STGTY_ROOT = 5
Element is a root storage.
-
olefile.STGTY_PROPERTY = 4
Element is an IPropertyStorage object.
-
olefile.STGTY_LOCKBYTES = 3
Element is an ILockBytes object.
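A small lookup table built from the STGTY values listed in this reference can turn the result of get_type into a readable name; the dictionary and function names are hypothetical.

```python
STGTY_NAMES = {
    0: "STGTY_EMPTY",
    1: "STGTY_STORAGE",
    2: "STGTY_STREAM",
    3: "STGTY_LOCKBYTES",
    4: "STGTY_PROPERTY",
    5: "STGTY_ROOT",
}


def describe_entry_type(entry_type):
    """Name for a get_type result; False means the entry does not exist."""
    if entry_type is False:
        return "not found"
    return STGTY_NAMES.get(entry_type, "unknown (%r)" % entry_type)
```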
-
olefile.MAXREGSECT = 4294967290
(-6) maximum SECT.
-
olefile.DIFSECT = 4294967292
(-4) denotes a DIFAT sector in a FAT.
-
olefile.FATSECT = 4294967293
(-3) denotes a FAT sector in a FAT.
-
olefile.ENDOFCHAIN = 4294967294
(-2) end of a virtual stream chain.
-
olefile.FREESECT = 4294967295
(-1) unallocated sector.
-
olefile.MAXREGSID = 4294967290
(-6) maximum directory entry ID.
-
olefile.NOSTREAM = 4294967295
(-1) unallocated directory entry.
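The negative numbers in parentheses are the same 32-bit values read as signed integers. The sketch below classifies a FAT entry using the sector constants listed above; treating any other value above MAXREGSECT as "reserved" is an assumption, and the function names are hypothetical.

```python
SPECIAL_SECTORS = {
    0xFFFFFFFC: "DIFSECT",     # -4
    0xFFFFFFFD: "FATSECT",     # -3
    0xFFFFFFFE: "ENDOFCHAIN",  # -2
    0xFFFFFFFF: "FREESECT",    # -1
}
MAXREGSECT = 0xFFFFFFFA  # -6


def describe_fat_entry(value):
    """Readable meaning of a 32-bit FAT entry."""
    if value <= MAXREGSECT:
        return "next sector %d" % value
    return SPECIAL_SECTORS.get(value, "reserved")


def as_signed(value):
    """Reinterpret an unsigned 32-bit value as signed, e.g. 4294967290 -> -6."""
    return value - (1 << 32) if value >= (1 << 31) else value
```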
-
exception olefile.OleFileIONotClosed(stack_of_open=None)
Bases: RuntimeWarning
Warning type used when an OleFileIO object is destroyed while it still has an open file handle.