Frequently Asked Questions¶
Can I extract all images from MS OLE2 documents with olefile?¶
Not directly: images are not always stored the same way, and it also depends on the format.
For example in Powerpoint presentations, you may find a stream named “Pictures” when running “olefile yourfile.ppt”. You may extract the stream by using the openstream() method on the OleFileIO object, but you will usually get a binary stream containing several picture files. You may also extract it manually using tools such as SSView (http://www.mitec.cz/ssv.html).
Then the only way I’ve found so far is to use file carving tools which are able to determine the beginning and the end of each picture in a binary file. These tools are not always easy to use but if you’re interested have a look at http://pypi.python.org/pypi/hachoir-subfile and http://www.forensicswiki.org/wiki/Tools:Data_Recovery#Carving.
If you really need to automate the process then you have to study Microsoft specifications (at http://www.microsoft.com/interop/docs/officebinaryformats.mspx) and find the right way to parse MS Office documents…
A lot of people (including me) would be very interested if you find a solution! ;-)