Analysing 3D image file structures in Python 3

The Python programming language allows us to do some pretty exciting things with our 3D images, as hopefully the Photos3D library illustrates. It even allows us to delve into the low-level bytes of the files to work out ways to access the data they hold. But using Python often means installing libraries that are based on other programming languages, some of which have external program dependencies we have to install separately. That’s OK in principle, but not all platforms have those languages or programs available. So I decided to write some pure Python code for analysing JPEG-based images, such as JPG, JPEG, JPS and MPO files; you can download the code from the Photos3D GitHub repo.

To understand how the Python code works, it’s important to understand how JPEG files are structured. Basically, they use two-byte markers to split the file into a number of sections. I’ve put some of those markers below, taken from Wikipedia, but you can find more on the jpegdump.c webpage. You’ll notice they’re all two bytes long and start with 0xFF in hexadecimal. In fact, the JPEG standard ensures that a 0xFF byte inside the compressed image data can’t be mistaken for a marker (the encoder follows it with a 0x00 byte), and most sections store their own length in the two bytes straight after the marker, so we can use that information to work through the file to find all of the sections.

Name   Bytes                  Description
----   -----                  -----------
SOI    0xFF, 0xD8             Start Of Image
DHT    0xFF, 0xC4             Define Huffman Table(s)
DQT    0xFF, 0xDB             Define Quantization Table(s)
SOS    0xFF, 0xDA             Start Of Scan
RSTn   0xFF, 0xDn (n=0..7)    Restart (in image data only)
APPn   0xFF, 0xEn             Application-specific
COM    0xFF, 0xFE             Comment
EOI    0xFF, 0xD9             End Of Image

Source: Wikipedia JPG file page.
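
To make the marker-and-length idea concrete, here is a minimal sketch of walking the header sections of a JPEG in pure Python. It is not the Photos3D code: the names `tag_name` and `header_segments` are made up for this illustration. It assumes a well-formed file, and it stops at the SOS header, because the compressed image bytes that follow have no length field and need separate handling.

```python
import struct

# Tag names keyed by the second marker byte (the first is always 0xFF).
NAMES = {0xD8: "SOI", 0xC0: "SOF0", 0xC4: "DHT", 0xDB: "DQT",
         0xDD: "DRI", 0xDA: "SOS", 0xFE: "COM", 0xD9: "EOI"}


def tag_name(marker):
    """Turn the second marker byte into a readable tag (hypothetical helper)."""
    if 0xE0 <= marker <= 0xEF:
        return f"APP{marker - 0xE0}"
    if 0xD0 <= marker <= 0xD7:
        return f"RST{marker - 0xD0}"
    return NAMES.get(marker, f"0x{marker:02X}")


def header_segments(data):
    """Yield (tag, position, length) up to and including the SOS header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI")
    yield "SOI", 0, 2
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        # Big-endian 16-bit length: it counts its own two bytes,
        # but not the two marker bytes in front of it.
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield tag_name(marker), pos, 2 + seg_len
        if marker == 0xDA:  # compressed image data follows, so stop here
            return
        pos += 2 + seg_len
```

Reading a file with `open(path, "rb").read()` and looping over `header_segments(data)` gives rows much like the listings in this post, except that the SOS row here only covers the short SOS header rather than the whole block of image bytes.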

So, armed with the Photos3D library from GitHub, we can try analysing some image files by running the analysis script with python in our terminal; the script uses the jpegtool library module. You can edit the script to change the image that’s being analysed, and below I’ve used it with one of the sample JPG files in the testimages folder. As expected, the file starts and ends with SOI and EOI markers. It also has an APP0 section, which is where the script says the EXIF data is stored (it tells us at the bottom of its output), although an 18-byte APP0 section is more usually a JFIF header, with EXIF data normally held in APP1. Markers 2 through 8 are image-related data we can ignore for now, and the SOS marker shows where the actual image bytes are located.

File: lefttest.jpg
Bytes found: 57386

Marker  Tag    Position   Length
------  ----   --------   ------
0       SOI    0          2
1       APP0   2          18
2       DQT    20         69
3       DQT    89         69
4       SOF0   158        19
5       DHT    177        33
6       DHT    210        183
7       DHT    393        33
8       DHT    426        183
9       SOS    609        56775
10      EOI    57384      2
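
One handy sanity check on a listing like this is that every section starts exactly where the previous one ended, and that the last one ends at the file size. The sketch below does that arithmetic on the rows above; `is_contiguous` is just an illustrative name. It also suggests that the Length column counts the two marker bytes as well as the section contents.

```python
# (tag, position, length) rows copied from the lefttest.jpg listing above.
ROWS = [
    ("SOI", 0, 2), ("APP0", 2, 18), ("DQT", 20, 69), ("DQT", 89, 69),
    ("SOF0", 158, 19), ("DHT", 177, 33), ("DHT", 210, 183),
    ("DHT", 393, 33), ("DHT", 426, 183), ("SOS", 609, 56775),
    ("EOI", 57384, 2),
]
BYTES_FOUND = 57386


def is_contiguous(rows, total):
    """True if the rows tile the file with no gaps or overlaps."""
    expected = 0
    for _tag, position, length in rows:
        if position != expected:
            return False
        expected = position + length
    return expected == total


print(is_contiguous(ROWS, BYTES_FOUND))  # prints True
```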

Basically, then, a JPG (which can have a jpg or jpeg file extension) has a straightforward structure. But what about an MPO file that contains a left and a right view? Well, the listing below answers that. It’s pretty similar to the JPG file, with the addition of some bytes unrelated to the actual image in a couple of places (the BYTES rows, which we can ignore for now). The main difference is that the JPG markers are repeated after the EOI marker. That’s because MPO files are basically containers for JPEG-based images. Many programs will open the MPO file as a single JPEG image, because they just read to the first EOI. So if we want to extract the second image we just need to read the file from the second SOI (see the example in Photos3D). And for completeness, markers 1 and 10 contain EXIF data, one for each of the left/right views.

File: abbeystones.mpo
Bytes found: 3137536

Marker  Tag    Position   Length
------  ----   --------   ------
0       SOI    0          2
1       APP1   2          50494
2       APP2   50496      194
3       DQT    50690      134
4       SOF0   50824      19
5       DHT    50843      420
6       DRI    51263      6
7       SOS    51269      1511527
8       EOI    1562796    2
        BYTES  1562798    338
9       SOI    1563136    2
10      APP1   1563138    50590
11      APP2   1613728    98
12      DQT    1613826    134
13      SOF0   1613960    19
14      DHT    1613979    420
15      DRI    1614399    6
16      SOS    1614405    1523025
17      EOI    3137430    2
        BYTES  3137432    104
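
Here is a rough sketch of how the views could be pulled out of an MPO in pure Python. It is not the Photos3D example, and the names `image_end`, `_skip_scan` and `split_mpo` are invented for it. Searching for the bytes 0xFF 0xD8 alone isn't safe, because an EXIF thumbnail inside APP1 is itself a small JPEG, so the sketch walks each image section by section to its real EOI and only then looks for the next SOI. It assumes well-formed files and that the padding between images doesn't happen to contain an SOI marker.

```python
def _skip_scan(data, pos):
    """Return the offset of the first real marker after compressed scan data."""
    while True:
        pos = data.find(b"\xff", pos)
        if pos < 0 or pos + 1 >= len(data):
            raise ValueError("scan data is not terminated by a marker")
        nxt = data[pos + 1]
        if nxt == 0xFF:                        # fill byte, look again
            pos += 1
        elif nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos += 2                           # stuffed 0xFF or RSTn: still image data
        else:
            return pos


def image_end(data, start=0):
    """Return the offset just past the EOI of the JPEG starting at `start`."""
    if data[start:start + 2] != b"\xff\xd8":
        raise ValueError("no SOI marker at the given offset")
    pos = start + 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker == 0xD9:                     # EOI
            return pos + 2
        # Every other section here carries a two-byte big-endian length.
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + seg_len
        if marker == 0xDA:                     # SOS: skip the image bytes too
            pos = _skip_scan(data, pos)
    raise ValueError("ran out of data before EOI")


def split_mpo(data):
    """Return each embedded JPEG as its own bytes object."""
    images = []
    start = data.find(b"\xff\xd8")
    while start >= 0:
        end = image_end(data, start)
        images.append(data[start:end])
        start = data.find(b"\xff\xd8", end)    # skip any padding bytes
    return images
```

For a stereo MPO such as the one above, `split_mpo(open("abbeystones.mpo", "rb").read())` should return two byte strings, each of which can be written out as an ordinary .jpg file.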

Another type of 3D photo file we may encounter, although much less often with modern stereo cameras, is the JPS file. I’ve put the structure of one below and it’s pretty much a JPG file, which is as expected because it contains two views stitched into a single image. As before, the APP1 section contains EXIF data. However, the JPS format includes additional data about the stereo image and its uses, which may explain the two APP15 sections, which are quite unusual (you can use the jpegtool library, in Photos3D, to extract and investigate them further if you like).

File: test.jps
Bytes found: 1374613

Marker  Tag    Position   Length
------  ----   --------   ------
0       SOI    0          2
1       APP1   2          6395
2       APP15  6397       65534
3       APP15  71931      58554
4       DQT    130485     134
5       DHT    130619     420
6       SOF0   131039     19
7       SOS    131058     1243553
8       EOI    1374611    2
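
If you’d rather peek at those APP15 sections by hand, the Position and Length columns are enough to slice them out of the file. This is only a sketch, not the jpegtool API: `segment_payload` and `preview` are hypothetical helpers, and they assume the Length column counts the two marker bytes, as the way the positions add up suggests.

```python
def segment_payload(data, position, length):
    """Return a section's contents, given its Position and Length columns."""
    if data[position] != 0xFF:
        raise ValueError("no marker at that position")
    # Skip the 2 marker bytes and the 2 length bytes that follow them.
    return data[position + 4:position + length]


def preview(payload, count=16):
    """Show the first bytes as text: printable ASCII as-is, dots elsewhere."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in payload[:count])
```

With the file loaded as `data = open("test.jps", "rb").read()`, calling `preview(segment_payload(data, 6397, 65534))` shows the first few bytes of the first APP15 section, which is often enough to spot an identifying string.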

Finally, let’s look at a 360-degree VR image taken with an Insta360 One RS camera, just to see how it differs from a normal JPG image. In fact, as you probably expected, it looks a lot like a normal JPEG structure, but it’s nice to be sure. And markers 3 and 4 contain EXIF and XMP data respectively.

File: vr360.jpg
Bytes found: 4727559

Marker  Tag    Position   Length
------  ----   --------   ------
0       SOI    0          2
1       APP0   2          18
2       APP2   20         3162
3       APP1   3182       10448
4       APP1   13630      64987
5       DQT    78617      69
6       DQT    78686      69
7       SOF0   78755      19
8       DHT    78774      33
9       DHT    78807      183
10      DHT    78990      33
11      DHT    79023      183
12      SOS    79206      4648351
13      EOI    4727557    2
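
Since several APP sections can share the same number (two APP1 sections here), the usual way to tell them apart is the identifier string at the start of each section’s contents. The sketch below checks for a few widely used identifiers; `describe_app` and the label text are my own illustration, and the list is far from complete.

```python
# Identifier prefixes found at the start of common APPn section contents.
APP_SIGNATURES = [
    (b"JFIF\x00", "JFIF header"),
    (b"Exif\x00\x00", "EXIF"),
    (b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (b"http://ns.adobe.com/xmp/extension/\x00", "Extended XMP"),
    (b"MPF\x00", "Multi-Picture Format index"),
    (b"ICC_PROFILE\x00", "ICC colour profile"),
]


def describe_app(payload):
    """Guess what an APPn section holds from its leading identifier."""
    for signature, label in APP_SIGNATURES:
        if payload.startswith(signature):
            return label
    return "unknown"
```

The `payload` argument here is the section contents after the marker and length bytes, so for marker 3 above it would start at byte 3182 + 4.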

And that’s it: analysing JPEG-based file structures in pure Python 3 is relatively straightforward using the Photos3D library. Of course, it leads us onto wanting to do other things, such as analysing files from different cameras and extracting EXIF, XMP and Extended-XMP data from the APP sections, but we’ll cover that in other blog posts 🙂