The Python programming language allows us to do some pretty exciting things with our 3D images, as hopefully the Photos3D library illustrates. It even allows us to delve into the low-level bytes of the files to work out ways to access the data they hold. But using Python often means installing libraries that are based on other programming languages, and that may even have external program dependencies we have to install separately. That’s OK in principle, but not all platforms have those languages or programs available. So I decided to write some pure Python code for analysing JPEG-based images, such as JPG, JPEG, JPS and MPO files, which you can download from the Photos3D GitHub repo.
To understand how the Python code works, it helps to know how JPEG files are structured. Basically they have two-byte markers that split the file into a number of sections. I’ve put some of those markers below, taken from Wikipedia, but you can find more on the jpegdump.c webpage. You’ll notice they’re all two bytes long and start with 0xFF in hexadecimal. In fact, the JPEG standard ensures that a 0xFF byte in the compressed image data can’t be mistaken for a marker (the encoder follows each one with a 0x00 byte), and most sections carry a two-byte length straight after their marker, so we can use that information to work through the file to find all of the sections.
Name | Bytes | Description |
---|---|---|
SOI | 0xFF, 0xD8 | Start Of Image |
DHT | 0xFF, 0xC4 | Define Huffman Table(s) |
DQT | 0xFF, 0xDB | Define Quantization Table(s) |
SOS | 0xFF, 0xDA | Start Of Scan |
RSTn | 0xFF, 0xDn (n=0..7) | Restart (in image data only) |
APPn | 0xFF, 0xEn | Application-specific |
COM | 0xFF, 0xFE | Comment |
EOI | 0xFF, 0xD9 | End Of Image |
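
If you’d like to see that idea in code, here is a minimal sketch of walking the sections of a JPEG held in memory. It is only my illustration of the technique, not the code in jpegtool: the names `walk_segments` and `STANDALONE` are made up for this example, and it stops at the SOS marker rather than scanning through the compressed image data, so the SOS length it reports covers the SOS header only.

```python
import struct

# Markers with no length field after them: SOI, EOI and RST0..RST7.
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))


def walk_segments(data):
    """Yield (marker, position, length) for each section up to and including SOS.

    Hypothetical helper for illustration. 'length' counts the two marker
    bytes plus whatever the section's own length field says follows them.
    """
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker in STANDALONE:
            length = 2
        else:
            # Big-endian two-byte length; it includes itself but not the marker.
            (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
            length = 2 + size
        yield marker, pos, length
        if marker in (0xDA, 0xD9):
            # After SOS comes the compressed image data; after EOI, nothing.
            return
        pos += length


# A tiny hand-built byte string (not a real photo): SOI, APP0, SOS, data, EOI.
sample = (b"\xff\xd8"
          + b"\xff\xe0\x00\x10" + b"JFIF\x00" + bytes(9)
          + b"\xff\xda\x00\x08" + bytes(6)
          + b"\x12\xff\x00\x34"
          + b"\xff\xd9")

for marker, pos, length in walk_segments(sample):
    print(f"{marker:#04x} at {pos}, {length} bytes")
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to `walk_segments` in place of `sample`.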
So, armed with the Photos3D library from GitHub, we can try analysing some image files by running ‘python jpegdump.py’ in our terminal, which uses the jpegtool library module. You can edit jpegdump.py to change the image that’s being analysed, and below I’ve used it with one of the sample JPG files in the testimages folder. As expected, it starts and ends with SOI and EOI markers. It also has an APP0 section, which is where the EXIF data is stored (as jpegdump.py tells us at the bottom of its output). Markers 2 through 8 are image-related data we can ignore for now, and the SOS marker shows where the actual image bytes are located.
File: lefttest.jpg
Bytes found: 57386
Marker Tag Position Length
------ ---- -------- ------
0 SOI 0 2
1 APP0 2 18
2 DQT 20 69
3 DQT 89 69
4 SOF0 158 19
5 DHT 177 33
6 DHT 210 183
7 DHT 393 33
8 DHT 426 183
9 SOS 609 56775
10 EOI 57384 2
Basically, then, a JPG (which can have a jpg or jpeg file extension) has a straightforward structure. But what about an MPO file that contains a left and a right view? Well, below is the jpegdump.py listing for one to answer that. It’s pretty similar to the JPG file, with the addition of some bytes unrelated to the actual image in a couple of places (which we can ignore for now). The main difference is that the JPG markers are repeated after the EOI marker. That’s because MPO files are basically containers for JPEG-based images. Many programs will open the MPO file as a single JPEG image, because they just read to the first EOI. So if we want to extract the second image we just need to read the file from the second SOI (see the jpegsplit.py example in Photos3D). And for completeness, markers 1 and 10 contain EXIF data, one for each of the left/right views.
File: abbeystones.mpo
Bytes found: 3137536
Marker Tag Position Length
------ ---- -------- ------
0 SOI 0 2
1 APP1 2 50494
2 APP2 50496 194
3 DQT 50690 134
4 SOF0 50824 19
5 DHT 50843 420
6 DRI 51263 6
7 SOS 51269 1511527
8 EOI 1562796 2
BYTES 1562798 338
9 SOI 1563136 2
10 APP1 1563138 50590
11 APP2 1613728 98
12 DQT 1613826 134
13 SOF0 1613960 19
14 DHT 1613979 420
15 DRI 1614399 6
16 SOS 1614405 1523025
17 EOI 3137430 2
BYTES 3137432 104
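
To make the “read from the second SOI” idea concrete, here is a rough sketch of finding where each image starts inside an MPO. It is my own illustration under simple assumptions, not the code in jpegsplit.py, and the function names are hypothetical. Rather than searching blindly for 0xFF 0xD8 (which could also match a thumbnail embedded in the EXIF data), it walks each image’s sections by their lengths, skips through the compressed data after SOS, and only starts looking for the next SOI once it has passed an EOI.

```python
def skip_image(data, pos):
    """Given the offset of an SOI, return the offset just past that image's EOI.

    Hypothetical helper for illustration; returns len(data) if no EOI is found.
    """
    pos += 2  # step over the SOI marker itself
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:   # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:   # EOI: this image is finished
            return pos + 2
        # Every other section here has a big-endian two-byte length.
        size = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + size
        if marker == 0xDA:
            # SOS: skip the compressed data until the next real marker.
            while True:
                pos = data.find(b"\xff", pos)
                if pos < 0 or pos + 1 >= len(data):
                    return len(data)
                following = data[pos + 1]
                if following == 0x00 or 0xD0 <= following <= 0xD7:
                    pos += 2  # stuffed 0xFF or restart marker: still image data
                elif following == 0xFF:
                    pos += 1  # fill byte
                else:
                    break     # a real marker; carry on walking from here
    return len(data)


def find_image_starts(data):
    """Return the offset of every top-level SOI in a JPEG or MPO byte string."""
    starts = []
    pos = 0
    while True:
        # An SOI is always followed by another marker, hence the third 0xFF.
        pos = data.find(b"\xff\xd8\xff", pos)
        if pos < 0:
            return starts
        starts.append(pos)
        pos = skip_image(data, pos)
```

With the list of offsets in hand, slicing the file’s bytes from one start to the next gives each view (plus any padding bytes that follow its EOI), and each slice can be written out as its own .jpg file.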
Another type of 3D photo file we may encounter, although much less often with modern stereo cameras, is the JPS file. I’ve put the structure of one below and it’s pretty much a JPG file, as expected, because it contains two views stitched into a single image. As before, the APP1 section contains EXIF data. However, the JPS format includes additional data about the stereo image and its uses, which may explain the two APP15 sections, which are quite unusual (you can use the jpegtool library, in Photos3D, to extract and investigate them further if you like).
File: test.jps
Bytes found: 1374613
Marker Tag Position Length
------ ---- -------- ------
0 SOI 0 2
1 APP1 2 6395
2 APP15 6397 65534
3 APP15 71931 58554
4 DQT 130485 134
5 DHT 130619 420
6 SOF0 131039 19
7 SOS 131058 1243553
8 EOI 1374611 2
Finally, let’s look at a 360-degree VR image taken with an Insta360 One RS camera, just to see how it differs from a normal JPG image. In fact, as you probably expected, it looks a lot like a normal JPEG structure, but it’s nice to be sure. And markers 3 and 4 contain EXIF and XMP data respectively.
File: vr360.jpg
Bytes found: 4727559
Marker Tag Position Length
------ ---- -------- ------
0 SOI 0 2
1 APP0 2 18
2 APP2 20 3162
3 APP1 3182 10448
4 APP1 13630 64987
5 DQT 78617 69
6 DQT 78686 69
7 SOF0 78755 19
8 DHT 78774 33
9 DHT 78807 183
10 DHT 78990 33
11 DHT 79023 183
12 SOS 79206 4648351
13 EOI 4727557 2
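
How can we tell an EXIF section from an XMP one when both are APP1? By convention, the content of an APP section starts with an identifier string, so we can check the first few bytes. The sketch below is just an illustration of that check, not part of Photos3D: `identify_app_segment` and `KNOWN_PREFIXES` are names I’ve made up, the list only covers a handful of common identifiers, and `payload` is assumed to be the section’s bytes after the marker and the two-byte length.

```python
# (marker byte, identifier prefix, label) for some common APPn contents.
KNOWN_PREFIXES = [
    (0xE0, b"JFIF\x00", "JFIF header"),
    (0xE1, b"Exif\x00\x00", "EXIF"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (0xE1, b"http://ns.adobe.com/xmp/extension/\x00", "Extended XMP"),
    (0xE2, b"MPF\x00", "Multi-Picture Format index"),
    (0xE2, b"ICC_PROFILE\x00", "ICC colour profile"),
]


def identify_app_segment(marker, payload):
    """Return a label for an APPn section, or 'unknown' if it isn't recognised.

    Hypothetical helper: 'marker' is the second marker byte (0xE0 for APP0,
    0xE1 for APP1, ...) and 'payload' is everything after the length field.
    """
    for known_marker, prefix, label in KNOWN_PREFIXES:
        if marker == known_marker and payload.startswith(prefix):
            return label
    return "unknown"


print(identify_app_segment(0xE1, b"Exif\x00\x00MM\x00*"))
print(identify_app_segment(0xE1, b"http://ns.adobe.com/xap/1.0/\x00<x:xmpmeta"))
```

Anything the function reports as unknown, such as the APP15 sections in the JPS file, is worth dumping out and looking at by hand.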
And that’s it: analysing JPEG-based file structures in pure Python 3 is relatively straightforward using the Photos3D library. Of course, it leads us onto wanting to do other things, such as analysing files from different cameras and extracting EXIF, XMP and Extended-XMP data from the APP sections, but we’ll cover that in other blog posts 🙂