cbloom rants


Oodle Data Compression Integration for Unreal Engine 4

We have created an integration of Oodle Data Compression for Unreal Engine 4. You can now use Oodle's amazing Kraken, Leviathan, and other codecs to compress your Unreal game data smaller and load it faster.

Once installed in your source tree, the Oodle Data Compression integration is transparent. Simply create compressed pak files the way you usually do, and instead of compressing with zlib they will be compressed with Oodle. At runtime, the engine automatically decodes with Oodle or zlib as specified in the pak.

Oodle can compress game data much smaller than zlib. Oodle also decodes faster than zlib. With less data to load from disk and faster decompression, you speed up data loading in two ways.

For example, on the ShooterGame sample game, the pak file sizes are :


uncompressed :       1,131,939,544 bytes

Unreal default zlib :  417,951,648   2.708 : 1

Oodle Kraken 4 :       372,845,225   3.036 : 1

Oodle Leviathan 8 :    330,963,205   3.420 : 1

Oodle Leviathan makes the ShooterGame pak file 87 MB smaller than the Unreal default zlib compression!
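
To put rough numbers on that two-way win: assume a 100 MB/s read speed, and borrow ballpark desktop decode speeds from the benchmarks later in these posts (assumptions for illustration only, not ShooterGame measurements):

zlib      : read 418 MB / 100 MB/s = ~4.2 s , decode 1132 MB at ~272 MB/s = ~4.2 s
Leviathan : read 331 MB / 100 MB/s = ~3.3 s , decode 1132 MB at ~676 MB/s = ~1.7 s

Both the read and the decode get cheaper.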

NOTE this is new and separate from the Oodle Network integration which has been in Unreal for some time. Oodle Network provides compression of network packets to reduce bandwidth in networked games.

The Oodle Data Compression integration is currently for Unreal 4.18 , and yes we'll be working on a 4.19 version soon.

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.


Oodle 2.6.0 : Some more perf comparisons

Here are some more perf comparisons with Oodle 2.6.0, this time with more non-Oodle compressors for reference.

As before, this is Windows x64 on a Core i7-3770, and we're looking at compression ratio vs. decode speed, not considering encode speed, and running all compressors in their highest compression level.

On the "seven" testset, compressing each file independently, then summing decode time and size for each file :

The loglog chart shows log compression ratio on the Y axis and log decode speed on the X axis (the numeric labels show the pre-log values). The upper right is the Pareto frontier.

The raw numbers are :

total      : lzma 16.04 -9           : 3.186 to 1 :   52.84 MB/s
total      : lzham 1.0 -d26 -4       : 2.932 to 1 :  149.57 MB/s
total      : brotli24 2017-12-12 -11 : 2.958 to 1 :  203.91 MB/s
total      : zlib 1.2.11 -9          : 2.336 to 1 :  271.61 MB/s
total      : zstd 1.3.3 -22          : 2.750 to 1 :  474.47 MB/s
total      : lz4hc 1.8.0 -12         : 2.006 to 1 : 2786.78 MB/s
total      : Leviathan8              : 3.251 to 1 :  675.92 MB/s
total      : Kraken8                 : 3.097 to 1 :  983.74 MB/s
total      : Mermaid8                : 2.846 to 1 : 1713.53 MB/s
total      : Selkie8                 : 2.193 to 1 : 3682.88 MB/s

Isolating just Kraken, Leviathan, and ZStd on the same test (ZStd is the closest non-Oodle codec), we can look at file-by-file performance :

The loglog shows each file with a different symbol, colored by the compressor.

The speed advantage of Oodle is pretty uniform, but the compression advantage varies by file type. Some files simply have more air that Oodle can squeeze out beyond what other LZ's find. For example if you only looked at enwik7 (xml/text), then all of the modern LZ's considered here (ZStd, Oodle, Brotli, LZHAM) would get almost exactly the same compression; there's just not a lot of room for them to differentiate themselves on text.

Runs on a couple other standard files :

On the Silesia/Mozilla file :

mozilla    : lzma 16.04 -9           : 3.832 to 1 :   63.79 MB/s
mozilla    : lzham 1.0 -d26 -4       : 3.570 to 1 :  168.96 MB/s
mozilla    : brotli24 2017-12-12 -11 : 3.601 to 1 :  246.68 MB/s
mozilla    : zlib 1.2.11 -9          : 2.690 to 1 :  275.40 MB/s
mozilla    : zstd 1.3.3 -22          : 3.365 to 1 :  503.44 MB/s
mozilla    : lz4hc 1.8.0 -12         : 2.327 to 1 : 2509.82 MB/s
mozilla    : Leviathan8              : 3.831 to 1 :  691.37 MB/s
mozilla    : Kraken8                 : 3.740 to 1 :  985.35 MB/s
mozilla    : Mermaid8                : 3.335 to 1 : 1834.49 MB/s
mozilla    : Selkie8                 : 2.793 to 1 : 3145.63 MB/s

On win81 :

win81      : lzma 16.04 -9           : 2.922 to 1 :   51.87 MB/s
win81      : lzham 1.0 -d26 -4       : 2.774 to 1 :  156.44 MB/s
win81      : brotli24 2017-12-12 -11 : 2.815 to 1 :  214.02 MB/s
win81      : zlib 1.2.11 -9          : 2.207 to 1 :  253.68 MB/s
win81      : zstd 1.3.3 -22          : 2.702 to 1 :  472.39 MB/s
win81      : lz4hc 1.8.0 -12         : 1.923 to 1 : 2408.91 MB/s
win81      : Leviathan8              : 2.959 to 1 :  757.36 MB/s
win81      : Kraken8                 : 2.860 to 1 :  949.07 MB/s
win81      : Mermaid8                : 2.618 to 1 : 1847.77 MB/s
win81      : Selkie8                 : 2.142 to 1 : 3467.36 MB/s

And again on the "seven" testset, but this time with the files cut into 32 kB chunks :

total      : lzma 16.04 -9           : 2.656 to 1 :   43.25 MB/s
total      : lzham 1.0 -d26 -4       : 2.435 to 1 :   76.36 MB/s
total      : brotli24 2017-12-12 -11 : 2.581 to 1 :  178.25 MB/s
total      : zlib 1.2.11 -9          : 2.259 to 1 :  255.18 MB/s
total      : zstd 1.3.3 -22          : 2.363 to 1 :  442.42 MB/s
total      : lz4hc 1.8.0 -12         : 1.849 to 1 : 2717.30 MB/s
total      : Leviathan8              : 2.731 to 1 :  650.23 MB/s
total      : Kraken8                 : 2.615 to 1 :  975.69 MB/s
total      : Mermaid8                : 2.455 to 1 : 1625.69 MB/s
total      : Selkie8                 : 1.910 to 1 : 4097.27 MB/s

By cutting into 32 kB chunks, we remove the window size disadvantage suffered by zlib and LZ4. Now all the codecs have the same match window, and the compression difference only comes from what additional features they provide. The small chunk also stresses startup overhead time and adaptation speed.
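
For reference, the chunked test is conceptually just this (a minimal C++ sketch; compress_chunk is a hypothetical stand-in for whichever codec is being measured, and the real harness also times each call):

    // Compress a buffer in independent 32 kB chunks and sum the output sizes.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using CompressFn = size_t (*)(const uint8_t* in, size_t inLen, uint8_t* out);

    size_t compress_chunked(const uint8_t* data, size_t size, CompressFn compress_chunk)
    {
        const size_t kChunk = 32 * 1024;
        std::vector<uint8_t> out(kChunk * 2); // headroom for incompressible chunks
        size_t totalCompressed = 0;
        for (size_t pos = 0; pos < size; pos += kChunk)
        {
            size_t len = (size - pos < kChunk) ? (size - pos) : kChunk;
            // each chunk is compressed independently : no shared window or
            // statistics across chunks, which equalizes the match windows
            totalCompressed += compress_chunk(data + pos, len, out.data());
        }
        return totalCompressed;
    }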

The Oodle codecs generally do even better (vs the competition) on small chunks than they do on large files. For example LZMA and LZHAM both have large models that really need a lot of data to get up to speed. All the non-Oodle codecs slow down more on small chunks than the Oodle codecs do.

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK

Oodle 2.6.0 : Hydra and space-speed flexibility

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

One of the unique things about Oodle is the fact that its compressors are optimizing for a space-speed goal (not just for size), and the user has control over how that combined score is priced.

This is a crucial aspect of Leviathan. Oodle Leviathan considers many options for the encoded data, it rates those options for space-speed and chooses the bit stream that optimizes that goal. This means that Leviathan can consider slower-to-decode bit stream options, and only use them when they provide enough of a benefit that they are worth it.

That is, Leviathan decompression speed varies a bit depending on the file, as all codecs do. However, other codecs will sometimes get slower for no particular benefit. They may choose a slower mode, or simply take a lot more slow encoding options (short matches, or frequent literal-match switches), even if it isn't a big benefit to file size. Leviathan only chooses slower encoding modes when they provide a benefit to file size that meets the goal the user has set for space-speed tradeoff.

Each of the new Oodle codecs (Leviathan, Kraken, Mermaid & Selkie) has a different default space-speed goal, which we set to be near their "natural lambda", the place that their fundamental structure works best. Clients can dial this up or down to bias their decisions more for size or speed.

The flexibility of these Oodle codecs allows them to cover a nearly continuous range of compression ratio vs decode speed points.

Here's an example of the Oodle codecs targeting a range of space-speed goals on the file "TC_Foreground_Arc.clb" (Windows x64):

Now you may notice that at the highest compression setting of Kraken (-zs64) it is strictly worse than the fastest setting of Leviathan (-zs1024). If you want that speed-compression tradeoff point, then using Kraken is strictly wrong - you should switch to Leviathan there.

That's what Oodle Hydra does for you. Hydra (the many headed beast) is a meta compressor which chooses between the other Oodle compressors to make the best space-speed encoding. Hydra does not just choose per file, but per block, which means it can do finer grain switching to find superior encodings.

Oodle Hydra on the file "TC_Foreground_Arc.clb" (Windows x64):

When using Hydra you don't explicitly choose the encoder at all. You set your space-speed goal and you trust in Oodle to make the choice for you. It may use Leviathan, Kraken, or Mermaid, so you may get faster or slower decoding on any given chunk, but you do know that when it chooses a slower decoder it was worth it. Hydra also sometimes gets free wins; say you wanted high compression and would've gone with Leviathan: there are cases where Kraken compresses nearly the same (or even does better), and switching down to Kraken is just a decode speed win for free (no compression ratio sacrificed).

Of course the disadvantage of Hydra is slow encoding, because it has to consider even more encoding options than Oodle already does. It is ideal for distribution scenarios where you encode once and decode many times.
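
The per-block selection idea is easy to sketch (a conceptual outline, not RAD's implementation; lambda is the same bytes-vs-time price discussed throughout these posts):

    // For one block : encode with each candidate codec, score each result with
    // J = size + lambda * time, and keep the encoding with the lowest J.
    #include <cstddef>

    struct Encoding { size_t compressedSize; double decodeSeconds; };

    // lambda is in bytes per second of decode time
    int pick_best(const Encoding* enc, int count, double lambda)
    {
        int best = 0;
        double bestJ = (double)enc[0].compressedSize + lambda * enc[0].decodeSeconds;
        for (int i = 1; i < count; i++)
        {
            double J = (double)enc[i].compressedSize + lambda * enc[i].decodeSeconds;
            if (J < bestJ) { bestJ = J; best = i; }
        }
        return best;
    }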

Another way we can demonstrate Oodle's space-speed encoding is to disable it.

We can run Oodle in "max compression only" mode by setting the price of time to zero. That is, when it scores a decision for space-speed, we consider only size. (I never actually set the price of time to exactly zero; it's better to just make it very small so that ties in size are still broken in favor of speed; specifically we will set spaceSpeedTradeoffBytes = 1).
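
In code, that looks something like this (a sketch; the names follow the public oodle2.h as I recall it, so treat the exact prototypes as assumptions):

    // Max-compress mode : price time at (nearly) zero via the options struct.
    OodleLZ_CompressOptions opts = *OodleLZ_CompressOptions_GetDefault();
    opts.spaceSpeedTradeoffBytes = 1;   // ~zero; still breaks size ties toward speed

    SINTa compLen = OodleLZ_Compress(
        OodleLZ_Compressor_Leviathan,
        rawBuf, rawLen,                     // input
        compBuf,                            // output, sized for worst-case expansion
        OodleLZ_CompressionLevel_Optimal4,  // level 8
        &opts);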

Here's Oodle Leviathan on the Silesia standard test corpus :

Leviathan default space-speed score (spaceSpeedTradeoffBytes = 256) :

total   : 211,938,580 ->48,735,197 =  1.840 bpb =  4.349 to 1
decode  : 264.501 millis, 4.25 c/B, rate= 801.28 MB/s

Leviathan max compress (spaceSpeedTradeoffBytes = 1) :

total   : 211,938,580 ->48,592,540 =  1.834 bpb =  4.362 to 1
decode  : 295.671 millis, 4.75 c/B, rate= 716.81 MB/s

By ignoring speed in decisions, we've gained 140 kB, but have lost 30 milliseconds in decode time.

Perhaps a better way to look at it is the other way around - by making good space-speed decisions, the default Leviathan setting saves 0.50 cycles per byte in decode time, at a cost of only 0.006 bits per byte of compressed size.

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK


Oodle 2.6.0 : Leviathan Rising

We're pleased to announce the release of Oodle 2.6.0 featuring the new Leviathan codec. Leviathan achieves high compression ratios (comparable to 7z/LZMA) with unparalleled fast decompression.

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

Almost two years ago, we released Oodle Kraken. Kraken roared onto the scene with high compression ratios and crazy fast decompression (over 1 GB/s on a Core i7 3770). The performance of Kraken immediately made lots of codecs obsolete. We could also see that something was wrong in the higher compression domain.

Kraken gets high compression (much more than zlib; usually more than things like RAR, ZStd and Brotli), but it gets a little less than 7z/LZMA. But to get that small step up in compression ratio, you had to accept a 20X decode speed penalty.

For example, on the "seven" testset (a private test set with a variety of data types) :

LZMA-9     :  3.18:1 ,    3.0 enc MB/s ,   53.8 dec MB/s
Kraken-7   :  3.09:1 ,    1.3 enc MB/s ,  948.5 dec MB/s
zlib-9     :  2.33:1 ,    7.7 enc MB/s ,  309.0 dec MB/s

LZMA gets 3% more compression than Kraken, but decodes almost 20X slower. That's not the right price to pay for that compression gain. There had to be a better way to take Kraken's great space-speed performance and extend it into higher compression without giving up so much speed.

It's easy to see that there was a big gap in a plot of decode speed vs compression ratio :

(this is a log-log plot on the seven test set, on a Core i7 3770. We're not looking at encode speed here at all, we're running the compressors in their max compression mode and we care about the tradeoff of size vs decode speed.)

We've spent the last year searching in that mystery zone and we have found the great beast that lives there and fills the gap : Leviathan.

Leviathan gets to high compression ratios with the correct amount of speed decrease from Kraken (about 33%). This means that even with better than LZMA ratios on the seven test set, it is still over 2X faster to decode than zlib :

Leviathan-7:  3.23:1 ,    0.9 enc MB/s ,  642.4 dec MB/s

LZMA-9     :  3.18:1 ,    3.0 enc MB/s ,   53.8 dec MB/s
Kraken-7   :  3.09:1 ,    1.3 enc MB/s ,  948.5 dec MB/s
zlib-9     :  2.33:1 ,    7.7 enc MB/s ,  309.0 dec MB/s

Leviathan doesn't always beat LZMA's compression ratio (they have slightly different strengths) but they are comparable. Leviathan is 7-20X faster to decompress than LZMA (usually around 10X).

Leviathan is ideal for distribution use cases, in which you will compress once and decompress many times. Leviathan allows you to serve highly compressed data to clients without slow decompression. We do try to keep Leviathan encode's time reasonable, but it is not a realtime encoder and not the right answer where fast encoding is needed.

Leviathan is a game changer. It makes high ratio decompression possible where it wasn't before. It can be used on video game consoles and mobile devices where other decoders took far too long. It can be used for in-game loading where CPU use needs to be minimized. It can be used to keep data compressed on disk with no install, because its decompression is fast enough to do on every load.

Leviathan now makes up a part of the Oodle Kraken-Mermaid-Selkie family. These codecs now provide excellent solutions over a wider range of compression needs.

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK

Oodle 2.6.0 : Everything new and tasty

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

A quick run down of all the exciting new stuff in Oodle 2.6.0 :

1. Leviathan!
2. The Kraken, Mermaid & Selkie fast-level encoders are now much faster.
3. Kraken & Mermaid's optimal level encoders now get more compression.
4. Kraken & Mermaid have new bit stream options which allow them to reach even higher compression.
5. Kraken and Mermaid are now more tuneable to different compression ratios and decode speeds.

1. Leviathan!

Read the Leviathan detailed performance report.

2. The Kraken, Mermaid & Selkie fast-level encoders are now much faster.

The non-optimal-parsing encoder levels that are intended for realtime use in Oodle are levels 1-4 aka SuperFast, VeryFast, Fast & Normal.

Their decode speed was always best in class, but previously their encode speed was slightly off the Pareto frontier (the best possible trade off of encode speed vs compression ratio). No longer.

"win81" test file (Core i7-3770 Windows x64)

Oodle 255 :

Kraken1   :  2.33:1 ,   83.3 enc MB/s ,  911.8 dec MB/s
Kraken2   :  2.39:1 ,   68.5 enc MB/s ,  938.0 dec MB/s
Kraken3   :  2.51:1 ,   24.4 enc MB/s , 1005.5 dec MB/s
Kraken4   :  2.57:1 ,   17.3 enc MB/s ,  997.3 dec MB/s

Oodle 260 :

Kraken1   :  2.33:1 ,  135.1 enc MB/s ,  928.1 dec MB/s
Kraken2   :  2.41:1 ,   94.0 enc MB/s ,  937.4 dec MB/s
Kraken3   :  2.52:1 ,   38.3 enc MB/s , 1020.6 dec MB/s
Kraken4   :  2.60:1 ,   23.0 enc MB/s , 1022.9 dec MB/s

Oodle 255 :

Mermaid1  :  2.10:1 ,  106.4 enc MB/s , 2079.4 dec MB/s
Mermaid2  :  2.15:1 ,   78.9 enc MB/s , 2161.1 dec MB/s
Mermaid3  :  2.24:1 ,   26.4 enc MB/s , 2294.0 dec MB/s
Mermaid4  :  2.29:1 ,   26.6 enc MB/s , 2341.1 dec MB/s

Oodle 260 :

Mermaid1  :  2.14:1 ,  161.1 enc MB/s , 2012.1 dec MB/s
Mermaid2  :  2.22:1 ,  104.1 enc MB/s , 2007.7 dec MB/s
Mermaid3  :  2.29:1 ,   39.0 enc MB/s , 2181.7 dec MB/s
Mermaid4  :  2.32:1 ,   32.1 enc MB/s , 2294.0 dec MB/s

Oodle 255 :

Selkie1   :  1.76:1 ,  146.8 enc MB/s , 3645.0 dec MB/s
Selkie2   :  1.85:1 ,  100.5 enc MB/s , 3565.0 dec MB/s
Selkie3   :  1.98:1 ,   28.2 enc MB/s , 3533.2 dec MB/s
Selkie4   :  2.04:1 ,   28.9 enc MB/s , 3675.9 dec MB/s

Oodle 260 :

Selkie1   :  1.78:1 ,  181.7 enc MB/s , 3716.0 dec MB/s
Selkie2   :  1.87:1 ,  114.7 enc MB/s , 3595.7 dec MB/s
Selkie3   :  1.98:1 ,   42.1 enc MB/s , 3653.1 dec MB/s
Selkie4   :  2.02:1 ,   34.9 enc MB/s , 3818.3 dec MB/s

The speed of the fastest encoder (level 1 = "SuperFast") is up by about 60% in Kraken & Mermaid.

Kraken's encode speed vs ratio is now competitive with ZStd, which has long been the best codec for encode speed tradeoff. For example, matching Kraken1 to the closest comparable ZStd levels on the same machine :

at similar encode speed, you can compare the compression ratios :

Kraken1       :  2.33:1 ,  135.1 enc MB/s ,  928.1 dec MB/s
zstd 1.3.3 -4 :  2.22:1 ,  136   enc MB/s ,  595   dec MB/s

or you can look at equal file sizes and compare the encode speed :

Kraken1       :  2.33:1 ,  135.1 enc MB/s ,  928.1 dec MB/s
zstd 1.3.3 -6 :  2.33:1 ,   62   enc MB/s ,  595   dec MB/s

Of course ZStd does have faster encode levels (1-3); Oodle does not provide anything in that domain.

3. Kraken & Mermaid's optimal level encoders now get more compression. (even with 2.5 compatible bit streams)

We improved the ability of the optimal parse encoders to make good decisions and find smaller encodings. This is at level 8 (Optimal4), our maximum compression level with slow encoding.

PD3D : (public domain 3D game data test set)

Kraken8 255    :  3.67:1 ,    2.8 enc MB/s , 1091.5 dec MB/s
Kraken8 260 -v5:  3.72:1 ,    1.2 enc MB/s , 1079.9 dec MB/s

GTS : (private game data test set)

Kraken8 255    :  2.60:1 ,    2.5 enc MB/s , 1335.8 dec MB/s
Kraken8 260 -v5:  2.63:1 ,    1.2 enc MB/s , 1343.8 dec MB/s

Silesia : (standard Silesia compression corpus)

Kraken8 255    :  4.12:1 ,    1.4 enc MB/s ,  982.0 dec MB/s
Kraken8 260 -v5:  4.18:1 ,    0.6 enc MB/s , 1018.7 dec MB/s

(speeds on Core i7-3770 Windows x64)
(-v5 means encode in v5 (2.5.x) backwards compatibility mode)

Compression ratio improvements around 1% might not sound like much, but when you're already on the Pareto frontier, finding another 1% without sacrificing any decode speed or changing the bit stream is quite significant.

4. Kraken & Mermaid have new bit stream options which allow them to reach even higher compression.

PD3D :

Kraken8 255    :  3.67:1 ,    2.8 enc MB/s , 1091.5 dec MB/s
Kraken8 260 -v5:  3.72:1 ,    1.2 enc MB/s , 1079.9 dec MB/s
Kraken8 260    :  4.00:1 ,    1.0 enc MB/s , 1034.7 dec MB/s


GTS :

Kraken8 255    :  2.60:1 ,    2.5 enc MB/s , 1335.8 dec MB/s
Kraken8 260 -v5:  2.63:1 ,    1.2 enc MB/s , 1343.8 dec MB/s
Kraken8 260    :  2.67:1 ,    1.0 enc MB/s , 1282.3 dec MB/s

Silesia :

Kraken8 255    :  4.12:1 ,    1.4 enc MB/s ,  982.0 dec MB/s
Kraken8 260 -v5:  4.18:1 ,    0.6 enc MB/s , 1018.7 dec MB/s
Kraken8 260    :  4.24:1 ,    0.6 enc MB/s ,  985.4 dec MB/s

Kraken in Oodle 2.6.0 now gets Silesia to 50,006,565 bytes at the default space-speed tradeoff target. Kraken in max-compression space-speed setting gets Silesia to 49,571,429 bytes (and is still far faster to decode than anything close).

If we look back at where Kraken started in April of 2016 , it was getting 4.05 to 1 on Silesia , now 4.24 to 1.

Kraken now usually gets more compression than anything remotely close to its decode speed.

Looking back at the old Performance of Oodle Kraken , Kraken only got 2.70:1 on win81. On some files, Kraken has always out-compressed the competition, but win81 was one where it lagged. It does better now :

"win81" test file (Core i7-3770 Windows x64)
in order of decreasing decode speed :

Kraken8       :  2.86:1 ,    0.6 enc MB/s ,  950.9 dec MB/s
old Kraken 215:  2.70:1 ,    1.0 enc MB/s ,  877.0 dec MB/s
Leviathan8    :  2.96:1 ,    0.4 enc MB/s ,  758.1 dec MB/s

zstd 1.3.3 -22           3.35 enc MB/s   473 dec MB/s    38804423  37.01 win81 = 2.702:1
zlib 1.2.11 -9           8.59 enc MB/s   254 dec MB/s    47516720  45.32 win81 = 2.206:1
brotli24 2017-12-12 -11  0.39 enc MB/s   214 dec MB/s    37246857  35.52 win81 = 2.815:1
lzham 1.0 -d26 -4        1.50 enc MB/s   158 dec MB/s    37794766  36.04 win81 = 2.775:1
lzma 16.04 -9            2.75 enc MB/s    51 dec MB/s    35890919  34.23 win81 = 2.921:1

At the time of Kraken's release, it was a huge decode speed win vs comparable compressors, but it sometimes lagged a bit in compression ratio. No longer.

NOTE : Oodle 2.6.0 by default makes bit streams that are decodable by version >= 2.6.0 only. If you need bit streams that can be read by earlier versions, you must set the backward compatible version number that you need. See the Oodle FAQ on backward compatibility.

5. Kraken and Mermaid are now more tuneable to different compression ratios and decode speeds.

The new v6 bit stream has more options, which allows them to smoothly trade off compression ratio for decode speed. The user can set this goal with a space-speed tradeoff parameter.

All the Oodle codecs have a compression level setting (similar to the familiar zip 1-9 level) that trades encode time for compression ratio. Unlike many other codecs, Oodle's compressors do not lose *decode* speed at higher encode effort levels. We are not finding more compact encodings by making the decoder slower. Instead you can dial decode speed vs ratio with a separate parameter that changes how the encoder scores decisions.

More about this in : Oodle Hydra and space-speed flexibility.

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK

Oodle 2.6.0 : Leviathan performance on PS4, Xbox One and Switch

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

Oodle's speed on the Sony PS4 (and Microsoft Xbox One) and Nintendo Switch is superb. With the slower processors in these consoles (compared to a modern PC), the speed advantage of Oodle makes a big difference in total load time or CPU use.

These are run on the private test file "lzt99". I'm mainly looking at the speed numbers here, not the compression ratio (compression wise, we do so well on lzt99 that it's a bit silly, and also not entirely fair to the competition).

On the Nintendo Switch (clang ARM-A57 AArch64 1.02 GHz) :

Oodle 2.6.0 -z8 :

Leviathan   : 2.780 to 1 : 205.50 MB/s
Kraken      : 2.655 to 1 : 263.54 MB/s
Mermaid     : 2.437 to 1 : 499.72 MB/s
Selkie      : 1.904 to 1 : 957.60 MB/s

zlib from nn_deflate

zlib        : 1.883 to 1 :  74.75 MB/s

And on the Sony PS4 (clang x64 AMD Jaguar 1.6 GHz) :

Oodle 2.6.0 -z8 :

Leviathan   : 2.780 to 1 : 271.53 MB/s
Kraken      : 2.655 to 1 : 342.49 MB/s
Mermaid     : 2.437 to 1 : 669.34 MB/s
Selkie      : 1.904 to 1 : 1229.26 MB/s

non-Oodle reference (2016) :

brotli-11   : 2.512 to 1 :  77.84 MB/s
miniz       : 1.883 to 1 :  85.65 MB/s
brotli-9    : 2.358 to 1 :  95.36 MB/s
zlib-ng     : 1.877 to 1 : 109.30 MB/s
zstd        : 2.374 to 1 : 133.50 MB/s
lz4hc-safe  : 1.669 to 1 : 673.62 MB/s
LZSSE8      : 1.626 to 1 : 767.11 MB/s

The Microsoft Xbox One has similar performance to the PS4. Mermaid & Selkie can decode faster than the hardware DMA compression engine in the PS4 and Xbox One, and usually compress more, since they aren't limited to small chunks the way the hardware DMA engine is.

Note that the PS4 non-Oodle reference data is from my earlier runs back in 2016 : Oodle Mermaid and Selkie on PS4 and PS4 Battle : MiniZ vs Zlib-NG vs ZStd vs Brotli vs Oodle . They should be considered only rough reference points; I imagine some of those codecs are slightly different now, but does even a 10 or 20 or 50% improvement really make much difference? (also note that there's no true zlib reference in that PS4 set; miniz is close but a little different, and zlib-ng is faster than standard zlib).

Leviathan is in a different compression class than any of the other options, and is still 2-3X faster than zlib.

Something I spotted while gathering the old numbers that I think is worth talking about:

If you look at the old Kraken PS4 numbers from Oodle 2.3.0 : Kraken Improvement you would see :

PS4 lzt99

old :

Oodle 2.3.0 -z6 : 2.477 to 1 : 389.28 MB/s
Oodle 2.3.0 -z7 : 2.537 to 1 : 363.70 MB/s

vs new :

Oodle 2.6.0 -z8 : 2.655 to 1 : 342.49 MB/s

(-z8 encode level didn't exist back then)

Oh no! Oodle's gotten slower to decode!

Well no, it hasn't. But this is a good example of how looking at just space or speed on their own can be misleading.

Oodle's encoders are always optimizing for a space-speed goal. There are a range of solutions to that problem which have nearly the same space-speed score, but have different sizes or speeds.

So part of what's happened here is that Oodle 2.6.0 is just hitting a slightly different spot in the space-speed solution space than Oodle 2.3.0 is. It's finding a bit stream that is smaller, and trades off some decode speed for that. With its space-speed cost model, it measures that tradeoff as being a good value. (the user can set the relative value of time & bytes that Oodle uses in its scoring via the spaceSpeedTradeoffBytes parameter).

But something else has also happened - Oodle 2.6.0 has just gotten much better. It hasn't just stepped along the Pareto curve to a different but equally good solution - it has stepped perpendicularly to the old Pareto curve and is finding better solutions.

At RAD we measure that using the "correct" Weissman score which provides a way of combining a space-speed point into a single number that can be used to tell whether you have made a real Pareto improvement or just a tangential step.

The easiest way to see that you have definitely made an improvement is to run Oodle 2.6.0 with a different spaceSpeedTradeoffBytes price so that it provides a simpler relationship :

PS4 lzt99

new, with spaceSpeedTradeoffBytes = 1024

Oodle 2.6.0 -z8 : 2.495 to 1 : 445.56 MB/s

vs old :

Oodle 2.3.0 -z6 : 2.477 to 1 : 389.28 MB/s

Now we have higher compression and higher speed, so there's no question of whether we lost anything.

In general the Oodle 2.6.0 Kraken & Mermaid encoders are making decisions that slightly bias for higher compression (and slower decode; though often the decode speed is very close) than before 2.6.0. If you find you've lost a little decode speed and want it back, increase spaceSpeedTradeoffBytes (try 400).

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK

Oodle 2.6.0 : Leviathan detailed performance report

Oodle is an SDK for high performance lossless data compression. For more about Oodle, or licensing inquiries, visit the RAD Game Tools web site. This is my personal blog where I post supplemental material about Oodle.

Let's have a look at a few concrete tests of Leviathan's performance.

All of the tests in this post are run on Windows, x64, on a Core i7-3770. For performance on some other game platforms, see : Leviathan performance on PS4, Xbox One, and Switch .

All the compressors in this test are run at their slowest encoding level. We're looking for high compression and fast decompression; we're not looking for fast compression here. See @@ for a look at encode times.

Tests are on groups of files. The results show the total compressed size & total decode time over the group of files, but each file is encoded independently. The maximum window size is used for all compressors.

The compressors tested here are :

Kraken8          : Oodle Kraken at level 8 (Optimal4)
Leviathan8       : Oodle Leviathan at level 8 (Optimal4)

lzma_def9        : LZMA -mx9 with default settings (lc3lp0pb2fb32) +d29
lzmalp3pb3fb1289 : LZMA -mx9 with settings that are better on binary and game data (lc0lp3pb3fb128) +d29
zlib9            : zlib 1.2.8 at level 9

LZMA and zlib here were built with MSVC; I also checked their speed in a GCC build and confirmed it is nearly identical.

Two settings for LZMA are tested to try to give it the best chance of competing with Leviathan. LZMA's default settings tend to be good on text but not great on binary (often even Kraken can beat "lzma_def" on binary structured data). I've also increased LZMA's fast bytes to 128 in the non-default options, the default value of 32 is a bit of a detriment to compression ratio, and most of the competition (ZStd, Brotli, LZHAM) use a value more like 128. I want to give LZMA a great chance to compete; we don't need to play any games with selective testing to make Leviathan look good.

Let the tests begin!

PD3D :

Public Domain 3d game data test set :

Kraken8         :  4.02:1 ,    1.0 enc MB/s , 1013.3 dec MB/s
Leviathan8      :  4.21:1 ,    0.5 enc MB/s ,  732.8 dec MB/s
zlib9           :  2.89:1 ,    5.2 enc MB/s ,  374.5 dec MB/s
lzmalp3pb3fb1289:  4.08:1 ,    3.4 enc MB/s ,   65.4 dec MB/s
lzma_def9       :  3.97:1 ,    4.3 enc MB/s ,   65.4 dec MB/s

Game BCn :

A mix of BCn textures from a game (mostly BC1, BC4, and BC7) :

Kraken8         :  3.67:1 ,    0.7 enc MB/s ,  881.0 dec MB/s
Leviathan8      :  3.84:1 ,    0.4 enc MB/s ,  661.9 dec MB/s
zlib9           :  2.72:1 ,    6.2 enc MB/s ,  321.5 dec MB/s
lzmalp3pb3fb1289:  3.67:1 ,    2.0 enc MB/s ,   66.9 dec MB/s
lzma_def9       :  3.68:1 ,    2.3 enc MB/s ,   67.0 dec MB/s

ASTC testset :

A private corpus of game ASTC texture data :

Kraken8         :  1.15:1 ,    1.7 enc MB/s , 1289.6 dec MB/s
Leviathan8      :  1.17:1 ,    1.0 enc MB/s ,  675.8 dec MB/s
zlib9           :  1.10:1 ,   18.9 enc MB/s ,  233.8 dec MB/s
lzma_def9       :  1.17:1 ,    6.2 enc MB/s ,   21.4 dec MB/s
lzmalp3pb3fb1289:  1.19:1 ,    6.1 enc MB/s ,   21.4 dec MB/s

gametestset :

A private corpus of game test data :

Kraken8         :  2.68:1 ,    1.0 enc MB/s , 1256.2 dec MB/s
Leviathan8      :  2.83:1 ,    0.6 enc MB/s ,  915.0 dec MB/s
zlib9           :  1.99:1 ,    8.3 enc MB/s ,  332.4 dec MB/s
lzmalp3pb3fb1289:  2.76:1 ,    3.4 enc MB/s ,   46.5 dec MB/s
lzma_def9       :  2.72:1 ,    4.2 enc MB/s ,   46.3 dec MB/s

Silesia :

Silesia standard public compression test corpus :

Kraken8         :  4.25:1 ,    0.6 enc MB/s ,  996.0 dec MB/s
Leviathan8      :  4.35:1 ,    0.4 enc MB/s ,  804.5 dec MB/s
zlib9           :  3.13:1 ,    8.4 enc MB/s ,  351.3 dec MB/s
lzmalp3pb3fb1289:  4.37:1 ,    1.9 enc MB/s ,   80.2 dec MB/s
lzma_def9       :  4.27:1 ,    2.6 enc MB/s ,   79.0 dec MB/s

brotli24 -11    :  4.21:1 ,                    310.9 dec MB/s
zstd 1.3.3 -22  :  4.01:1 ,                    598.2 dec MB/s
(zstd and brotli run in lzbench)  (brotli is 2017-12-12)

Read more about Leviathan and Oodle 2.6.0 in these other posts on my blog :

Leviathan Rising
Everything new and tasty in Oodle 2.6.0
Leviathan performance on PS4, Xbox One, and Switch
Leviathan detailed performance report
Oodle Hydra and space-speed flexibility

or visit RAD for more information about the Oodle SDK


Oodle 2.6.0 : Impending Deprecation of pre-Kraken codecs

Starting in Oodle 2.6.0 the pre-Kraken codecs will be slowly phased out.

In 2.6.0 they will still be available for both encoding & decoding. However, they will be marked "deprecated" to discourage their use. I'm doing this two ways.

One : if you encode with them, they will log a warning through the new Oodle "UsageWarning" system. Decoding with them will not log a warning. This warning can be disabled by calling Oodle_SetUsageWarnings(false). (Of course in shipping you might also have set the Oodle log function to NULL via OodlePlugins_SetPrintf, which will also disable the warning log). (Also note that in MSVC builds of Oodle the default log function only goes to OutputDebugString so you will not see anything unless you either have a debugger attached, change the log function, or use OodleX to install the OodleX logger).

Two : the enum for the old compressor names will be hidden in oodle2.h by default. You must define OODLE_ALLOW_DEPRECATED_COMPRESSORS before including oodle2.h to enable their definition.
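
Concretely :

    // Opt in to the deprecated compressor enums (only needed to *encode*) :
    #define OODLE_ALLOW_DEPRECATED_COMPRESSORS
    #include "oodle2.h"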

As noted, decoding with the old codecs will not log a usage warning, and can be done without setting OODLE_ALLOW_DEPRECATED_COMPRESSORS. That is, Oodle 2.6.0 requires no modifications to decode old codecs and will not log a warning. You only need to consider these steps if you want to *encode* with the old codecs.

In some future version the encoders for the old codecs will be removed completely. All decoders will continue to be shipped for the time being.

I'm intentionally making it difficult to encode with the old codecs so that you transition to one of the new codecs.

Long term support codecs :

Kraken, Mermaid, Selkie (and Hydra)

codecs being deprecated :


The new codecs simply obsolete the old codecs and they should not be used any more. The old codecs are also a mix of some fuzz safe, some not. The new codecs are all fuzz safe and we want to remove support for non-fuzz-safe decoding so that it's not possible to do by accident.

In pursuit of that last point, another change in 2.6.0 is removing the default argument value for OodleLZ_FuzzSafe in the OodleLZ_Decompress call.

Previously that argument had a default value of OodleLZ_FuzzSafe_No , so that it would allow all the codecs to decompress and was backwards compatible when it was introduced.

If you are not explicitly passing a value there, you will now get a compiler error and need to pass one.

If possible, you should pass OodleLZ_FuzzSafe_Yes. This will ensure the decode is fuzz safe (ie. won't crash if given invalid data to decode).
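
A sketch of the updated call site (eliding the trailing optional arguments of OodleLZ_Decompress, and treating the exact prototype as an assumption):

    // Explicit fuzz safety : works for all the post-Kraken codecs, and returns
    // failure rather than decoding data made by a non-fuzz-safe legacy codec.
    SINTa rawLen = OodleLZ_Decompress(
        compBuf, compLen,
        rawBuf, rawSize,
        OodleLZ_FuzzSafe_Yes);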

The only reason that you would not pass FuzzSafe_Yes is if you need to decode some of the older non-fuzz-safe codecs. We recommend moving away from those codecs to the new post-Kraken codecs (which are all fuzz safe).

Fuzz safe codecs :

Kraken, Mermaid, Selkie (and Hydra)
BitKnit, LZB16, LZH, LZHLW

Non-fuzz-safe codecs :


If you need to decode one of the non-fuzz-safe codecs, you must pass FuzzSafe_No. If you pass FuzzSafe_Yes, and the decoder encounters data made by that codec, it will return failure.

Fuzz safety in Oodle means that unexpected data will not crash the decoder. It will not necessarily be detected; the decode might still return success. For full safety, your systems that consume the data post decompression must all be fuzz safe.

Another semi-deprecation coming in Oodle is removing high-performance optimization for 32-bit builds (where 64-bit is available).

What I mean is, we will continue to support & work in 32-bit, but it will no longer be an optimization priority. We may allow it to get slower than it could be. (for example in some cases we just run the 64-bit code that requires two registers per qword; not the ideal way to write the 32-bit version, but it saves us from making yet more code paths to optimize and test).

We're not aware of any games that are still shipping in 32-bit. We have decided to focus our time budget on 64-bit performance. We recommend evaluating Oodle and always running your tools in 64-bit.

If you're a user who believes that 32-bit performance is important, please let us know by emailing your contact at RAD.


The Natural Lambda

Every compressor has a natural innate or implicit "lambda" , the Lagrange parameter for space-speed optimization.

Excluding compressors like cmix that are purely aiming for the smallest size, every other compressor is balancing ratio vs. complexity in some way. The author has made countless decisions about space-speed tradeoffs in the design of their codec; are they using huffman or ANS or arithmetic coding? are they using LZ77 or ROLZ or PPM? are they optimal parsing? etc. etc.

There are always ways to spend more CPU time and get more compression, so you choose some balance that you are targeting and try to make decisions that optimize for that balance. (as usual when I talk about "time" or "speed" here I'm talking about decode time ; you can of course also look at balancing encode time vs. ratio, or memory use vs. ratio, or code size, etc.)

Assuming you have done everything well, you may produce a compressor which is "pareto optimal" over some range of speeds. That is, if you do something like this : 03-02-15 - Oodle LZ Pareto Frontier , you make a speedup graph over various simulated disk speeds; you are on the pareto frontier if your compressor's curve is the topmost for some range. If your compressor is nowhere topmost (as in, eg, the graph with zlib at the bottom of Introducing Oodle Mermaid and Selkie ), then try again before proceeding.
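
Such a speedup chart is simple to compute; a minimal sketch (my reading of the methodology in those posts, assuming the load and decode are not overlapped):

    // "Speedup" at one simulated disk speed : how much faster the load completes
    // with compression than without, if we read the data and then decode it.
    double speedup(double rawBytes, double compBytes,
                   double decodeBytesPerSec, double diskBytesPerSec)
    {
        double timeRaw  = rawBytes  / diskBytesPerSec;    // uncompressed load
        double timeComp = compBytes / diskBytesPerSec     // smaller read
                        + rawBytes  / decodeBytesPerSec;  // plus decode time
        return timeRaw / timeComp;  // > 1 means compression is a net win
    }
    // Sweep diskBytesPerSec over a log scale to trace one curve per compressor.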

The range over which your compressor is pareto optimal is the "natural range" for it. That is a range where it is working better than any other compressor in the world.

Outside of that range, the optimal way to use your compressor is to *not* use it. A client that wants optimal behavior outside of that range should simply switch to a different compressor.

As an example of this - people often take a compressor which is designed for a certain space-speed performance, and then ask it questions that are outside of its natural range, such as "well if I change the settings, just how small of a file can it produce?" , or "just how fast can it decode at maximum speed?". These are bogus questions because they are outside of the operating range. It's like asking how delicate an operation can you do with a hammer, or how much weight can you put on an egg. You're using the tool wrong and should switch to a different tool.

(perhaps a better analogy that's closer to home is taking traditional JPEG Huff and trying to use it for very low bit rates (as many ass-hat authors do when trying to show how much better they are than JPEG), like 0.1 bpp, or at very high bit rates trying to get near-lossless quality. It's simply not built to run outside of its target range, and any examination outside of that range is worse than useless. (worse because it can lead you to conclusions that are entirely wrong))

The "natural lambda" for a compressor is at the center of its natural range, where it is pareto optimal. This is the space-speed tradeoff point where the compressor is working its best, where all those implicit decisions are right.

Say you chose to do an LZ compressor with Huffman and to forbid overlapping matches with offset less than 16 (as in Kraken) - those decisions are wrong at some lambda. If you are lucky they are right decisions at the natural lambda.

Of course if you have developed a compressor in an ad-hoc way, it is inevitable that you have made many decisions wrong. The way this has been traditionally done in the past is for the developer to just try some ideas, often they don't really even measure the alternatives. (for example they might choose to pack their LZ offsets in a certain way, but don't really try other ways). If they do seriously try alternatives, they just look at the numbers for compression & speed and sort of eye ball them and make a judgement call. They do not actually have a well defined space-speed goal and a way to measure if they are improving that score.

The previously posted "correct" Weissman Score provides a systematic way of identifying the space-speed range you wish to target and thus identifying if your decisions are right in a space-speed sense.

Most ad-hoc compressors have illogical contradictory decisions in the sense that they have chosen to do something that makes them slower (to decode), but have failed to do something else which would be a better space-speed step. That is, say you have a compressor, and there are various decisions you face as an implementor. The implicit "natural lambda" for your compressor gives you a desired slope for space-speed tradeoffs. Any decision you can make (either adding a new mode, or disabling a current mode; for example adding context mixing, or disabling arithmetic coding) should be measured against that lambda.

Over the past few years, we at RAD have been trying to go about compressor development in this new systematic way. People often ask us for tips on their compressor, hoping that we have some magical answers, like "oh you just do this and that makes it work perfectly". They're disappointed when we don't really have those answers. Often my advice leads nowhere; I suggest "try this and try that" and they don't really pan out and they think "this guy doesn't know anything". But that's what we do - we just try this and that, and usually they don't work, so we try something else. What we have is a way of working that is careful and methodical and thorough. For every decision, you legitimately try it both ways with well-optimized implementations. You measure that decision with a space-speed score that is targeted at the correct range for your compressor. You measure on a variety of test sets and look at randomized subsets of those test sets, not averages or totals. There's no magic, there's just doing the work to try to make these decisions correctly.

Usually the way that compressor development works starts from a spark of inspiration.

That is, you don't set out by saying "I want to develop a compressor for the Weissman range of 10 - 100 mb/s" and then proceed to make decisions that optimize that score. Instead it starts with some grain of an idea, like "what if I send matches with LZSA and code literals with context mixed RANS". So you start going about that grain of idea and get something working. Thus all compressor development starts out ad hoc.

At some point (if you work at RAD), you need to transition to having a more defined target range for your compressor so you can begin making well-justified decisions about its implementation. Here are a few ways that I have used to do that in the past.

One is to look at a space-speed "speedup" curve, such as the "pareto charts" I linked to above. You can visually pick out the range where your compressor is performing well. If you are pareto optimal vs the competition, you can start from that range. Perhaps early in development you are not yet pareto optimal, but you can still see where you are close, you can spot the area where if you improved a bit you would be pareto optimal. I then take that range and expand it slightly to account for the fact that the compressor is likely to be used slightly outside of its natural range. eg. if it was optimal from 10 - 100 mb/s , I might expand that to 5 - 200 mb/s.

Once you have a Weissman range you can now measure your compressor's "correct" Weissman score. You then construct your compressor to make decisions using a Lagrange parameter, such as :

J = size + lambda * time

for each decision, the compressor tries to minimize J. But you don't know what lambda to use. One way to find it is to numerically optimize for the lambda that maximizes the Weissman score over your desired performance interval.

When I first developed Kraken, Mermaid & Selkie, I found their lambdas by using the method of transition points.

Kraken, Mermaid & Selkie are all pareto optimal over large ranges. At some point as you dial lambda on Kraken to favor more speed, you should instead switch to Mermaid. As you dial lambda further towards speed in Mermaid, you should switch to Selkie. We can find what this lambda is where the transition point occurs.

It's simply where the J for each compressor is equal. That is, an outer meta-compressor (like Oodle Hydra ) which could choose between them would have an even choice at this point.

how to set the K/M/S lambdas

    measure space & speed of each

    find the lambda point at which J(Kraken) = J(Mermaid)
        that's the switchover of compressor choice

    call that lambda_KM

    size_K + lambda_KM * time_K = size_M + lambda_KM * time_M
    lambda_KM = (size_M - size_K) / (time_K - time_M)

    same for lambda_MS between Mermaid and Selkie

    choose Mermaid to be at the geometric mean of the two transitions :

    lambda_M = sqrt( lambda_KM * lambda_MS )

    then set lambda_K and lambda_S such that the transitions are at the geometric mean
        between the points, eg :

    lambda_KM = sqrt( lambda_K * lambda_M )
    lambda_K = lambda_KM^2 / lambda_M = lambda_KM * sqrt(lambda_KM / lambda_MS)
    lambda_S = lambda_MS^2 / lambda_M = lambda_MS * sqrt(lambda_MS / lambda_KM)

Of course always beware that these depend on a size & time measurement on a specific file on a specific machine, so gather a bunch on different samples and take the median, discard outliers, etc.

(the geometric mean seems to work out well because lambdas seem to like to go in a logarithmic way. This is because the time that we spend to save a byte seems to increase in a power scale way. That is, saving each additional byte in a compressor is 2X harder than the last, so compressors that decrease size more by some linear amount will have times that increase by an exponential amount. Put another way, if J was instead defined to be J = size + lambda * log(time) , with logarithmic time, then lambda would be linear and arithmetic mean would be appropriate here.)
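
Mechanically, that recipe is only a few lines; a minimal sketch :

    // Given measured (size, time) for Kraken, Mermaid & Selkie on a test set,
    // compute the transition lambdas and place each codec's natural lambda by
    // geometric means, exactly as in the recipe above.
    #include <cmath>

    struct Perf { double size; double time; }; // compressed bytes, decode seconds

    double transition_lambda(Perf a, Perf b)   // lambda where J(a) == J(b)
    {
        return (b.size - a.size) / (a.time - b.time);
    }

    void natural_lambdas(Perf K, Perf M, Perf S,
                         double* lambda_K, double* lambda_M, double* lambda_S)
    {
        double lambda_KM = transition_lambda(K, M);
        double lambda_MS = transition_lambda(M, S);
        *lambda_M = std::sqrt(lambda_KM * lambda_MS);
        *lambda_K = lambda_KM * std::sqrt(lambda_KM / lambda_MS);
        *lambda_S = lambda_MS * std::sqrt(lambda_MS / lambda_KM);
    }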

It's easy to get these things wrong, so it's important to do many and frequent sanity checks.

An important one is that : more options should always make the compressor better. That is, giving the encoder more choices of encoding, if it makes a space-speed decision with the right lambda, if the rate measurement is correct, if the speed estimation is correct, if there are no unaccounted for side effects - having that choice should lead to a better space-speed performance. Any time you enable a new option and performance gets worse, something is wrong.

There are many ways this can go wrong. Non-local side effects of decisions (such as parse-statistics feedback) is a confounding one. Another common problem is if the speed estimate doesn't match the actual performance of the code, or if it doesn't match the particular machine you are testing on.

An important sanity check is to have macro-scale options. For example if you have other compressors to switch to and consider J against them - if they are being chosen in the range where you think you should be optimal, something is wrong. One "compressor" that should always be included in this is the "null" compressor (memcpy). If you're targeting high speed, you should always be space-speed testing against memcpy and if you can't beat that, either your compressor is not working well or your supposed speed target range is not right.

In the high compression range, you can sanity check by comparing to preprocessors. eg. rather than spend N more cycles on making your compressor slower, compare to other ways of trading off time for compression, such as perhaps word-dictionary preprocessing on text, delta filtering on wav, etc. If those are better options, then they should be done instead.

One of the things I'm getting at is that every compressor has a "natural lambda" , even ones that are not conscious of it. The fundamental way they function implies certain decisions about space-speed tradeoffs and where they are suitable for use.

You can deduce the natural lambda of a compressor by looking at the choices and alternatives in its design.

For example, you can take a compressor like ZStd. ZStd entropy codes literals with Huffman and other things (ML, LRL, offset log2) with TANS (FSE). You can look at switching those choices - switch the literals to coding with TANS. That will cost you some speed and gain some compression. There is a certain lambda where that decision is moot - either way gives equal J. If ZStd is logically designed, then its natural lambda must lie to the speed side of that decision. Similarly try switching coding ML to Huffman, again you will find a lambda where that decision is moot, and ZStd's natural lambda should lie on the slower side of that decision.
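
To make that concrete with invented numbers : say switching ZStd's literals from Huffman to TANS saved 40,000 bytes on some file but cost 5 milliseconds of decode time. That decision is moot at lambda = 40000 / 0.005 = 8,000,000 bytes per second of decode time; any design that prices time more heavily than that should keep Huffman literals, and one that prices time less heavily should switch. The numbers here are made up, but this is the calculation that brackets a codec's natural lambda from above and below.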

You can look at all the options in a codec and consider alternatives and find this whole set of lambda points where that decision makes sense. Some will constrain your natural lambda from above, some below, and you should lie somewhere in the middle there.

This is true and possible even for codecs that are not pareto. For however good they are, their construction implies a set of decisions that target an implicit natural lambda.

ADD 03-06-18 :

This isn't just about compressors. This is about any time you design an algorithm that solves a problem imperfectly (intentionally) because you want to trade off speed, or memory use, or whatever, for an approximate solution.

Each decision you make in algorithm design has a cost (solving it less perfectly), and a benefit (saving you cpu time, memory, whatever). Each decision has a slope of cost vs benefit. You should be taking the decisions with the best slope.

Most algorithm design is done without considering this properly and it creates algorithms that are fundamentally wrong. That is, they are built on contradictory decisions. That is, there is NO space-speed tradeoff point where all the decisions are correct. Each decision might be correct at a different tradeoff point, but they are mutually exclusive.

I'll do another example in compression because it's what I know best.

LZ4 does no entropy coding. LZ4 does support overlapping match copies. These are contradictory design decisions (*).

That is, if you just wanted to maximize compression, you would do both. If you wanted to maximize decode speed, you would do neither. In the middle, you should first turn on the feature with the best slope. Entropy coding has more benefit per cost than overlap matches, so it should be turned on first. That is :

High compression domain : { Huffman coding + overlap matches }

Middle tradeoff domain : { Huffman coding + no overlap matches }

High speed domain : { no Huffman coding + no overlap matches }

Contradictory : { no Huffman coding + overlap matches }

There are logical options for tradeoffs in different domains, and there are choices that are correct nowhere.

(* = caveat: this is considering only the tradeoff of size vs. decode speed; if you consider other factors, such as encode time or code complexity then the design decision of LZ4 could make sense)

This is in no way unique to compression. It's extremely common when programmers heuristically approximate numerical algorithms. Perhaps on the high end you use doubles with careful summation, exact integration of steps, handle degeneracies. Then when you wish to approximate it for speed, you can use floats, ignore degenerate cases, take single coarse steps, etc. Each of those decisions has a different speed vs error cost and should be evaluated. If you have 10 decisions about how to design your algorithm, the ones that provide the best slope should be taken before other ones.

The "natural lambda" in this more general case corresponds to the fundamental structure of your algorithm design. Say you're doing 3d geometry overlap testing using shaped primitives like OBB's and lozenges. That is a fundamental decision that sets your space-speed tradeoff domain. You have chosen not to do the simpler/faster thing (always use AABB's or spheres), and you have chosen not to do the more complex thing (use convex hulls or exact geometry overlap tests). Therefore you have a natural lambda, a fundamental range where your decision makes sense, which is between those two. If you then do your OBB/Lozenges very carefully using slow exact routines, that is fundamentally wrong - you should have gone to convex hulls. Or if you really approximate your OBB/Lozenges with gross approximations - that is also fundamentally wrong - you should have just use AABB's/spheres instead.


Oodle tuneability with space-speed tradeoff

Oodle's modern encoders take a parameter called the "space-speed tradeoff". (specifically OodleLZ_CompressOptions::spaceSpeedTradeoffBytes).

"speed" here always refers to decode speed - this is about the encoder making choices about how it forms the compressed bit stream.

This parameter allows the encoders to make decisions that optimize for a space-speed goal which is of your choosing. You can make those decisions favor size more, or you can favor decode speed more.

If you like, a modern compressor is a bit like a compiler. The compressed data is a kind of program in bytecode, and the decompressor is just an interpreter that runs that bytecode. An optimal parser is like an optimizing compiler; you're considering different programs that produce the same output, and trying to find the program that maximizes some metric. The "space-speed tradeoff" parameter is a bit like -Ox vs -Os, optimize for speed vs size in a compiler.

Oodle of course includes Hydra (the many headed beast) which can tune performance by selecting compressors based on their space-speed performance.

But even without Hydra the individual compressors are tuneable, none more so than Mermaid. Mermaid can stretch itself from Selkie-like (LZ4 domain) up to standard LZH compression (ZStd domain).

I thought I would show an example of how flexible Mermaid is. Here's Mermaid level 4 (Normal) with some different space-speed tradeoff parameters :

sstb = space speed tradeoff bytes

sstb 32 :  ooMermaid4  :  2.29:1 ,   33.6 enc mbps , 1607.2 dec mbps
sstb 64 :  ooMermaid4  :  2.28:1 ,   33.8 enc mbps , 1675.4 dec mbps
sstb 128:  ooMermaid4  :  2.23:1 ,   34.1 enc mbps , 2138.9 dec mbps
sstb 256:  ooMermaid4  :  2.19:1 ,   33.9 enc mbps , 2390.0 dec mbps
sstb 512:  ooMermaid4  :  2.05:1 ,   34.3 enc mbps , 2980.5 dec mbps
sstb 1024: ooMermaid4  :  1.89:1 ,   34.4 enc mbps , 3637.5 dec mbps

compare to : (*)

zstd9       :  2.18:1 ,   37.8 enc mbps ,  590.2 dec mbps
lz4hc       :  1.67:1 ,   29.8 enc mbps , 2592.0 dec mbps

(* MSVC build of ZStd/LZ4 , not a fair speed measurement (they're faster in GCC); just use them as a general reference point)

Point being - not only can Mermaid span a large range of performance but it's *good* at both ends of that range; it's not getting terrible as it moves out of its comfort zone.

You may notice that as sstb goes below 128 you're losing a lot of decode speed and not gaining much size. The problem is you're trying to squeeze a lot of ratio out of a compressor that just doesn't target high ratio. As you get into that domain you need to switch to Kraken. That is, there comes a point where squeezing the last drop out of Mermaid costs more (in decode time per byte saved) than just making the jump to Kraken. And that's where Hydra comes in; it will do that for you at the right spot.

ADD : Put another way, in Oodle there are *two* speed-ratio tradeoff dials. Most people are just familiar with the compression "level" dial, as in Zip, where higher levels = slower to encode, but more compression ratio. In Oodle you have that, but also a dial for decode time :

CompressionLevel = trade off encode time for compression ratio

SpaceSpeedTradeoffBytes = trade off decode time for compression ratio

Perhaps I'll show some sample use cases :

Default initial setting :

CompressionLevel = Normal (4)
SpaceSpeedTradeoffBytes = 256

Reasonably fast encode & decode.  This is a balance between caring about encode time, decode time,
and compression ratio.  Tries to do a decent job of all 3.

To maximize compression ratio, when you don't care about encode time or decode time :

CompressionLevel = Optimal4 (8)
SpaceSpeedTradeoffBytes = 1

You want every possible byte of compression and you don't care how much time it costs you to encode or
decode.  In practice this is a bit silly, rather like the "placebo" mode in x264.  You're spending
potentially a lot of CPU time for very small gains.

A more reasonable very high compression setting :

CompressionLevel = Optimal3 (7)
SpaceSpeedTradeoffBytes = 16

This still says you strongly value ratio over encode time or decode time, but you don't want to chase
tiny gains in ratio that cost a huge amount of decode time.

If you care about decode time but not encode time :

CompressionLevel = Optimal4 (8)
SpaceSpeedTradeoffBytes = 256

Crank up the encode level to spend lots of time making the best possible compressed stream, but make
decisions in the encoder that balance decode time.


The SpaceSpeedTradeoffBytes is a number of bytes that Oodle must be able to save in order to accept a certain time increase in the decoder. In Kraken that unit of time is 25600 cycles on the artificial machine model that we use. (that's 8.53 microseconds at 3 GHz). So at the default value of 256, it must save 1 byte in compressed size to take an increased time of 100 cycles.
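
Written out as the score the encoder minimizes (my formulation of the description above, in the same J = size + lambda * time shape used elsewhere on this blog):

J (in bytes) = compressedSize + spaceSpeedTradeoffBytes * (decodeCycles / 25600)

sstb = 256  : saving 1 byte pays for 25600/256 = 100 extra decode cycles
sstb = 1    : saving 1 byte pays for 25600 extra cycles (size-greedy)
sstb = 1024 : saving 1 byte must cost no more than 25 extra cycles (speed-biased)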
