Golf Horse Notes

2019-05-16: I used to avoid releasing the source code to see how far I could go without other influences, but after enough persuasion from Maxime Euzière I've released the raw sources on GitLab. This page will continue to host explanations that are not obvious from the code.

pokemon

bulbasaur (3420B)

A simple but efficient 5-bit packer with delta coding of the shared prefix, plus an additional 3 bits for rare special characters.

s=`ENCODED`;w="";b=1;v=5;for(i=0;s[i];++i)for(c=s.charCodeAt(i),b=b<<(c>>7?10:7)|c&1023;b>>v;v-5?w+=" '-.2é♀♂"[v=5,d&7]:d?d<27?w+=String.fromCharCode(96+d):w=w.substr(console.log(w),d-27):v=3)d=b&31,b>>=v
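The delta coding of the shared prefix can be sketched like this (a hypothetical, non-golfed reconstruction; the entry list is made up):

```javascript
// Hypothetical sketch of shared-prefix delta coding: each entry
// records how many leading characters to keep from the previous
// word, followed by the newly appended suffix.
const entries = [[0, "abra"], [2, "sol"], [0, "baltoy"]];
let prev = "";
const words = entries.map(([keep, suffix]) => prev = prev.slice(0, keep) + suffix);
// words: ["abra", "absol", "baltoy"]
```

Since the list is sorted, consecutive words tend to share long prefixes, so the "keep" counts plus short suffixes are much cheaper than full words.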

ivysaur (3224B)

Adapted a Huffman decoder from practically.

k=[C="charCodeAt"];b=u=2e4;for(c of`CODES`)b+=(k[b]=c[C]())?1:b;w="";for(c of`ENCODED`)for(c=c[C]()^128;c>1;c>>=1)for(v=k[u=u*2+c%2];v|=0;u=2e4)d=v%43,v/=43,d>8?w+=" '-.2é♀♂"[d-9]||String.fromCharCode(80+d):w=w.substr(console.log(w),d-1)

venusaur (3178B)

Multiple incremental improvements over ivysaur.

k=[b=u=2e4];for(c of`CODES`)b+=(k[b]=c[C="charCodeAt"](j=w=""))?1:b;for(c of`ENCODED`)for(c=c[C](),c=c>>7?c<<5|[13,4,28][c&3]:c^128;c>1;c>>=1)for(v=k[u=u*2+c%2];v|=0;u=2e4)d=v%36,v/=36,d>8?w+=d-9?String.fromCharCode(87+d):`'éé- .. ♀♂-2`[j++]:w=w.slice(console.log(w),d-1)

sigbovik

henry (637B)

A direct adaptation of bulbasaur to emit 4 bit nibbles instead.

(I don't know why I ended up with "henry".)

s=`ENCODED`;w="";b=1;for(i=0;s[i];++i)for(c=s.charCodeAt(i),b=b<<(c>>7?10:7)|c&1023;b>>4;b>>=4)w+=(b&15).toString(16),w.length%64||console.log(w.substr(-64))

bovine (592B)

Switched to BigInt, which I suspected had been available for a long time. It also no longer tries to extract as many bits as possible from two-byte UTF-8 sequences; it now always reads 7 bits per code point (this approach is described below). Since this allows a very efficient packing of nibbles, it became much easier to replace a fixed sequence of nibbles with a newline instead of counting nibbles to insert one! This clever 7-bit packer greatly improved both the code size and the data size.

b=0n;for(c of`ENCODED`)b=b<<7n^BigInt(c.charCodeAt());console.log(b.toString(16).replace(/6a/g,`NEWLINE`))
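A readable round-trip sketch of the 7-bit BigInt packing (my reconstruction, not the author's tool; it uses OR where the submission uses an XOR trick, and ignores the escaped-character handling):

```javascript
// Pack a hex dump into one character per 7 bits, and unpack it
// back. The real submission additionally replaces newlines with a
// fixed byte ("6a") before packing, and undoes it after decoding.
function pack(hex) {
  let b = BigInt("0x" + hex), out = "";
  while (b) { out = String.fromCharCode(Number(b & 127n)) + out; b >>= 7n; }
  return out;
}
function unpack(s) {
  let b = 0n;
  for (const c of s) b = b << 7n | BigInt(c.charCodeAt(0));
  return b.toString(16);
}
```

Note that a leading zero nibble would be lost in the round trip; the real data starts with a non-zero nibble, so this does not matter there.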

boviner (585B)

No code changes, but this takes advantage of the rule that reordering is permitted. The original code was optimal in the code point count but 10 bytes larger in the byte count; this submission reduced the gap to 2 bytes. I believe this can be optimized further, but it is hard to search efficiently.

oeis

a100000 (18039B)

Okay, this one is annoying. It should be noted that this file contains:

This solution does try to exploit them, but got nowhere. A ton of bit packing and implicit assumptions (hard to debug) are made: a varying bit length depending on the maximum element (there are 3 modes, the last being a "bignum" mode), delta coding for the first two modes, metadata for periodicity and sign coding... Hence the name.

s=`ENCODED`;i=0;b=1;for(r=v=>b>>v?(q=b&(1<<v)-1,b>>=v,q):(g=s.charCodeAt(i++),b=b<<(g>>7?10:7)|g&1023,r(v));l=r(7);console.log(a+''))for(m=r(6),v=2,a=[],e=0;v||e<l;v&&-~c>>v?v>9||r(1)?v=0:v+=v%5:c>999?a[e-1]+=(c+'').substr(1):(e++&&m&2?c+=a[e-2]:0,a.push(v&&m&1&&r(1)?-c:c)))c=v?r(v):a[e-(m>>2)]

a000040 (9066B)

Ditched the manual coding and switched to a rANS decoder with an order-2 context model (plus delta coding to squeeze out a little bit more). This alone saved 4KB, and the next 5KB was saved via manual definition matching: 91 out of 176 sequences were encoded in about 1.7KB of code before the compression. Those definitions are of course compacted, though not necessarily to the shortest form; they are instead kept consistent in style, because repetition helps the compressor. For the obvious reason, several shared primitives like σn are introduced.

n=r=-1;t=INITIALSTATE;M=2<<16;A=x=>Array(M<<7).fill(x);m=A(1);p=A(M/4);s=[B=BigInt];L=x=>s=[n=~console.log(x+'')];P=x=>x&&P(x>>1)+x%2;R=(f,x=s[n-1]+1||0)=>f(x)?x:R(f,x+1);S=(x,n=1,y=x)=>y&&!(x%y)*y**n+S(x,n,y-1);I=x=>S(x)==x+1;T=n=>n-(N=1.3*n**.51|0)*-~N/2;for(k=i=c=0;i<8121;v-=256,c=c%256<<8|v,k?v?o+=String.fromCharCode(v):L(s,eval(o.repeat(k),k=0)):v>>7?~n?s[n]=s[n]<<7n|B(v&127):(k=v&127,o=';s[++n]='):v-97?s[++n]=B(v-=(v>97)<<7):L(s.map(v=>w+=v,w=0n)))for(v=1;v<256;p[d]+=(b*M/2-p[d]<<14)/(m[d]+=2*(m[d]<126))>>13,v=v*2+b)for(q=p[d=c<<8|v]*2+1,b=t%M<q,t=(b?q:M-q)*(t>>17)+t%M-!b*q;t<M<<4;r=r?y&127:-1)y=~r?r:`ENCODED`[C='charCodeAt'](i++),t=t<<7|([,13,36,92,96][r=y>>7]||y)
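For reference, readable equivalents of two of the shared primitives defined above (my reading of the golfed code): S(x, n) computes the divisor function σn(x), and I(x) tests primality via σ(x) = x + 1.

```javascript
// sigma(x, n) sums d**n over every divisor d of x, matching the
// golfed S=(x,n=1,y=x)=>y&&!(x%y)*y**n+S(x,n,y-1).
const sigma = (x, n = 1) => {
  let s = 0;
  for (let d = 1; d <= x; d++) if (x % d === 0) s += d ** n;
  return s;
};
// A number x > 1 is prime exactly when its only divisors are 1
// and x, i.e. sigma(x) == x + 1 (this also rejects x = 1).
const isPrime = x => sigma(x) === x + 1;
```

For example, sigma(6) is 1 + 2 + 3 + 6 = 12, and isPrime correctly accepts 7 while rejecting 9.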

wordlist.asc

logistically (125983B)

A 7-bit arithmetic decoder (rANS) and five different context models (order 0 through 2 contexts, plus sparse contexts (0,2) and (0,3)) combined with logistic context mixing. Recursive bigram and delta coding were also used to pick the low-hanging fruit. Total memory consumption: ~7MB (assuming the optimal memory layout; there is no fancy object), total wall time < 10s.

This was a bit hard and took much time, but I did it. I also tried to exploit the columnar structure, but given enough sparse contexts it was deemed redundant. The resulting decoder is a bit beefy, weighing in at ~700 bytes, but the CM already gave me plenty of margin. There is also some crazy code that reduces operations and byte counts simultaneously.

s=`ENCODED`;r=-1;t=INITIALSTATE;M=2<<16;D=[];for(i=1;i<128;i+=2)D.push(M/i|0);m=[...'12333'].map(i=>[Array(1<<6*i).fill(0),Array(1<<6*i).fill(M/4),[0]]);z=[];a="";e=d=>(x=`BIGRAMS`.charCodeAt(d-50))?e(x%64,e(x>>6)):d<24?a=a.substr(console.log(a),a.length-d):a+=String.fromCharCode(73+d);for(i=k=0;k<DECODEDLEN;e(z[k++]=v-64))for(v=1;v<64;m.map(([c,p,w],i)=>(y=p[j=u[i]<<6|v]+=((b<<16)-p[j])*D[c[j]+=c[j]<63]>>16,w[0]+=x[i]*(b-q/M))),v=v*2+b)for(u=[,u=z[k-1],z[k-2]<<6|u,z[k-3]<<6|u,z[k-4]<<6|u],q=0,x=m.map(([,p,[w]],i)=>(x=p[u[i]<<6|v]*2+1,x=Math.log(x/(M-x)),q-=w*x,3/2048*x)),q=M/(1+Math.exp(q))|1,b=t%M<q,t=(b?q:M-q)*(t>>17)+t%M-!b*q;t<1<<21;r=r?y&127:-1)y=~r?r:s.charCodeAt(i++),t=t<<7|([,13,36,92,96][r=y>>7]||y)
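The logistic mixing in the code above can be sketched in readable form (names and the learning rate are mine; M matches the submissions' 2&lt;&lt;16 scale, and the golfed decoders fold the sign and the fixed-point scaling into the same expressions):

```javascript
// Each model reports a fixed-point probability p (out of M) that
// the next bit is 1, plus a mixing weight w. Mixing stretches
// every p into the logistic domain, takes a weighted sum, and
// squashes the result back; weights then move toward the bit.
const M = 2 << 16, LR = 0.002; // LR is an illustrative learning rate
const stretch = p => Math.log(p / (M - p));
const squash = x => M / (1 + Math.exp(-x));

function mixAndUpdate(models, bit) {
  const xs = models.map(m => stretch(m.p));
  const dot = models.reduce((s, m, i) => s + m.w * xs[i], 0);
  const q = squash(dot);                       // mixed P(bit = 1), out of M
  for (let i = 0; i < models.length; i++)      // nudge each weight toward
    models[i].w += LR * xs[i] * (bit - q / M); // the observed bit
  return q;
}
```

A confident model with a positive weight pulls the mixed probability toward its own prediction, and its weight grows whenever that prediction turns out right.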

contextually (120282B)

Tried to make use of more context models (sparse contexts (1), (1,2) and (2,3)); adding a new one costs about 10–15 bytes, so it was a good tradeoff. It now uses 7-bit codes instead of 6-bit codes (for more bigrams), so the memory usage saw a tenfold increase. Also tweaked the decompressor to save dozens of bytes.

r=-1;t=INITIALSTATE;M=2<<16;A=Array;m=[...'12233333'].map(i=>[A(i=1<<7*i).fill(M/4),[0],A(i).fill(1)]);z=[a=""];e=d=>(x=`BIGRAMS`[K](d-50))?e(x%128,e(x>>7)):d<24?a=a.slice(console.log(a),a.length-d):a+=String.fromCharCode(73+d);for(i=k=0;k<DECODEDLEN;e(z[k++]=v-128))for(v=1;v<128;m.map(([p,w,c],i)=>(y=p[j=u[i]<<7|v]+=(b*M/2-p[j]<<14)/(c[j]+=2*(c[j]<126))>>13,w[0]+=x[i]*(b-q/M))),v=v*2+b)for(B=z[k-3],C=z[k-4],u=[q=0,u=z[k-1],A=z[k-2],A<<7|u,B<<7|u,C<<7|u,B<<7|A,C<<7|B],x=m.map(([p,[w]],i)=>(x=p[u[i]<<7|v]*2+1,x=Math.log(x/(M-x)),q-=w*x,x/1024)),q=M/(1+Math.exp(q))|1,b=t%M<q,t=(b?q:M-q)*(t>>17)+t%M-!b*q;t<M<<4;r=r?y&127:-1)y=~r?r:`ENCODED`[K='charCodeAt'](i++),t=t<<7|([,13,36,92,96][r=y>>7]||y)

ten-hundred

relationship (2531B)

A 5/6-bit packer with delta coding of the shared prefix and additional bigram coding. Bigram lists are conveniently stored before the actual bitstream, so the bit decoding routine is shared (the two are distinguished by the bit width: 5 for a bigram, 6 for a code).

s=`ENCODED`;w="";b=1;v=5;m=[];p=d=>d>3?w+=String.fromCharCode(93+d):w=w.substr(console.log(w),d);for(i=0;s[i];++i)for(c=s.charCodeAt(i),b=b<<(c>>7?10:7)|c&1023;b>>v;v=m[67]?6:5)d=b&63,b>>=v,v-5?p(d<30?d:p(m[d-=30])&&m[d+34]):m.push(d&31)
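The bigram idea can be sketched like this (a hypothetical table and code range, not the submission's actual layout, which packs the pairs into the bitstream):

```javascript
// Recursive bigram expansion: codes below 128 are literal
// characters; higher codes name a pair of codes, each of which
// may itself be a bigram. The table here is made up.
const pairs = { 128: [104, 101], 129: [128, 121] }; // 128 -> "he", 129 -> "he" + "y"

function expand(code, out = []) {
  if (code < 128) out.push(String.fromCharCode(code));
  else for (const c of pairs[code]) expand(c, out);
  return out;
}
// expand(129).join("") yields "hey"
```

Frequent pairs thus cost one code instead of two, and nesting lets common trigrams piggyback on existing bigrams.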

conversation (2449B)

A carefully designed Huffman decoder. The tree is stored in a compressed format (lengths only, with bigrams built in) and rebuilt in ~60 bytes of code (described below). The delta coding is still used.

s=`ENCODED`;z=`CODES`;k={};b=u=9e3;for(i=0;z[i];c?k[b++]=c:b*=2)c=z.charCodeAt(i++);w="";for(i=0;s[i];)for(c=s.charCodeAt(i++)^128;c>1;c>>=1)for(v=k[u=u*2+c%2];v|=0;u=9e3)d=v%32,v/=32,d>5?w+=String.fromCharCode(91+d):w=w.substr(console.log(w),d-1)

practically (2422B)

Incremental improvement over conversation. Features a much smaller Huffman decoder (again described below).

k=[C="charCodeAt"];b=u=9e3;for(c of`CODES`)b+=(k[b]=c[C]())?1:b;w="";for(c of`ENCODED`)for(c=c[C]()^128;c>1;c>>=1)for(v=k[u=u*2+c%2];v|=0;u=9e3)d=v%32,v/=32,d>5?w+=String.fromCharCode(91+d):w=w.substr(console.log(w),d-1)

immediately (2399B)

Incremental improvement over practically, mostly shared with venusaur due to the use of the common Huffman decoder.

k=[b=u=9e3];for(c of`CODES`)b+=(k[b]=c[C="charCodeAt"](w=""))?1:b;for(c of`ENCODED`)for(c=c[C](),c=c>>7?c<<5|[13,4,28][c&3]:c^128;c>1;c>>=1)for(v=k[u=u*2+c%2];v|=0;u=9e3)d=v%32,v/=32,d>5?w+=String.fromCharCode(91+d):w=w.slice(console.log(w),d-1)

Recurring themes

Bit packing

Since the verification environment always expects a UTF-8-encoded source, and since the code size is measured in bytes (not "code points"), it is hard to make use of all 8 bits in a byte; we therefore generally aim for 7 bits, and refer to this unit as a "byte" hereafter.

Thanks to ES6 template literals we can include many control characters that are invalid in normal string literals, but we still need to escape the following characters: \r (which would be mangled to \n), $, ` and \.

This is where I took inspiration from the base-122 encoding. In short, a two-byte UTF-8 sequence has almost 11 bits to spare, so when an invalid byte appears you can merge it with the next byte and store the pair as a single code point, without any escape sequence. Since this takes only 2 + 7 = 9 bits, depending on the application you can actually pack more bits when an invalid byte appears (!). This forms the basis of virtually all my submissions.
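The resulting bit reader, as seen in bulbasaur and friends, is tiny: a code point below 128 (one UTF-8 byte) contributes 7 data bits, anything higher (a two-byte sequence) contributes 10. A minimal sketch with made-up input:

```javascript
// Accumulate bits from a sequence of code points. The leading 1
// is a guard bit so that leading zero bits survive; real decoders
// consume bits from b as they go instead of letting it grow.
let b = 1;
for (const c of [65, 0x2c5]) // 'A' plus a two-byte code point
  b = b << (c >> 7 ? 10 : 7) | (c & 1023);
// b is now guard 1, then the 7 bits of 65, then the 10 bits of 0x2c5
```

This is exactly the `b=b<<(c>>7?10:7)|c&1023` expression recurring in the submissions above.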

It should be noted that care is needed when integrating this bit packer with a bit decoder, as it alters the decoding order. There are multiple strategies:

Huffman decoder

It is a well-known fact that a Huffman tree can be described by the lengths of its codes alone, provided that the tree is skewed towards the ones (a shorter code is always lexicographically placed before a longer code). This is famously used in the DEFLATE and gzip formats. Since practically, the tree is encoded as a zero-separated string of the corresponding codes (chosen so that they are never zero).

That was the external representation; how about the internal one? Instead of chasing pointers, we can represent the leaf nodes as an associative array mapping bit sequences to codes. If a lookup fails, read one more bit and retry. This can be done with string keys, but they are annoying to reconstruct; you would need a .toString(2) with padding, which JavaScript makes awkward.

Instead of string keys, we use integer keys directly representing the bit sequence. But then how can you distinguish, say, 01 from 1? We therefore place non-zero guard bits above the actual bits: with them, GUARD01 is easily distinguished from GUARD1. The guard should be longer than the longest code, but can otherwise be chosen freely (for example, ivysaur uses 2e4 because there is a 16-bit code). The same approach applies to the bit reader itself, where the guard can be as short as a single bit.
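A tiny demonstration of why the guard works (toy numbers of my own choosing):

```javascript
// Read a bit sequence into an integer key, starting from a guard
// value. Without a guard (guard = 0) distinct bit strings can
// collapse to the same key; with guard = 4 (binary 100) they stay
// distinct for codes up to 2 bits long.
const read = (guard, bits) => bits.reduce((b, bit) => b * 2 + bit, guard);
read(0, [1]);    // 1
read(0, [0, 1]); // 1  (collision!)
read(4, [1]);    // 9  (binary 1001)
read(4, [0, 1]); // 17 (binary 10001)
```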

Combining the tree reconstruction algorithm and the guard bits, we arrive at the following compact code:

// TREE RECONSTRUCTION
t={}; // tree mapping
b=GUARD; // next key to assign
for(c of`CODES`)
  b+=(t[b]=c.charCodeAt())?1  // associate b with the code, then advance b
                          :b; // or double b, effectively appending a 0 bit

// DECODER
b=GUARD; // currently read bits
for(c=BYTE; // the topmost bit (not a part of the data) should be set
    c>1;    // if only the guard bit remains, read one more byte
    c>>=1)  // read from the least significant to the most significant bit
  (v=t[b=b*2+c%2])&& // v is the code or undefined;
                     // undefined is distinguishable because all codes are non-zero
    (b=GUARD,...)    // reset the read bits and use v

(In reality, this code is interleaved with the bigram decoder. And I used a clever preprocessor to put the guard bit onto the read byte without even checking whether it is a one-byte or two-byte sequence; see adjust_bq_for_bitstream.)
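For reference, here is a runnable, non-golfed version of the scheme with a toy three-symbol tree of my own (canonical codes a=0, b=10, c=11; guard 8, which exceeds the longest code):

```javascript
// In the CODES string, a NUL deepens the tree by one level (it
// doubles b, appending a 0 bit); any other character is the next
// leaf in canonical order.
const GUARD = 8, t = {};
let b = GUARD;
for (const c of "\0a\0bc") b += (t[b] = c.charCodeAt(0)) ? 1 : b;

// Decode a plain bit array; the real decoders instead pull bits
// out of guarded 7-bit bytes.
function decode(bits) {
  let u = GUARD, out = "";
  for (const bit of bits) {
    const v = t[u = u * 2 + bit];
    if (v) { out += String.fromCharCode(v); u = GUARD; }
  }
  return out;
}
decode([1, 1, 0, 1, 0]); // "cab": 11 -> c, 0 -> a, 10 -> b
```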

rANS and context model

I won't explain what those are here; see Wikipedia and Matt Mahoney's excellent book. For rANS, Fabian Giesen's simple implementation is fairly readable.

Since rANS and the context model are inherently simple, there is not much to cut. It should be noted that the bit probability reported by the context model must be clamped strictly between 0 and 1 (otherwise rANS will divide by zero). I have therefore represented the probability as a fixed-point number with the lowest bit always set whenever possible. This is not that useful for a plain context model, but greatly useful for the logistic mixing. The same technique also applies to the counts (unlike the probability trick, this one is commonly used in performant encoders).
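The fixed-point trick can be sketched as follows (M matching the 2&lt;&lt;16 scale in the submissions):

```javascript
// A stored value p in [0, M/2) becomes the probability
// q = 2p + 1, which is always odd and therefore never 0 nor M,
// so rANS never divides by zero. This mirrors the q=p[d]*2+1
// expressions in the golfed decoders.
const M = 2 << 16;
const prob = p => p * 2 + 1;
prob(0);         // 1, not 0
prob(M / 2 - 1); // M - 1, not M
```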

There are many possible approaches to the modeling, and I have yet to explore many of them (for example, I haven't even started on higher-order context models because hashing the context is nontrivial). I have at least tried entropy-reducing transforms (namely BWT and MTF) and found that they actually perform worse with context mixing.

sayuri.tx0.org