Browser Verification

There are very good reasons not to trust the browser's HTTP User-Agent field, but most of the techniques for verifying it look like browser detection code. This page reviews some passive methods that I'm familiar with.

HTTP Header Order

Many browser simulators do not use the same header order as the genuine article. It is useful to store this.

Firefox goes: Host, User-Agent, Accept, Accept-Language, Accept-Encoding, Accept-Charset, Keep-Alive, Connection, Referer

Chrome goes: Host, Connection, Accept, User-Agent, Referer, Accept-Encoding, Accept-Language, Accept-Charset, Keep-Alive

Chrome used to go: Host, Connection, User-Agent, Accept, Referer, ...

MSIE after version 6 goes: Accept, Referer, Accept-Language, User-Agent, Accept-Encoding, Host, Connection, Keep-Alive, Accept-Charset

MSIE version 6 goes: Accept, Referer, User-Agent, Host, Accept-Encoding, Accept-Language, Accept-Charset

MSIE version 6 sometimes goes: Accept, Connection, Host, Referer, User-Agent, Accept-Encoding, Accept-Language, Accept-Charset

Sometimes headers are missing. Some proxy servers drop/insert headers.
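
Because headers can be dropped or inserted along the way, an exact comparison against the lists above is too strict. A minimal sketch in C, assuming the request's header names have already been collected in arrival order: it only checks that the headers it recognises appear in the same relative order as the reference list. The function names here are hypothetical, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two header names; returns 1 when equal. */
static int same_header(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        ++a;
        ++b;
    }
    return *a == *b;
}

/* Position of name in the reference order, or -1 when it is not listed. */
static int ref_index(const char *const *ref, size_t nref, const char *name) {
    for (size_t i = 0; i < nref; ++i)
        if (same_header(ref[i], name)) return (int)i;
    return -1;
}

/* Returns 1 when every observed header that also appears in ref occurs in
   the same relative order as in ref. Headers that are not in ref (for
   example ones a proxy inserted) are skipped, and headers missing from the
   request are tolerated. */
int header_order_matches(const char *const *ref, size_t nref,
                         const char *const *seen, size_t nseen) {
    int last = -1;
    for (size_t i = 0; i < nseen; ++i) {
        int pos = ref_index(ref, nref, seen[i]);
        if (pos < 0) continue;      /* unknown header: ignore it */
        if (pos <= last) return 0;  /* out of order, or a duplicate */
        last = pos;
    }
    return 1;
}
```

A request claiming to be Firefox would be checked against the Firefox list; a mismatch is a hint to look closer, not proof of a simulator.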

TCP options

Linux goes SACKOK(4), TSTAMP(8), NOP(1), WINDOWSCALE(3). Amazon/EC2 very often has WS=7.

OSX ends in TSTAMP(8), SACKOK(4), EOL(0) padded by one byte

Windows always puts NOP(1) first

Some firewalls will remove SACKOK, TSTAMP and WINDOWSCALE, however they do not typically reorder the options list.

It is useful to collect this information since browser simulators rarely get this right.
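
One way to collect this is to reduce the options area of the SYN packet to its sequence of option kinds. The sketch below assumes the raw option bytes have already been captured by some other means; `tcp_option_kinds` is a hypothetical helper name. It relies only on the standard layout: EOL(0) and NOP(1) are single bytes, and every other option carries a length byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks the options area of a TCP header and writes the option kinds, in
   order, to kinds[]. Returns the number of kinds written, or -1 when the
   options are malformed or kinds[] is too small. */
int tcp_option_kinds(const uint8_t *opt, size_t len,
                     uint8_t *kinds, size_t max) {
    size_t i = 0, n = 0;
    while (i < len) {
        uint8_t kind = opt[i];
        if (n == max) return -1;           /* caller's buffer is too small */
        kinds[n++] = kind;
        if (kind == 0) break;              /* EOL: the rest is padding */
        if (kind == 1) { ++i; continue; }  /* NOP is a single byte */
        if (i + 1 >= len) return -1;       /* length byte is missing */
        uint8_t olen = opt[i + 1];
        if (olen < 2 || olen > len - i) return -1;
        i += olen;
    }
    return (int)n;
}
```

The resulting kind sequence can be stored next to the header order and compared with what the claimed operating system normally sends.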

User-Agent:

It is useful to have a database of HTTP User Agent strings and their release dates.

Modern browsers automatically update, and version-pinning is uncommon:

Google Chrome is rarely more than 90 days old, and more than half of Google's users upgrade within 30 days.

Mozilla Firefox upgrades are slower, but follow a similar shape.
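
A rough sketch of how such a database could be used, assuming the release date of the claimed version has already been looked up: compute the version's age on the day of the request and compare it with a cutoff such as the 90 days mentioned above. The helper names and the cutoff parameter are assumptions for illustration; the day count uses the usual civil-calendar arithmetic, so no time zone handling is involved.

```c
/* Days since 1970-01-01 for a Gregorian calendar date. */
static long days_from_civil(int y, int m, int d) {
    y -= m <= 2;  /* treat January and February as part of the previous year */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                                   /* 0..399 */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* 0..365 */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Age in days of a browser version on the day of the request. A negative
   result means the version had not been released yet. */
long ua_age_days(int rel_y, int rel_m, int rel_d,
                 int req_y, int req_m, int req_d) {
    return days_from_civil(req_y, req_m, req_d)
         - days_from_civil(rel_y, rel_m, rel_d);
}

/* Returns 1 when the age is non-negative and within the chosen cutoff. */
int ua_age_plausible(long age_days, long max_age_days) {
    return age_days >= 0 && age_days <= max_age_days;
}
```

An implausible age is only a signal: some genuine users do run old or pinned versions.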

"?_="+Math.random()

Requests that record information from browsers are often "cache busted" with a random number. If five such random numbers can be collected in sequence, old versions of the most common web browsers can be identified reliably.

Until very recently, Firefox used the same simple LCG (linear congruential generator) as old Internet Explorer: M=53,X=26 for Firefox and M=54,X=27 for old MSIE. Two sequentially generated random numbers, d[0] and d[1], are needed. First find the half of the internal state that produced the first random number:

long p=0x5DEECE66DL,a=0xBL,m=(1L<<48)-1,n=(long)(d[0]*(1L<<M)),
     mg=((1L<<27)-1)<<(48-27),b=((long)(n>>27)<<(48-X))&m,
      g=((long)(n&((1L<<27)-1))<<(48-27))&m;

then we probe all of the potential internal-states to find the next random number:

for (long o=b;o<=(b+((1L<<(48-X))-1));o++) {
 long t=(o*p+a)&m;if((t&mg)!=g)continue;
 long r=(t*p+a)&m;t=r;
 long u=(r>>(48-X))<<27;
 r=(t*p+a)&m;t=r; 
 long v=(r>>(48-27));
 if(d[1]==((u+v)/((double)(1L<<M))))return 1;
}
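
To try the search out without a browser, the generator can be simulated as well. The following is a sketch under the assumptions stated in the text: a 48-bit state, the multiplier and increment shown above, and an output built from the top X bits of one state followed by the top 27 bits of the next. The function names are made up for this example, and it is exercised here only with M=53, X=26, where every output is exactly representable as a double.

```c
#include <stdint.h>

#define LCG_MUL  0x5DEECE66DULL
#define LCG_ADD  0xBULL
#define LCG_MASK ((1ULL << 48) - 1)

/* Advances the 48-bit state by one step. */
static uint64_t lcg_step(uint64_t s) {
    return (s * LCG_MUL + LCG_ADD) & LCG_MASK;
}

/* Emits one double: the top X bits of one state, then the top 27 bits of
   the next state, scaled by 2^-M. */
double lcg_double(uint64_t *state, int M, int X) {
    *state = lcg_step(*state);
    uint64_t hi = *state >> (48 - X);
    *state = lcg_step(*state);
    uint64_t lo = *state >> (48 - 27);
    return (double)((hi << 27) + lo) / (double)(1ULL << M);
}

/* Returns 1 when d1 is what this generator would emit right after d0. */
int lcg_predicts(double d0, double d1, int M, int X) {
    uint64_t n    = (uint64_t)(d0 * (double)(1ULL << M));
    uint64_t hi   = n >> 27;                 /* top X bits of the first state */
    uint64_t lo   = n & ((1ULL << 27) - 1);  /* top 27 bits of the second */
    uint64_t base = (hi << (48 - X)) & LCG_MASK;
    uint64_t span = 1ULL << (48 - X);        /* unknown low bits to try */
    for (uint64_t o = base; o < base + span; ++o) {
        uint64_t t = lcg_step(o);
        if ((t >> (48 - 27)) != lo) continue;  /* second state must agree */
        uint64_t s = t;
        if (lcg_double(&s, M, X) == d1) return 1;
    }
    return 0;
}
```

With X=26 the loop tries at most 2²² candidate states, so a check takes a fraction of a second.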

Until recently, Google Chrome used MWC (multiply-with-carry). This requires five sequentially generated random numbers (d[0..4]), which we convert to unsigned:

unsigned u[5];for(int i=0;i<5;++i){
 u[i]=(unsigned)(d[i]*(double)(1L<<32));
 if(d[i]<0)u[i]|=0x80000000;}

Then we brute force less than 5×2¹⁶ tests:

for(int i=0;i<65536;++i){
 unsigned r0=(u[0]>>16)|(((((u[1]>>16)&0xffff)
    -(18273*((u[0]>>16)&0xffff))&0xffff)<<16));
 unsigned r1=(u[0]&0xffff);r1|=((unsigned)i<<16);
 for(int j=1;j<5;++j){
  r0=(18273*(r0&0xFFFF))+(r0>>16);
  r1=(36969*(r1&0xFFFF))+(r1>>16);
  if(u[j]!=((r0<<16)|(r1&0xFFFF)))goto Z;
 }return 1;
Z:0;}
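
The same test can be exercised against a simulated generator. This sketch assumes the generator is exactly the pair of 16-bit multiply-with-carry steps used in the loop above, with each output formed from the low 16 bits of both states; `mwc_next` and `mwc_predicts` are hypothetical names for this example.

```c
#include <stdint.h>

/* One step of the two multiply-with-carry generators; the output combines
   the low 16 bits of each updated state. */
uint32_t mwc_next(uint32_t *r0, uint32_t *r1) {
    *r0 = 18273u * (*r0 & 0xFFFFu) + (*r0 >> 16);
    *r1 = 36969u * (*r1 & 0xFFFFu) + (*r1 >> 16);
    return (*r0 << 16) | (*r1 & 0xFFFFu);
}

/* Returns 1 when u[0..4] is consistent with five consecutive outputs. */
int mwc_predicts(const uint32_t u[5]) {
    /* r0 is fully determined: its low half is u[0]>>16 and its high half
       follows from the low half of the next state, u[1]>>16. */
    uint32_t hi0 = ((u[1] >> 16) - 18273u * (u[0] >> 16)) & 0xFFFFu;
    uint32_t r0_start = (u[0] >> 16) | (hi0 << 16);
    for (uint32_t i = 0; i < 65536u; ++i) {
        uint32_t r0 = r0_start;
        uint32_t r1 = (u[0] & 0xFFFFu) | (i << 16);  /* guess the hidden half */
        int ok = 1;
        for (int j = 1; j < 5 && ok; ++j)
            ok = (mwc_next(&r0, &r1) == u[j]);
        if (ok) return 1;
    }
    return 0;
}
```

Feeding it five outputs of `mwc_next` from any starting state should return 1, while altering any one of them should return 0.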

Firefox, Safari and Chrome now all use xorshift128+, so this technique can no longer tell them apart.