There are very good reasons not to trust the browser's HTTP User-Agent field; however, most of the alternative techniques look like browser detection code. This page reviews some passive methods that I'm familiar with.
Many browser simulators do not send their headers in the same order as the genuine article. It is useful to record the order in which the headers arrive.
Firefox goes: Host, User-Agent, Accept, Accept-Language, Accept-Encoding, Accept-Charset, Keep-Alive, Connection, Referer
Chrome goes: Host, Connection, Accept, User-Agent, Referer, Accept-Encoding, Accept-Language, Accept-Charset, Keep-Alive
Chrome used to go: Host, Connection, User-Agent, Accept, Referer, ...
MSIE after version 6 goes: Accept, Referer, Accept-Language, User-Agent, Accept-Encoding, Host, Connection, Keep-Alive, Accept-Charset
MSIE version 6 goes: Accept, Referer, User-Agent, Host, Accept-Encoding, Accept-Language, Accept-Charset
MSIE version 6 sometimes goes: Accept, Connection, Host, Referer, User-Agent, Accept-Encoding, Accept-Language, Accept-Charset
Sometimes headers are missing. Some proxy servers drop/insert headers.
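One way to use these orderings while tolerating dropped or inserted headers is to check only the relative order of the headers that a reference list also contains. The following is a minimal sketch in C under that assumption; the names `order_matches`, `same_name` and `FIREFOX_ORDER` are hypothetical, and the reference list is copied from the Firefox line above.

```c
#include <ctype.h>
#include <stddef.h>

/* Reference ordering copied from the Firefox line above. */
static const char *const FIREFOX_ORDER[] = {
    "Host", "User-Agent", "Accept", "Accept-Language", "Accept-Encoding",
    "Accept-Charset", "Keep-Alive", "Connection", "Referer"
};
#define FIREFOX_ORDER_LEN (sizeof FIREFOX_ORDER / sizeof FIREFOX_ORDER[0])

/* Case-insensitive comparison of two header names. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;
}

/* Position of name in ref[], or -1 if the reference does not list it. */
static int ref_index(const char *name, const char *const *ref, size_t ref_len)
{
    for (size_t i = 0; i < ref_len; ++i)
        if (same_name(name, ref[i]))
            return (int)i;
    return -1;
}

/* Returns 1 if the headers that also occur in ref[] appear in the same
 * relative order as in ref[]. Headers unknown to ref[] are skipped and
 * headers missing from the request are tolerated. Returns 0 otherwise. */
int order_matches(const char *const *seen, size_t seen_len,
                  const char *const *ref, size_t ref_len)
{
    int last = -1;
    for (size_t i = 0; i < seen_len; ++i) {
        int pos = ref_index(seen[i], ref, ref_len);
        if (pos < 0)
            continue;   /* possibly inserted by a proxy, or not tracked */
        if (pos <= last)
            return 0;   /* out of order relative to the reference */
        last = pos;
    }
    return 1;
}
```

A request that claims to be Firefox in its User-Agent but fails this check against the Firefox ordering is worth a closer look. Because missing headers are tolerated, a very short header list will match several references at once, so the result is a hint rather than proof.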
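The order of the TCP options in the SYN packet also differs between operating systems (the numbers in parentheses are the TCP option kinds).
Linux goes SACKOK(4), TSTAMP(8), NOP(1), WINDOWSCALE(3). Amazon/EC2 very often has WS=7.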
OSX ends in TSTAMP(8), SACKOK(4), EOL(0) padded by one byte
Windows always puts NOP(1) first
Some firewalls will remove SACKOK, TSTAMP and WINDOWSCALE; however, they do not typically reorder the options list.
It is useful to collect this information since browser simulators rarely get this right.
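To compare option orderings, the raw options area of a captured SYN first has to be reduced to its sequence of option kinds. The sketch below shows one way to do that in C. The function name `tcp_option_kinds` is hypothetical, and it assumes the caller has already located the options bytes in the TCP header. EOL(0) and NOP(1) are single bytes; every other option carries a length byte after the kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks the raw TCP options area and writes the option kinds, in wire
 * order, to kinds[]. Returns the number of kinds written, or -1 if an
 * option's length field is malformed. */
int tcp_option_kinds(const uint8_t *opts, size_t len,
                     uint8_t *kinds, size_t max_kinds)
{
    size_t i = 0, n = 0;
    while (i < len && n < max_kinds) {
        uint8_t kind = opts[i];
        kinds[n++] = kind;
        if (kind == 0)          /* EOL: end of the option list */
            break;
        if (kind == 1) {        /* NOP: single byte, no length field */
            ++i;
            continue;
        }
        /* All other options: kind, length, then (length - 2) data bytes. */
        if (i + 1 >= len || opts[i + 1] < 2 || i + opts[i + 1] > len)
            return -1;
        i += opts[i + 1];
    }
    return (int)n;
}
```

The resulting kind sequence can then be compared with the per-OS patterns listed above. A SYN normally also carries an MSS option (kind 2), so the sequence will usually contain more than the kinds named in the list.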
It is useful to have a database of HTTP User Agent strings and their release dates.
Modern browsers automatically update, and version-pinning is uncommon:
A Google Chrome installation is rarely more than 90 days old, and more than half of Chrome's users upgrade within 30 days.
Mozilla Firefox upgrades are slower, but follow a similar shape.
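Given such a database, the check itself is a date subtraction. The sketch below is one possible shape for it in C: the struct `ymd` and the functions `ua_age_days` and `ua_is_suspicious` are hypothetical names, the release date is assumed to have been looked up from the User-Agent string already, and the age threshold is left as a parameter because the right value depends on the browser.

```c
#include <stdint.h>

/* Days since 1970-01-01 for a Gregorian calendar date (month 1..12). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= (m <= 2);                       /* treat Jan/Feb as end of previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;         /* year of era, 0..399 */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

typedef struct { int year, month, day; } ymd;

/* Age in days of a browser build at the moment a request was seen. */
int64_t ua_age_days(ymd released, ymd seen)
{
    return days_from_civil(seen.year, seen.month, seen.day)
         - days_from_civil(released.year, released.month, released.day);
}

/* Flags a User-Agent that claims a build older than max_age_days, or a
 * build that had not been released yet when the request arrived. */
int ua_is_suspicious(ymd released, ymd seen, int64_t max_age_days)
{
    int64_t age = ua_age_days(released, seen);
    return age < 0 || age > max_age_days;
}
```

For Chrome, a threshold of around 90 days would follow from the figures above; a slower-moving browser would need a larger one.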
Requests that record information from browsers are often "cache busted" with a random number. If five such random numbers can be collected sequentially, old versions of the most common web browsers can be identified reliably.
Firefox (up until very recently) and Internet Explorer used the same simple linear congruential generator (LCG): M=53, X=26 for Firefox and M=54, X=27 for old MSIE. Two sequentially generated random numbers, d[0] and d[1], are needed. First, find the half of the internal state that produces the first random number:
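long p=0x5DEECE66DL, a=0xBL, m=(1L<<48)-1;   /* LCG constants, 48-bit mask */
long n=(long)(d[0]*(1L<<M));                 /* first output as an integer */
long mg=((1L<<27)-1)<<(48-27);               /* mask: top 27 state bits */
long b=((long)(n>>27)<<(48-X))&m;            /* known top X bits of the state */
long g=((long)(n&((1L<<27)-1))<<(48-27))&m;  /* next state's top 27 bits */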
Then we probe all of the candidate internal states to find one that produces the next random number:
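for (long o=b; o<=(b+((1L<<(48-X))-1)); o++) {
  long t=(o*p+a)&m;
  if ((t&mg)!=g) continue;        /* must reproduce the low 27 bits of d[0] */
  long r=(t*p+a)&m; t=r;
  long u=(r>>(48-X))<<27;         /* high part of the next output */
  r=(t*p+a)&m; t=r;
  long v=(r>>(48-27));            /* low part of the next output */
  if (d[1]==((u+v)/((double)(1L<<M)))) return 1;
}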
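The probe is easier to follow, and to test, next to a forward model of the generator. The sketch below restates the same search in self-contained C with fixed-width types. The names `lcg_next`, `lcg_random` and `lcg_matches` are hypothetical, and the forward model is inferred from the probe above (each output combines the top X bits of one state with the top 27 bits of the next) rather than taken from any browser's source.

```c
#include <stdint.h>

#define LCG_MUL  0x5DEECE66DULL
#define LCG_ADD  0xBULL
#define LCG_MASK ((1ULL << 48) - 1)

/* One step of the 48-bit linear congruential generator. */
static uint64_t lcg_next(uint64_t s)
{
    return (s * LCG_MUL + LCG_ADD) & LCG_MASK;
}

/* Forward model (an assumption inferred from the probe): advance the state
 * twice and build one output in [0,1) from the top X bits of the first new
 * state and the top 27 bits of the second. */
double lcg_random(uint64_t *state, int M, int X)
{
    uint64_t hi, lo;
    *state = lcg_next(*state);
    hi = *state >> (48 - X);
    *state = lcg_next(*state);
    lo = *state >> (48 - 27);
    return (double)((hi << 27) + lo) / (double)(1ULL << M);
}

/* Returns 1 if d1 is what this generator would emit right after d0. */
int lcg_matches(double d0, double d1, int M, int X)
{
    uint64_t n    = (uint64_t)(d0 * (double)(1ULL << M));
    uint64_t base = ((n >> 27) << (48 - X)) & LCG_MASK; /* known top X bits */
    uint64_t want = n & ((1ULL << 27) - 1);  /* top 27 bits of the next state */
    uint64_t span = 1ULL << (48 - X);        /* unknown low bits to search */

    for (uint64_t low = 0; low < span; ++low) {
        uint64_t s = lcg_next(base | low);
        if ((s >> (48 - 27)) != want)
            continue;                        /* cannot have produced d0 */
        uint64_t probe = s;
        if (lcg_random(&probe, M, X) == d1)
            return 1;
    }
    return 0;
}
```

With M=53 and X=26 the search covers 2^22 candidate states, so a single check is cheap enough to run per visitor.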
Google Chrome up until recently used MWC (multiply-with-carry). This requires five sequentially generated random numbers (d[0..4]), which we convert to unsigned:
unsigned u[5];
for (int i=0; i<5; ++i) {
  u[i]=(unsigned)(d[i]*(double)(1L<<32));
  if (d[i]<0) u[i]|=0x80000000;
}
Then we brute force fewer than 5×2^16 tests:
for (int i=0; i<65536; ++i) {
  /* r0 is fully determined by u[0] and u[1]; the high half of r1 is guessed */
  unsigned r0=(u[0]>>16)|(((((u[1]>>16)&0xffff) -(18273*((u[0]>>16)&0xffff))&0xffff)<<16));
  unsigned r1=(u[0]&0xffff); r1|=((unsigned)i<<16);
  for (int j=1; j<5; ++j) {
    r0=(18273*(r0&0xFFFF))+(r0>>16);
    r1=(36969*(r1&0xFFFF))+(r1>>16);
    if (u[j]!=((r0<<16)|(r1&0xFFFF))) goto Z;
  }
  return 1;
  Z:;
}
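As with the LCG, a forward model makes the brute force easier to verify. The sketch below is self-contained C. The names `mwc_state`, `mwc_next` and `mwc_matches` are hypothetical, and the generator shape (two 16-bit multiply-with-carry lanes whose low halves are concatenated into one 32-bit output) is an assumption read off the check above. It takes the outputs as 32-bit unsigned values, so the conversion from doubles is left out.

```c
#include <stdint.h>

typedef struct { uint32_t s0, s1; } mwc_state;

/* Forward model (assumed shape): step both lanes, then combine the low
 * 16 bits of each lane into one 32-bit output. */
uint32_t mwc_next(mwc_state *st)
{
    st->s0 = 18273u * (st->s0 & 0xFFFF) + (st->s0 >> 16);
    st->s1 = 36969u * (st->s1 & 0xFFFF) + (st->s1 >> 16);
    return (st->s0 << 16) | (st->s1 & 0xFFFF);
}

/* Returns 1 if u[0..4] are five consecutive outputs of mwc_next(). */
int mwc_matches(const uint32_t u[5])
{
    /* Lane 0: the low half is visible in u[0]; the high half (the carry)
     * follows from the low half that appears in u[1]. */
    uint32_t lo0 = u[0] >> 16;
    uint32_t hi0 = ((u[1] >> 16) - 18273u * lo0) & 0xFFFF;

    /* Lane 1: the low half is visible in u[0]; guess the hidden high half. */
    for (uint32_t guess = 0; guess < 65536; ++guess) {
        mwc_state st = { lo0 | (hi0 << 16), (u[0] & 0xFFFF) | (guess << 16) };
        int ok = 1;
        for (int j = 1; j < 5 && ok; ++j)
            ok = (mwc_next(&st) == u[j]);
        if (ok)
            return 1;
    }
    return 0;
}
```

Only 16 bits are ever guessed, which is why the whole search stays below the 5×2^16 bound mentioned above.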
Firefox, Safari and Chrome now all use xorshift128+, so this technique can no longer tell them apart.