Wednesday, October 2, 2013

Internet Plumbing: Mixed Redirect Chains

There's a lot of plumbing that underlies a website like unglue.it. Some of the more complicated plumbing involves the connections and links to other sites. Unglue.it has connection plumbing for Google, Goodreads, Twitter, Amazon, LibraryThing, Readmill, Internet Archive, Facebook, MailChimp and Gravitar;  we've worked on some more that have yet to see daylight.

The rest of this post is about plumbing surprises. If you're not interested in website plumbing, feel free to go watch a cat video.

I've written more than you want to read about redirection, a rather important bit of website plumbing. HTTP redirects enable things like link shortening (e.g. bit.ly), long term link maintenance (e.g. crossref.org and purl.org), and just-in-time linking (e.g. OpenURL). If you redirect to another redirector, you have what's known as a redirect chain.

You can easily imagine the kind of mischief that can go on with redirects, the redirect loop being the most obvious. The plumbing in your web software has to know how to avoid getting stuck in redirect loops or endless redirect chains, and for the most part it does.

Security issues can also arise with redirects, especially with mixed redirect chains. A mixed redirect chain is one that includes both secure (HTTPS) and non-secure (HTTP) links. Here's an example trace for a shortened ebook download link on the unglue.it website (it's the latest unglued ebook, Feeding the City, about the amazing human "plumbing" that delivers lunches to workers in Mumbai). You can try it yourself: https://bit.ly/19Ncaz7

The first thing your web browser does is it sets up a secure connection to bit.ly. While doing this it checks bit.ly's X.509 certificate with bit.ly's OCSP responder, Digicert. OCSP stands for "Online Certificate Status Protocol", and the result is that you can be reasonably sure that your connection is to bit.ly and that no one but maybe the NSA can snoop on your communication with bit.ly. In particular, no one can see what link you ask to be resolved, and no one but you can see bit.ly's answer.

ask bit.ly:
(verify bit.ly, at http://ocsp.digicert.com/ ) https://bit.ly/19Ncaz7 GET /19Ncaz7 HTTP/1.1 Host: bit.ly
 bit.ly's answer:
HTTP/1.1 301 Moved Location: http://unglue.it/download_ebook/986/
In this example, bit.ly is redirecting to a non-secure URL, making the redirect mixed. Anyone between you and the destination can see what you're asking for if you follow the redirect. If you're in a Starbucks using wifi, Starbucks could conceivably send you a book about coffee instead. So the secure rigamarole you went through with bit.ly seems a bit wasted. But at least no one can see your bit.ly cookie and find out all the shortened links you've followed.

ask unglue.it
http://unglue.it/download_ebook/986/ GET /download_ebook/986/ HTTP/1.1 Host: unglue.it
unglue.it's answer:
HTTP/1.1 302 FOUND Location: https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub
Unglue.it still wants your ebook download to be secure, so it sends you to a secure archive where the file can be found. ebooks are increasingly containing Javascript and you really don't want to give bad guys the opportunity to insert malicious scripts in your ebook, even if most of today's reading platforms won't execute the scripts.

Since it's a different website, your web software needs to verify archive.org with their OCSP responder, GoDaddy.

ask archive.org:
(verify archive.org, at http://ocsp.godaddy.com/ ) https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub GET /download/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1 Host: archive.org
archive.org's answer:
HTTP/1.1 302 Moved Temporarily Location: https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub
The Internet Archive operates jillions of servers, and to save it the trouble of rebuilding its index whenever they move a file, they use a redirector to get you to the server where your ebook is living today. It's yet another server, so you have to check its certificate, too:

ask ia801008.us.archive.org:
(verify ia801008.us.archive.org at http://ocsp.godaddy.com/ ) https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub GET /4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1 Host: ia801008.us.archive.org
ia801008.us.archive.org's answer
HTTP/1.1 200 OK
And so we get our ebook. Since it comes on a secure connection, we can be sure it's the one that Internet Archive meant to give us. Since there was an insecure link in the redirect chain, we can't also be sure that it's the one that bit.ly meant to send us to.

You can see that there are a lot of steps in this chain. At every step of the way, your web plumbing needs to decide whether it's ok to send things like cookies or referers along with the request. For example, it should never be sending cookies received from a secure site to the insecure version of the same site. If a mixed redirect chain delivers you a javascript, you shouldn't mark a web page as secure even if the web page is securely delivered and it uses only https links to retrieve the javascripts.

A great example of how to implement this plumbing is the Requests module for Python. (Also a great example of clear, readable source code and documentation!)

An example of a buggy implementation of this plumbing is the open-uri code in Ruby. From the source code:

# This test is intended to forbid a redirection from http://... to# file:///etc/passwd, file:///dev/zero, etc.  CVE-2011-1521# https to http redirect is also forbidden intentionally.# It avoids sending secure cookie or referer by non-secure HTTP protocol.# (RFC 2109 4.3.1, RFC 2965 3.3, RFC 2616 15.1.3)# However this is ad hoc.  It should be extensible/configurable.
At least this code errs on the side of security. If you use Ruby to try downloading something via a mixed redirect chain, open-uri will raise an exception labeled "redirection forbidden". Perhaps it would be more accurate to label this a "too dicey for Ruby" exception.

You might argue that mixed redirect chains should not be allowed. Or at least that https-to-http redirects should be forbidden. There are two main faults with this:

  1. When when a links span multiple site, there's no practical way to ensure that your links don't get mixed. Even if they're not mixed now, that could change in the future.
  2. If you forbid https to http redirects, you're preventing sites from migrating to a more secure stance. A secure bit.ly would be impossible.

I tripped over the Ruby issue when implementing a connection to a partner that has built its site with Rails. They couldn't download some of our ebooks. Working together, we figured out what was wrong and implemented a work-around.

That's what us plumbers do for kicks.

Notes:

  1. One thing you CAN'T do is redirect https to http if your certificate expires. To fix broken security, you need to fix the security.


Enhanced by Zemanta

0 comments:

Contribute a Comment

Note: Only a member of this blog may post a comment.