The Soul of a Portable Machine

18 December 2009

Instead of lugging a laptop around, why not carry just the bits? I don't mean keeping your files on a USB key, or even booting from an external drive. I mean storing the entire "soul": the applications, data, OS and RAM image as a virtual machine you carry around on a portable drive. You could plug it into any sufficiently powerful host and resume your work without interruption [0].

So, I tried it. My primary "computer" is a $25 USB flash drive that I plug into $2,500 dumb terminals.

Setting it up


Note to OSX users: Apple does not want people running their operating system under virtualization. An OSX install disc will not normally boot under VMWare or Parallels. This policy is stupid and overbearing but I don't expect Apple to change, any more than I expect the control-freak leader of a hippie commune to loosen his grip on the bag of peyote buttons [1].

I started off with a 4GB stick I had lying around and an Ubuntu Linux image running under VMWare. It was fine for browsing, code editing, etc, but was too space-constrained for real work. The total size of your virtual machine files will be the disk size + RAM size plus a few percent overhead. VMWare requires additional free space equal to your RAM size, presumably in order to write the contents of RAM to disk atomically during a "suspend". So a 2GB virtual machine image with 1GB of RAM allocated will require more than 4GB of space. This turned out to be a nice excuse to buy myself a 16GB stick.
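
As a back-of-the-envelope check, the arithmetic above can be sketched in a few lines of JavaScript. The function name and the 3% figure standing in for "a few percent overhead" are my own assumptions, not anything VMWare documents.

```javascript
// Rough space estimate for a suspendable VM living on a portable drive.
// overheadFraction is an assumed stand-in for "a few percent overhead".
function vmSpaceNeededGB(diskGB, ramGB, overheadFraction) {
  // the VM's own files: disk image plus the saved RAM image, plus overhead
  var files = (diskGB + ramGB) * (1 + overheadFraction);
  // extra free space, equal to RAM size, needed while suspending
  var scratch = ramGB;
  return files + scratch;
}

console.log(vmSpaceNeededGB(2, 1, 0.03).toFixed(2) + " GB");
```

With a 2GB disk and 1GB of RAM this comes out a little over 4GB, which is why the 4GB stick was such a squeeze.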

Setting up corporate effluent pipes (email, calendaring, VPN) on the VM would have taken more time than I wanted to spend, so I run them directly on the hosts.

Make sure your desktop background, window themes, terminal colors, etc are different between your host and virtual machines so you don't get confused. Turn off the VM's screensaver.

Use a pre-allocated disk. I made a single-file disk image instead of multiple 2GB chunks, but I have no idea if it makes a difference to speed or error-resistance.

It's like a laptop, except without the laptop

Running my Linux VM directly from a USB drive is smooth, contrary to expectations. I thought I would have to sync my machine image to the host machine's hard drive to get decent performance. I now put my computer in my pocket, commute to the office, plug in, and start working. Over lunch I hibernate the VM and make a backup to the host machine. If I need to work at home I plug it into my personal computer.

Annoyances & Limitations

The VM is slower than running directly on the host machine, but not by much unless you are doing disk-intensive work like compiling MacPorts. Video performance feels sluggish when scrolling or moving windows.

A usable OSX image is too large for my USB drive but a cheap external hard drive works well enough. Next year 64GB USB and SD drives will be in the sweet spot.

My work often requires testing on Windows inside other virtual machines, so I can't use all available RAM for my primary VM. There are minor UI annoyances like the host menubar popping up on OSX. A bug with the virtual sound system in VMWare seems to trigger a long-standing bug in OSX/Flash: YouTube videos play for a second then halt.

Next steps

It would be nice to transform an existing OSX system into a virtual one. I believe you could back up your machine using CarbonCopyCloner or SuperDuper to a .dmg file, then use qemu-img to convert to a .vmdk or other virtual disk format. But unless you find a way to shrink the image you'd need a portable drive with a capacity larger than your current hard drive.

The next big leap would be to replace the USB key with a real portable computer, something like the SmartQ5. The idea is to have one virtual machine that takes full advantage of whatever hardware it happens to be running on. Running a desktop-class virtual machine on a tiny device like that would be extremely tricky if not impossible. Even leaving aside the different CPU, it's not possible (as far as I know) to change the amount of RAM or number of CPUs assigned to a VM without rebooting it. But I believe the hacks are out there, just waiting to be discovered.

Notes

[0] See Tom Hughes-Croucher's state vs sync and also the IBM SoulPad project from 2005.

[1] There are many ways to run OSX inside a VM, most of which are against the EULA in jurisdictions where the EULA is enforceable. The easiest way is to install a licensed generic copy of OSX (not the special model-specific copy that comes with Macs) on VMWare or Parallels, and then add some "special touches".

XCode (and thus gcc/make, MacPorts and the rest of the toolchain) will not install on a virtual OSX machine with the "ServerVersion.plist" hack enabled, giving an "error while evaluating JavaScript for the package" error. Software updates also fail. The point of the hack is to misrepresent the type of OSX you are running. So make a pair of scripts to toggle it. Further, deponent sayeth not.


A Dismal Guide to DNS

24 November 2009

(Originally appeared on Yahoo's Developer Blog)

The Domain Name System (DNS) is part of the "dark matter" of the internet. It's hard to observe the DNS directly yet it exerts an obscure, pervasive influence without which everything would fly apart. Because it's so difficult to probe people tend to take it for granted, which I think is a mistake. DNS problems can hurt the speed and reliability of your applications without you even noticing. In this article we'll take a look at the behavior of the DNS and walk through some experiments you can run to gather valuable data about your users' network performance.

A Clever Shambles

Before two computers can talk to each other on the 'net, one of them has to know the numeric IP address of the other. Using the DNS is often compared to looking up a number in the phone book. But that can give the impression the information is in one place, close to hand.

Instead, imagine it's 1982. You live in Tucson and you want to call a hotel in Toronto. You don't have a Toronto phone book so you call your local library. They don't have one either. Life is boring in Tucson, so the librarian uses her New York phone book to call another library. The nice lady in New York looks up the hotel's number in her copy of the Toronto phone book, tells it to your local librarian, who then calls back to give it to you. Doing all this is a hassle, so everyone in the chain writes down the number just in case the question ever comes up again.

The DNS is even more complex because of the hierarchy of internet domains. Consider the host name foo.bar.example.net. To look it up your computer will have to look up every part of the name, in reverse order. That means resolving ".", then "net.", then "example.net.", "bar.example.net.", and finally "foo.bar.example.net."[0]. It's not just a matter of finding the Toronto book. It's looking up someone who knows someone who has the Canada book and from there who has the Ontario book, then the Toronto book, and so on.
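
To make the reverse-order walk concrete, here is a small sketch that lists the names a resolver would have to work through for a given host. The function name is made up for illustration, and it only builds the list; it performs no actual lookups.

```javascript
// List the names that must be resolved for a hostname, root first.
function resolutionChain(hostname) {
  // drop a trailing dot if present, then split into labels
  var labels = hostname.replace(/\.$/, "").split(".");
  var chain = ["."]; // the root domain always comes first
  for (var i = labels.length - 1; i >= 0; i--) {
    chain.push(labels.slice(i).join(".") + ".");
  }
  return chain;
}

console.log(resolutionChain("foo.bar.example.net").join(" -> "));
```

For foo.bar.example.net that is five names, starting at "." and ending at the full hostname.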

If this sounds ridiculously complex and fragile, that's because it is. Writing down the answer to common queries, aka caching, is the only reason we're able to get away with it. In practice the root domain "." is known to everyone. During normal operation "net." should be cached at all levels, including at your local librarian, aka your ISP. Anything beyond that requires some lookups unless the domain is already very well-known.

How long does it take to look up a hostname?

A single DNS lookup may involve several recursive lookups to machines all over the world. Because of this hassle, information is cached for short periods of time at every level, including on your computer. So "the time it takes to do a DNS lookup" can vary wildly depending on the state of affairs in many different places, and the quality of the network connections between them.
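
Here is a rough sketch of what "writing down the number" looks like at any one level of the chain. Everything in it (makeCachingResolver, the shape of the lookup function, the injected clock) is hypothetical and only illustrates the idea. Real resolvers are far more involved.

```javascript
// Wrap a slow lookup function in a cache that forgets answers after their TTL.
// lookupFn(name) is assumed to return { address: "...", ttlSeconds: N }.
// now() returns the current time in milliseconds; it is passed in so the
// cache can be exercised without waiting for real time to pass.
function makeCachingResolver(lookupFn, now) {
  var entries = Object.create(null); // name -> { address, expires }
  return function resolve(name) {
    var hit = entries[name];
    if (hit && hit.expires > now()) {
      return { address: hit.address, cached: true };
    }
    var answer = lookupFn(name); // the slow part: ask the next librarian
    entries[name] = {
      address: answer.address,
      expires: now() + answer.ttlSeconds * 1000
    };
    return { address: answer.address, cached: false };
  };
}

// Example with a fake upstream answer and the real clock.
var resolve = makeCachingResolver(function (name) {
  return { address: "192.0.2.1", ttlSeconds: 60 }; // placeholder answer
}, Date.now);
console.log(resolve("example.net").cached); // first call goes upstream
console.log(resolve("example.net").cached); // repeat call is served from the cache
```

Whether a given lookup is fast or slow depends entirely on whether someone along the way still has the answer written down, which is why a single "lookup time" number is so slippery.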

On Mac OSX the dscacheutil command will tell you about your computer's latency and cache hit ratio:

$ dscacheutil -statistics
Overall Statistics:
    Average Call Time     - 0.118626
    Cache Hits            - 236152
    Cache Misses          - 231052
    Total External Calls  - 279350

Statistics by procedure:

             Procedure   Cache Hits   Cache Misses   External Calls
    ------------------   ----------   ------------   --------------
         gethostbyname       161252          39952             6749
         gethostbyaddr           60            151              211
         ...
Listing 0: Mac OSX DNS statistics

These numbers are interesting but fairly useless for our purposes. They combine cached and uncached lookups into one "average". Also, browsers often cache and even precache DNS information, bypassing whatever the operating system is doing. So we can't rely on what the machine tells us. We need to do some experimenting on our own.

First, I ran long series of tests against Yahoo hostnames from the office, my house, and other locations. For 100 seconds I ran as many DNS lookups as I could and timed them. Each lookup was for a wildcard hostname. A wildcard like *.dnstest.example.net means you can make up random new hostnames on the fly, eg x9zzy.dnstest.example.net, that will resolve to a real IP address. This ensures that each test will be a full end-to-end DNS lookup without any caching to skew the numbers [1].


Figure 0: Average DNS latency at various locations

This graph is useful mostly to illustrate that it's possible for users on "broadband" connections to have invisible performance problems related to DNS. But it doesn't tell you which users or how many.

How can we figure out the response time distribution (distribution, not average) for a wide range of users? How can we get a better idea of the role the DNS plays in the performance of web applications? Conditions on the internet change constantly. The tests would have to be large-scale and continuous to mean anything.

Let's scope things down a bit. We don't really care about how quickly users resolve any hostname. We care about how quickly our users resolve our hostnames. So maybe you can get the data you want by observing your users. Unfortunately DNS lookups happen mostly through computers we do not control. Worse, they happen over UDP, which doesn't expose performance data to the callee. The request and response packets are sent without any error correction or acknowledgement. So we can't just look at the usual logs we collect on our servers.

The librarian in New York will never know how long it took the librarian in Tucson to call you back. The hotel staffer in Toronto has no idea how you found their number. That is, unless you tell him. And that's what we'll do: run a special series of tests from the perspective of the caller, ie the users, and report back results.

A DNS Observatory


It's tricky but not impossible to gather some statistics on user DNS latency without running benchmark software on their computers. One way works like this:

  1. Set up a wildcard hostname, preferably one that does not share cookies with your main site. Give it a low TTL, say, 60 seconds, so you don't pollute downstream caches.
  2. Set up a webserver for the wildcard hostname that serves zero-byte files as fast as possible. Make sure that KeepAlive, Nagle, and any caching headers are turned off.
  3. In the footer of the pages in your main site, add a script similar to Listing 1. It performs two HTTP requests: /A.gif and /B.gif. The first image load, A, will require a full DNS lookup and an HTTP transaction. The second, B, should only involve an HTTP transaction.
  4. Subtract the time it takes to complete B from the time it takes to complete A, and you have a (very) rough idea of how long it took to perform just the DNS lookup.
  5. Send the DNS and HTTP statistics back to your server as part of another image request. You can extract the results later from your logs.
  6. Rinse and repeat over a large sample (>10,000) of page views. Millions if you can.

NB: You will get strange, even negative, numbers from this test. The deviation of individual data points can be greater than the phenomenon you are trying to measure. If you want to get accurate numbers for a specific user you'll need to run many tests over a period of time. But a single test per user works well enough in aggregate.

<script>
(function() {
  function dns_test() {
    var random = Math.floor(Math.random()*(2147483647)).toString(36);
    var host = 'http://'+random+".dnstest.example.net";
    var img1 = new Image();
    var img2 = new Image();
    var img3 = new Image();
    var ts = null;
    var stats = {};
    img1.onload = function() {
        stats['dns'] = (new Date()).getTime() - ts;
        ts = (new Date()).getTime();
        img2.src = host + "/B.gif";
    };
    img2.onload = function() {
        stats['http'] = (new Date()).getTime() - ts;
        stats.dns = stats.dns - stats.http; // the clever bit
        img3.src = host + '/dnstest.gif?dns='+stats.dns+'&http='+stats.http;
    };
    ts = (new Date()).getTime();
    img1.src = host + "/A.gif";
  }
  window.setTimeout(dns_test, 11337);
})();
</script>
Listing 1: A poor man's DNS observatory

Below is a graph of the distribution of uncached DNS lookup times from real users in the wild, collected by this script over one week. The sample was heavily skewed towards US broadband connections. The median was 146 milliseconds and the geometric mean was 163 milliseconds [2]. This is rather larger than the 20-120 milliseconds quoted in the Yahoo Performance Guidelines for a "typical" DNS lookup. Beware pithy numbers (even ours).

The distribution is even more interesting than the averages. Twenty percent of users in our sample took more than 500 milliseconds just to resolve one hostname. Granted, these lookups were uncached. Assuming a 50% cache hit rate, that's still one out of ten users in this dataset laboring under crappy DNS performance. As of this writing that's a market as large as Safari, Chrome and Opera combined.


Figure 1: A histogram of uncached DNS request times

The cause is unclear. It's possible that user network quality is just that bad. It could be physical distance. It could also be the DNS resolvers of ISPs at fault. It could be your DNS server. Or it could be something else. Or all of the above.

Remember that your mileage may vary. Not every combination of site and userbase will have a similar graph. Also remember that a lot of caching is going on at every level of the system. There's not a simple fixed cost to using alternate hosts for your images and scripts. The best strategy may well be to have one and only one "asset host" or CDN that does not share cookies with your main site.

If you run a commercial website, consider setting up with a dedicated DNS hosting provider that has presence on several continents. The DNS hosting service typically thrown in for free by domain registrars is not very good. For most sites, solid DNS hosting costs about $50 USD per year. It's worth the effort. Heck, set up with two different services for failover.

Try this at home

For privacy reasons we can't release the raw data we collected. But if you have a website with a fair amount of traffic, I strongly encourage you to run these DNS measurements for yourself. You can learn a lot by drilling down into the data.

  • Play with graphing the distributions of different subnets (eg 18.* for MIT or 12.* for AT&T). You might be surprised at who is fast and who is slow.
  • In your webserver logs for /dnstest.gif there should be a User-Agent field as well. So you can look at correlations between DNS performance, browsers, and operating systems. For example, check out those little bumps at 1s and 3s in Figure 1. It turns out that the DNS resolver in Windows has aggressive timeouts. Those bumps are caused by Windows clients timing out then succeeding on a retry.
  • We're not just timing DNS latency, we're also timing how long it takes to perform a minimal TCP handshake + HTTP transaction. That gives you interesting information about user connection latencies, for free. But that's a whole 'nother subject.

This article is the second in a series and part of ongoing research on web app performance. If you have any suggestions or ideas to help improve the experiments, please leave a note in the comments. Next we hope to dig into more detail about user network performance data and how you can use it to improve your websites and applications.

Notes

[0] The dot "." at the end is not a typo. Though "com" and "net" are called "top-level" domains there is actually one more layer behind them called the root domain, designated by that trailing dot. The root domain is managed as a global public utility by dozens of internet service providers all over the world.

Fun fact: the entire country of Sweden dropped off the 'net in October 2009 because a network operator forgot to include that last dot in a configuration file.

[1] I'm fudging here a bit. It's possible that during this test, everything up to .dnstest.example.net will be cached at the user's ISP. This is by design, to reduce load on the root and top-level domain servers. But the lookup should always at least do a request to the ISP's resolver and a request in turn to example.net's authoritative DNS server.

[2] These kinds of datasets tend to be log-normal, with long thin tails trailing from a large central spike. The "average" value, or arithmetic mean, would be misleading in this case so we won't discuss it.
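
To see why, here is a small sketch comparing the three summaries. The sample values are invented purely for illustration and are not taken from our data; the point is only how one slow straggler drags the arithmetic mean around.

```javascript
function arithmeticMean(xs) {
  var sum = xs.reduce(function (acc, x) { return acc + x; }, 0);
  return sum / xs.length;
}

function median(xs) {
  var s = xs.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// exp(mean(log(x))), same formula as the R one-liner further down
function geometricMean(xs) {
  var sumLogs = xs.reduce(function (acc, x) { return acc + Math.log(x); }, 0);
  return Math.exp(sumLogs / xs.length);
}

// MADE-UP lookup times in milliseconds: five ordinary ones and one straggler.
var sample = [80, 100, 120, 150, 200, 3000];
console.log("mean:           " + arithmeticMean(sample).toFixed(0));
console.log("median:         " + median(sample).toFixed(0));
console.log("geometric mean: " + geometricMean(sample).toFixed(0));
```

The single 3-second lookup pulls the arithmetic mean far above what most of the (imaginary) users experienced, while the median and geometric mean stay close to the central spike.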

[∞] Bonus footnote! Here is the code to generate a table from your webserver logs:

# run a grep for "/dnstest.gif" and save to a file
bzcat /your/apache/logs/access_log.*.txt.bz2 | grep dnstest.gif > /tmp/dnstest.log

# perl magic to split the data into columns
echo "A,B,C,D,dns,http" > latency.csv
perl -lane 'if (/^([\d]+)\.([\d]+)\.([\d]+)\.([\d]+).+dns=(\d+)\&http=(\d+)/) { print "$1,$2,$3,$4,$5,$6" }' /tmp/dnstest.log >> latency.csv

Here are the R commands to generate graphs and poke around at the data. If you've never tried R, it is a wonderful open-source statistics suite. The best introduction on how to use it is here. (PDF)

## R script for generating the histogram
x <- read.csv("latency.csv", header=TRUE)

# take only results greater than 0 and less than 4,000
y <- subset(x, (x$dns > 0 & x$dns < 4000))

# draw the histogram 
hist(y$dns, xlab="Milliseconds", main=NA, breaks=200, col="red", border="red", prob=FALSE)

# rug() adds "tassels" to the bottom of the graph to show data point density.
# rug() makes tassels. Get it? Yeah.
# (cap the sample size so sample() doesn't fail on fewer than 5,000 rows)
rug(sample(y$dns, min(5000, length(y$dns))))


## bonus bonus: more interesting stats

# Show only requests from Brazil. this filter is not strictly true
# (ie, they have more subnets than 200/8) but it's true enough to play with.
brazil = subset(y, (y$A==200))
hist(brazil$dns, xlab="Milliseconds", main="Brazil (200.*)", breaks=200, col="green", border="green", prob=FALSE)
rug(sample(brazil$dns, min(5000, nrow(brazil))))

## other interesting subnets
mit = subset(y, (y$A==18))
att = subset(y, (y$A==12))

# the "average" is not very useful with log-normal datasets
#mean(y$dns)

# median and geometric mean are more informative
median(y$dns)
exp(mean(log(y$dns)))

# percentage of users over / under a specific point
table(y$dns > 250) / length(y$dns) 
table(y$dns > 500) / length(y$dns) 




YCombinator's RFS #5: An Accidental Case Study

21 November 2009

Update: check out ddotdash.

I pricked my ears up at the latest Request For Startups from YCombinator. Last year I wrote the first 6,000 lines of my last startup on an eeePC 901, which is about half the size of a MacBook Air. It's cheap and solid-state so I treated it more like a largish book than a smallish computer. I toted it around to many more places, even parties. Losing that thread of worry about damage made it feel an extra half-kilo lighter. I sometimes wished it had a cell phone built in.

In the end the BabyBook was too cramped for long sessions. Both my code output and my hands suffered. The RFS's accompanying essay is correct that whatever platform hackers use to hack on eventually wins in the larger market. But it doesn't necessarily follow that because hackers have smartphones the next step is hacking directly on them. An equally plausible path is that our phones become portable homedirs that plug into any computer when we need to do real work.

Any human input surface smaller than a sheet of paper starts to bump against physical limitations that Moore's law can't fix. To do development on a mobile you have to be willing to give up both touch-typing and large screens. That's a big sacrifice, and it's hard to say what benefits I'd trade for them.

Any innovation that improves the speed of reading or writing code on a small device should also work on a large device, unless it takes advantage of features or use cases unique to the small one. Otherwise it's not a net advantage to mobile hacking. I giggle at the thought of using the accelerometer to indent, smacking my code into line. But the list of things an iPhone can do that a laptop can't is short while the inverse is very long.

Was the OQO early, or wrong?

Let's start fresh. Imagine you have a device the size of a thick passport or Moleskine. It is a phone plus a computer as powerful as a laptop from three years ago. You can hack on it directly in a pinch. It has enough storage for all of your working projects. You can plug it into any keyboard and screen (or computer) and start hacking.

You know what? That sounds pretty neat. And we'll probably get there soon. Perhaps someone will make a dongle that does this for iPhone or Android. Or it could be the accessory that saves the FreeRunner. But this is not very different from today, except for the reduced utility of your computer when you are not at a properly-equipped desk.

So let's go back to code input and output. We use keyboards because we have 100 years of experience with them as speed-of-thought input devices. We use ever-larger screens in order to view ever-larger amounts of code and data in one glance, partly because moving our eyes is faster than moving our hands. Random access matters a lot.

Multitouch gestures (pinch, zoom, scroll, etc) are interesting and so far underused. Using the "mouse" is less of a productivity hit on a phone because your fingers are already in position. Instead of representing code as long lists we spread out like so much wallpaper, maybe we can represent and navigate it like the directed multigraph it actually is. The Canon Cat and Archy might point the way here.

That leaves speed of input. Hinting and autocomplete might bring the mobile programmer back up to an acceptable wpm. Terser languages and WYSIWYG UI builders can help as well. But still I worry about fatigue. It's hard enough to work comfortably on the hardware we have. On the other hand, there's no reason we need to have a keyboard per se. Experienced telegraph operators could type up to 40 wpm with one finger using Morse code. Typing on virtual keyboards is slow because unlike keyboards or game controllers you have to carefully coordinate vision with motion. Instead, I wonder what one could do with two thumbs, an updated Morse, and a little practice.



YUI Tip: Using DataTable and TreeView together

17 October 2009
For this tip we will make a browser for web server logs. The TreeView will display file and folder paths, and the DataTable will display individual log lines. Clicking on a file or folder in the Tree will cause the DataTable to filter out all but that path.



Read the rest at Yahoo's UI blog.


A Dismal Guide to Bandwidth

07 October 2009
(Originally appeared on Yahoo's Developer Blog)
Web app developers spend most of our time not thinking about how data is actually transmitted through the bowels of the network stack. Abstractions at the application layer let us pretend that networks read and write whole messages as smooth streams of bytes. Generally this is a good thing. But knowing what's going on underneath is crucial to performance tuning and application design. The character of our users' internet connections is changing and some of the rules of thumb we rely on may need to be revised.
In reality, the Internet is more like a giant cascading multiplayer game of pachinko. You pour some balls in, they bounce around, lights flash and —usually— they come out in the right order on the other side of the world.

What we talk about, when we talk about bandwidth

It's common to talk about network connections solely in terms of "bandwidth". Users are segmented into the high-bandwidth who get the best experience, and low-bandwidth users in the backwoods. We hope some day everyone will be high-bandwidth and we won't have to worry about it anymore.
That mental shorthand served when users had reasonably consistent wired connections and their computers ran one application at a time. But it's like talking only about the top speed of a car or the MHz of a computer. Latency and asymmetry matter at least as much as the notional bits-per-second and I argue that they are becoming even more important. The quality of the "last mile" of network between users and the backbone is in some ways getting worse as people ditch their copper wires for shared wifi and mobile towers, and clog their uplinks with video chat.
It's a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions. A better mental model of bandwidth should include:
  • packets-per-second
  • packet latency
  • upstream vs downstream

Packets, not bytes

The quantum of internet transmission is not the bit or the byte, it's the packet. Everything that happens on the 'net happens as discrete pachinko balls of regular sizes. A message of N bytes is chopped into ceil(N / 1460) packets [1] which are then sent willy-nilly. That means there is little to no difference between sending 1 byte or 1,000. It also means that sending 1,461 bytes is twice the work of sending 1,460: two packets have to be sent, received, reassembled, and acknowledged.
    Packet #1 Payload
    
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    .....................................................................
    ...........................................................

    Packet #2 Payload
    .
Listing 0: Byte 1,461 aka The Byte of Doom
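
The same arithmetic as a tiny sketch, using the 1460-byte payload figure from the text (the function name is mine):

```javascript
var PAYLOAD_BYTES = 1460; // usable bytes per packet, per the text

// Number of packets needed to carry a message of the given size.
function packetsNeeded(messageBytes) {
  return Math.ceil(messageBytes / PAYLOAD_BYTES);
}

[1, 1000, 1460, 1461].forEach(function (n) {
  console.log(n + " bytes -> " + packetsNeeded(n) + " packet(s)");
});
```

The first three sizes all fit in a single packet. Only the last one spills over into a second.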
Crossing the packet line in HTTP is very easy to do without knowing it. Suppose your application uses a third-party web analytics library which, like most analytics libraries, stores a big hunk of data about the user inside a long-lived cookie tied to your domain. Suppose you also stuff a little bit of your own data into the cookie. This cookie data is thereafter echoed back to your web server upon each request. The boilerplate HTTP headers (Accept, User-Agent, etc) sent by every modern browser take up a few hundred more bytes. Add in the actual URL, Referer header, query parameters... and you're dead. There is also the little-known fact that browsers split certain POST requests into at least two packets regardless of the size of the message.
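
A rough way to see how close you are to the line is to add up the request as it would go over the wire. This is only a sketch: the header names are real, but the values, the browser name and the 1,400-character cookie are invented for illustration, and a real browser sends more headers than this.

```javascript
// Estimate the size of an HTTP request as it goes over the wire.
// Header values are assumed to be plain ASCII, so one character is one byte.
function requestBytes(method, path, headers) {
  var lines = [method + " " + path + " HTTP/1.1"];
  Object.keys(headers).forEach(function (name) {
    lines.push(name + ": " + headers[name]);
  });
  // Lines are separated by CRLF; a blank line ends the header block.
  return (lines.join("\r\n") + "\r\n\r\n").length;
}

function packetsFor(bytes) {
  return Math.ceil(bytes / 1460); // 1460-byte payloads, per the text
}

var lean = {
  "Host": "www.example.net",
  "User-Agent": "ExampleBrowser/1.0",
  "Accept": "*/*"
};

// The same request plus a hypothetical analytics cookie of 1,400 characters.
var bloated = {
  "Host": "www.example.net",
  "User-Agent": "ExampleBrowser/1.0",
  "Accept": "*/*",
  "Cookie": "tracking=" + new Array(1401).join("x")
};

[lean, bloated].forEach(function (headers) {
  var size = requestBytes("GET", "/index.html", headers);
  console.log(size + " bytes -> " + packetsFor(size) + " packet(s)");
});
```

Running something like this against the headers your own site actually receives (cookies included) is a quick way to find out which side of the line you are on.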
One packet, more or less, who cares? For one, none of your fancy caching and CDNs can help the client send data upstream. TCP slow-start means that the client will wait for acknowledgement of the first packets before sending the rest. And as we'll see below, that extra packet can make a large difference in the responsiveness of your app when it's compounded by latency and narrow upstream connections.

Packet Latency

Packet latency is the time it takes a packet to wind through the wires and hops between points A and B. It is roughly a function of the physical distance (at 2/3 of the speed of light) plus the time the packet spends queued up inside various network devices along the way. A typical packet sent on the 'net backbone between San Francisco and New York will take about 60 milliseconds. But the latency of a user's last-mile internet connection can vary enormously [2]. Maybe it's a hot day and their router is running slowly. The EDGE mobile network has a best-case latency of 150msec and a real-world average of 500msec. There is a semi-famous rant from 1996 complaining about 100msec latency from substandard telephone modems. If only.
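
The distance part of that function is easy to sketch. The 4,100 km figure below is my own rough assumption for the straight-line distance between San Francisco and New York; real routes are longer.

```javascript
var LIGHT_KM_PER_MS = 299.792; // speed of light in a vacuum, km per millisecond

// One-way propagation delay over a link that carries signals at 2/3 c.
function propagationDelayMs(distanceKm) {
  return distanceKm / (LIGHT_KM_PER_MS * 2 / 3);
}

var sfToNyKm = 4100; // ASSUMED straight-line distance, for illustration only
console.log(propagationDelayMs(sfToNyKm).toFixed(1) + " ms one way");
```

That works out to roughly 20 milliseconds. Whatever a real packet takes beyond that is presumably down to indirect routes and time spent queued inside devices along the way.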

Packet loss

Packet loss manifests as packet latency. The odds are decent that a couple packets that made up the copy of this article you are reading got lost along the way. Maybe they had a collision, maybe they stopped to have a beer and forgot. The sending end then has to notice that a packet has not been acknowledged and re-transmit.
Wireless home networks are becoming the norm and they are unfortunately very susceptible to interference from devices sitting on the 2.4GHz band, like microwaves and baby monitors. They are also notorious for cross-vendor incompatibilities. Another dirty secret is that consumer-grade wifi devices you'll find in cafés and small offices don't do traffic shaping. All it takes is one user watching a video to flood the uplink.

Upstream < Downstream

Internet providers lie. That "6 Megabit" cable internet connection is actually 6Mbps down and 1Mbps up. The bandwidth reserved for upstream transmission is often 20% or less of the total available. This was an almost defensible thing to do until users started file sharing, VOIPing, video chatting, etc en masse. Even though users still pull more information down than they send up, the asymmetry of their connections means that the upstream is a chokepoint that will probably get worse for a long time.
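
To put numbers on the asymmetry, here is a sketch of the best-case time to push the same payload in each direction over that 6-down, 1-up connection. The 100,000-byte payload is an arbitrary example, and the calculation ignores latency, packet overhead and everything else that makes real transfers slower.

```javascript
// Best-case time to move a payload over a link of the given speed.
function transferMs(bytes, megabitsPerSecond) {
  var bits = bytes * 8;
  return bits / (megabitsPerSecond * 1e6) * 1000;
}

var payloadBytes = 100000; // arbitrary example size
console.log("down: " + transferMs(payloadBytes, 6).toFixed(0) + " ms");
console.log("up:   " + transferMs(payloadBytes, 1).toFixed(0) + " ms");
```

The upload takes six times as long as the download, before anyone else on the connection starts a video chat.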

A Dismal Testing Harness

A laptop sitting on top of a microwave with a cup of tea inside. The microwave is labeled 'packet loss generator'. The laptop is labeled 'packet loss measurement device'.
Figure 0: It's popcorn for dinner tonight, my love. I'm doing science!
We need a way to simulate high latency, variable latency, limited packet rate, and packet loss. In the olden days a good way to test the performance of a system through a bad connection was to configure the switch port to run at half-duplex. Sometimes we even did such testing on purpose. :) Tor is pretty good for simulating a crappy connection but it only works for publicly-accessible sites. Microwave ovens consistently cause packet loss (my parents' old monster kills wifi at 20 paces) but it's a waste of electricity.
The ipfw tool on Mac and FreeBSD comes in handy for local testing. The commands below will approximate an iPhone on the EDGE network with a 350kbit/sec throttle, 5% packet loss rate and 500msec of latency. Use sudo ipfw flush to deactivate the rules when you are done.
    $ sudo ipfw pipe 1 config bw 350kbit/s plr 0.05 delay 500ms
    $ sudo ipfw add pipe 1 dst-port http
Here's another that will randomly drop half of all DNS requests. Have fun with that one.
    $ sudo ipfw pipe 2 config plr 0.5
    $ sudo ipfw add pipe 2 dst-port 53
To measure the effects of latency and packet loss I chose a highly-cached 130KB file from Yahoo's servers. I ran a script to download it as many times as possible in 5 minutes under various ipfw rules [3]. The "baseline" runs were the control with no ipfw restrictions or interference.
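The measurement loop can be sketched in Python as follows. This is a minimal sketch, not the script actually used for these runs: the function names (fetch_once, run_trial, summarize) are made up for illustration, and the only details taken from the article are the 5-minute window and, per note [3], a single thread with a new TCP connection for each request.

```python
import time
import urllib.request


def fetch_once(url, timeout=30.0):
    """Download url and return the elapsed time in seconds."""
    started = time.monotonic()
    # urlopen does not reuse connections between calls, so every request
    # pays for its own TCP setup.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - started


def run_trial(fetch, duration=300.0, clock=time.monotonic):
    """Call fetch() back to back until `duration` seconds have passed.

    Returns the list of per-request times; its length is the request count.
    The clock is injectable so the loop can be checked without waiting.
    """
    timings = []
    deadline = clock() + duration
    while clock() < deadline:
        timings.append(fetch())
    return timings


def summarize(timings):
    """Return (count, mean, fastest, slowest) for one trial."""
    if not timings:
        return 0, 0.0, 0.0, 0.0
    return len(timings), sum(timings) / len(timings), min(timings), max(timings)
```

To run a trial, pass run_trial a callable such as lambda: fetch_once(url) for whatever test file you pick, once with no ipfw rules for the baseline and once per rule. The sketch lets network errors propagate; a real harness would need to decide how to count failed requests.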
A graph showing number of requests in 300 seconds on the X axis and packet latency on the Y axis. With increasing packet latency the number of requests decreases.
Figure 1: The effect of packet latency on download speed
A graph showing number of requests in 300 seconds on the X axis and packet loss on the Y axis. With increasing packet loss the number of requests decreases.
Figure 2: Effect of packet loss on download speed
Just 100 milliseconds of packet latency is enough to cause a smallish file to download in an average of 1500 milliseconds instead of 350 milliseconds. And that's not the worst part: the individual download times ranged from 1,000 to 3,000 milliseconds. Software that's consistently slow can be endured. Software that halts for no obvious reason is maddening.
A graph showing the response time in seconds from 0 to 3 on the Y axis with time on the X axis. The response times fluctuate between 0.25 and 3 seconds until a point in time labeled 'tea is done' when they become consistent between 0.25 and 0.75s.
Figure 3: Extreme volatility of response times during packet loss.

So, latency sucks. Now what?

Yahoo's web performance guidelines are still the most complete resource around, and backed up by real-world data. The key advice is to reduce the number of HTTP requests, reduce the amount of data sent, and to order requests in ways that use the observed behavior of browsers to best effect. However, the guidelines simplify things by bucketing users into high, low and mobile categories, which doesn't necessarily address poor-quality bandwidth across all classes of user. The user's connection quality is often very bad and getting worse, which changes the calculus of what techniques to employ. In particular we should also take into account that:
  • Upstream packets are almost always expensive.
  • Any client can have high or low overall bandwidth.
  • High latency is not an error condition, it's a fact of life.
  • TCP connections and DNS lookups are expensive under high latency.
  • Variable latency is in some ways worse than low bandwidth.
Assuming that a large but unknown percentage of your users labor under adverse network conditions, here are some things you can do:
  • To keep your users' HTTP requests down to one packet, stay within a budget of about 800 bytes for cookies and URLs. Note that every byte of the URL counts twice: once for the URL and once for the Referer header on subsequent clicks. An interesting technique is to store app state that doesn't need to go to the server in fragment identifiers instead of query string parameters, e.g. /blah#foo=bar instead of /blah?foo=bar. Nothing after the # mark is sent to the server.
  • If your app sends largish amounts of data upstream (excluding images, which are already compressed), consider implementing client-side compression. It's possible to get 1.5:1 compression with a simple LZW+Base64 function; if you're willing to monkey with ActionScript you could probably do real gzip compression.

  • YSlow says you should flush() early and put Javascript at the bottom. The reasoning is sound: get the HTML <head> portion out as quickly as possible so the browser can start downloading any referenced stylesheets and images. On the other hand, JS is supposed to go on the bottom because script tags halt parallel downloads. The trouble comes when your page arrives in pieces over a long period of time: the HTML and CSS are mostly there, maybe some images, but the JS is lost in the ether. That means the application may look like it's ready to go but actually isn't — the click handlers and logic and ajax includes haven't arrived yet.
    Partly loaded images on Google docs view of this article
    Figure 4: docs is loading slowly... dare I click?
    Maybe in addition to the CSS/HTML/Javascript sandwich you could stuff a minimal version of the UI into the first 1-3KB, which gets replaced by the full version. Google Docs presents document contents as quickly as possible but disables the buttons until its sanity checks pass. Yahoo's home page does something similar.
    This won't do for heavier applications, or those that don't have a lot of passive text to distract the user with while frantic work happens offstage. Gmail compromises with a loading screen which times out after X seconds. On timeout it asks the user to choose whether to reload or use their lite version.

  • Have a plan for disaster: what should happen when one of your scripts or styles or data blobs never arrives? Worse, what if the user's cached copy is corrupted? How do you detect it? Do you retry or fail? A quick win might be to add a checksum/eval step to your javascript and stylesheets.
  • We also recommend that you make as much CSS and Javascript as possible external and parallelize HTTP requests. But is it wise to do more DNS lookups and open new TCP connections under very high latency? If each new connection takes a couple of seconds to establish, it may be better to inline as much as possible.
  • The trick is how to decide that an arbitrary user is suffering high latency. For mobile users you can pretty much take high latency as a given [4]. Armed with per-IP statistics on client network latency from bullet #4 above, you can build a lookup table of high-latency subnets and handle requests from those subnets differently. For example if your servers are in Seattle it's a good bet that clients in the 200.0.0.0/8 subnet will be slow. 200.* is for Brasil but the point is that you don't need to know it's for Brasil or iPhone or whatever — you're just acting on observed behavior. Handling individual users from "fast" subnets who happen to have high latency is a taller order. It may be possible to get information from the socket layer about how long it took to establish the initial connection. I don't know the answer yet but there is promising research here and there.
  • A good technique that seems to go in and out of fashion is KeepAlive. Modern high-end load balancers will try to keep the TCP connection alive between themselves and the client, no matter what, while also honoring whatever KeepAlive behavior the webserver asks for. This saves expensive TCP connection setup and teardown without tying up expensive webserver processes (the reason why some people turn it off). There's no reason why you couldn't do the same with a software load balancer / proxy like Varnish.
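One way to build and use the lookup table of high-latency subnets mentioned above is sketched below in Python. Everything here is an assumption for illustration: the function names, the 300-millisecond threshold, the choice to group IPv4 clients into /24 subnets, and the idea of switching between an "inlined" and an "external" page variant. Only the observation that you can act on measured behavior per subnet comes from the text.

```python
import ipaddress
from collections import defaultdict
from statistics import median


def build_slow_subnets(samples, threshold_ms=300.0, prefix=24):
    """Group (client_ip, latency_ms) samples by subnet and return the set of
    subnets whose median latency is at or above the threshold.

    Assumes IPv4 addresses; the prefix length and threshold are arbitrary.
    """
    by_subnet = defaultdict(list)
    for client_ip, latency_ms in samples:
        # strict=False masks off the host bits: 200.1.2.3/24 -> 200.1.2.0/24
        subnet = ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False)
        by_subnet[subnet].append(latency_ms)
    return {
        subnet
        for subnet, latencies in by_subnet.items()
        if median(latencies) >= threshold_ms
    }


def is_probably_slow(client_ip, slow_subnets):
    """True if the client falls inside any subnet observed to be slow."""
    address = ipaddress.ip_address(client_ip)
    return any(address in subnet for subnet in slow_subnets)


def choose_variant(client_ip, slow_subnets):
    """Pick a hypothetical page variant: inline everything for slow clients."""
    return "inlined" if is_probably_slow(client_ip, slow_subnets) else "external"
```

The table knows nothing about countries or devices. It only records where slow requests have come from, so it should be rebuilt periodically as conditions change.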
This article is the first in a series and part of ongoing research on bandwidth and web app performance. It's still early in our research, but we chose to share what we've found early so you can join us on our journey of discovery. Next, we will dig deeper into some of the open questions we've posed, examine real-world performance in the face of high latency and packet loss, and suggest more techniques on how to make your apps work better in adverse conditions based on the data we collect.

Notes

[1] ceil(N / 1460) is the same algorithm you use to figure out how many trips it takes to carry your laundry down the stairs. (ceil is geekspeak for rounding up.) Say you have 50 pounds of clothes and the basket holds 13 pounds. 50 / 13 = 3 remainder 11, so you need to make 4 trips. The bigger the basket the fewer the trips. So why not use huge packets? On private networks you might see configurations for "Jumbo frames". But in the wild you have to consider the cost of packet loss, typical message sizes, old or incompatible routers, etc.
That specific number (aka Maximum Segment Size) comes from the maximum packet size (aka Maximum Transmission Unit) of 1,500 octets (aka bytes) set in RFC 894 (IP over Ethernet v2), minus the 40 octets reserved for the IP and TCP headers (source and destination addresses, flags, etc). IPv6, which has been coming any day now since the Clinton administration, will probably converge on an MSS of 1,220 or 1,440 in the wild. Point being, we're stuck with tiny packets for the rest of our lifetimes.
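The arithmetic in this note can be written out as a small Python sketch. The header sizes assume minimum-length IPv4, IPv6 and TCP headers with no options, and the helper names are made up for illustration.

```python
import math

ETHERNET_MTU = 1500    # octets
IPV6_MINIMUM_MTU = 1280
IPV4_HEADER = 20       # minimum IPv4 header, no options
IPV6_HEADER = 40       # fixed IPv6 header, no extension headers
TCP_HEADER = 20        # minimum TCP header, no options


def max_segment_size(mtu, ip_header):
    """Payload octets left in one packet after the IP and TCP headers."""
    return mtu - ip_header - TCP_HEADER


def packets_needed(payload_bytes, mss=max_segment_size(ETHERNET_MTU, IPV4_HEADER)):
    """ceil(N / MSS): how many trips down the stairs the payload takes."""
    return math.ceil(payload_bytes / mss)
```

With these numbers max_segment_size(1500, 20) gives the familiar 1,460, and the two IPv6 figures fall out of max_segment_size(1280, 40) and max_segment_size(1500, 40).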
[2] DNS can also cause latency. We tend to take hostname lookups for granted, but an ISP's DNS resolvers are often unloved. It once took me several years to convince BellSouth's customer service that one of their DNS resolvers was actually off the network. User DNS problems are doubly nasty because we as application developers can't control or even detect them.
[3] The script was single-threaded and used a new TCP connection for each request. A single restriction was used per run, i.e. X milliseconds latency or Y% packet loss. The wifi was a Linksys WRT54G at a distance of 5 meters, with standard firmware in 802.11g mode and WPA2 encryption. The uplink was a "6 Mbps" home cable connection about 50 miles and ten network hops away from the nearest Yahoo caching server, during off-peak hours.
[4] The Google mobile team recently put out an interesting fact: "On an iPhone 2.2 device, 200k of JavaScript held within a block comment adds 240ms during page load, whereas 200k of JavaScript that is parsed during page load added 2600 ms."
Image Credit: Pachinko by the_toe_stubber on Flickr.

carlos@bueno.org
