The Soul of a Portable Machine
18 December 2009
Instead of lugging a laptop around, why not carry just the bits? I don't mean keeping your files on a USB key, or even booting from an external drive. I mean storing the entire "soul": the applications, data, OS and RAM image as a virtual machine you carry around on a portable drive. You could plug it into any sufficiently powerful host and resume your work without interruption [0].

So, I tried it. My primary "computer" is a $25 USB flash drive that I plug into $2,500 dumb terminals.

Setting it up

I started off with a 4GB stick I had lying around and an Ubuntu Linux image running under VMware. It was fine for browsing, code editing, etc, but was too space-constrained for real work. The total size of your virtual machine files will be the disk size + RAM size plus a few percent overhead. VMware requires additional free space equal to your RAM size, presumably in order to write the contents of RAM to disk atomically during a "suspend". So a 2GB virtual machine image with 1GB of RAM allocated will require more than 4GB of space: 2GB of disk, 1GB of RAM image, 1GB of free space for the suspend, plus overhead. This turned out to be a nice excuse to buy myself a 16GB stick.

Setting up corporate effluent pipes (email, calendaring, VPN) on the VM would have taken more time than I wanted to spend, so I run them directly on the hosts. Make sure your desktop background, window themes, terminal colors, etc are different between your host and virtual machines so you don't get confused. Turn off the VM's screensaver. Use a pre-allocated disk. I made a single-file disk image instead of multiple 2GB chunks, but I have no idea if it makes a difference to speed or error-resistance.

It's like a laptop, except without the laptop

Running my Linux VM directly from a USB drive is smooth, contrary to expectations. I thought I would have to sync my machine image to the host machine's hard drive to get decent performance. I now put my computer in my pocket, commute to the office, plug in, and start working.
Over lunch I hibernate the VM and make a backup to the host machine. If I need to work at home I plug it into my personal computer.

Annoyances & Limitations

The VM is slower than running directly on the host machine, but not by much unless you are doing disk-intensive work like compiling MacPorts. Video performance feels sluggish when scrolling or moving windows.

A usable OSX image is too large for my USB drive but a cheap external hard drive works well enough. Next year 64GB USB and SD drives will be in the sweet spot. My work often requires testing on Windows inside other virtual machines, so I can't use all available RAM for my primary VM. There are minor UI annoyances like the host menubar popping up on OSX. A bug with the virtual sound system in VMware seems to trigger a long-standing bug in OSX/Flash: YouTube videos play for a second then halt.

Next steps

It would be nice to transform an existing OSX system into a virtual one. I believe you could back up your machine using CarbonCopyCloner or SuperDuper to a .dmg file, then use qemu-img to convert to a .vmdk or other virtual disk format. But unless you find a way to shrink the image you'd need a portable drive with a capacity larger than your current hard drive.

The next big leap would be to replace the USB key with a real portable computer, something like the SmartQ5. The idea is to have one virtual machine that takes full advantage of whatever hardware it happens to be running on. Running a desktop-class virtual machine on a tiny device like that would be extremely tricky if not impossible. Even leaving aside the different CPU, it's not possible (as far as I know) to change the amount of RAM or number of CPUs assigned to a VM without rebooting it. But I believe the hacks are out there, just waiting to be discovered.
Notes

[0] See Tom Hughes-Croucher's state vs sync and also the IBM SoulPad project from 2005.

[1] There are many ways to run OSX inside a VM, most of which are against the EULA in jurisdictions where the EULA is enforceable. The easiest way is to install a licensed generic copy of OSX (not the special model-specific copy that comes with Macs) on VMware or Parallels, and then add some "special touches". XCode (and thus gcc/make, MacPorts and the rest of the toolchain) will not install on a virtual OSX machine with the "ServerVersion.plist" hack enabled, giving an "error while evaluating JavaScript for the package" error. Software updates also fail. The point of the hack is to misrepresent the type of OSX you are running. So make a pair of scripts to toggle it. Further, deponent sayeth not.

A Dismal Guide to DNS
24 November 2009
(Originally appeared on Yahoo's Developer Blog)
A Clever Shambles

Before two computers can talk to each other on the 'net, one of them has to know the numeric IP address of the other. Using the DNS is often compared to looking up a number in the phone book. But that can give the impression the information is in one place, close to hand. Instead, imagine it's 1982. You live in Tucson and you want to call a hotel in Toronto. You don't have a Toronto phone book so you call your local library. They don't have one either. Life is boring in Tucson, so the librarian uses her New York phone book to call another library. The nice lady in New York looks up the hotel's number in her copy of the Toronto phone book, tells it to your local librarian, who then calls back to give it to you. Doing all this is a hassle, so everyone in the chain writes down the number just in case the question ever comes up again.

The DNS is even more complex because of the hierarchy of internet domains. Consider the host name

If this sounds ridiculously complex and fragile, that's because it is. Writing down the answer to common queries, aka caching, is the only reason we're able to get away with it. In practice the root domain "." is known to everyone. During normal operation "net." should be cached at all levels, including at your local librarian, aka your ISP. Anything beyond that requires some lookups unless the domain is already very well-known.

How long does it take to look up a hostname?

A single DNS lookup may involve several recursive lookups to machines all over the world. Because of this hassle, information is cached for short periods of time at every level, including on your computer. So "the time it takes to do a DNS lookup" can vary wildly depending on the state of affairs in many different places, and the quality of the network connections between them. On Mac OSX the
$ dscacheutil -statistics
Overall Statistics:
Average Call Time - 0.118626
Cache Hits - 236152
Cache Misses - 231052
Total External Calls - 279350
Statistics by procedure:
Procedure            Cache Hits   Cache Misses   External Calls
------------------   ----------   ------------   --------------
gethostbyname            161252          39952             6749
gethostbyaddr                60            151              211
...
Listing 0: Mac OSX DNS statistics
These numbers are interesting but fairly useless for our purposes. They combine cached and uncached lookups into one "average". Also, browsers often cache and even precache DNS information, bypassing whatever the operating system is doing. So we can't rely on what the machine tells us. We need to do some experimenting on our own.

First, I ran long series of tests against Yahoo hostnames from the office, my house, and other locations. For 100 seconds I ran as many DNS lookups as I could and timed them. Each lookup was for a wildcard hostname. A wildcard like
This graph is useful mostly to illustrate that it's possible for users on "broadband" connections to have invisible performance problems related to DNS. But it doesn't tell you which users or how many. How can we figure out the response time distribution (distribution, not average) for a wide range of users? How can we get a better idea of the role the DNS plays in the performance of web applications? Conditions on the internet change constantly. The tests would have to be large-scale and continuous to mean anything.

Let's scope things down a bit. We don't really care about how quickly users resolve any hostname. We care about how quickly our users resolve our hostnames. So maybe you can get the data you want by observing your users. Unfortunately DNS lookups happen mostly through computers we do not control. Worse, they happen over UDP, which doesn't expose performance data to the callee. The request and response packets are sent without any error correction or acknowledgement. So we can't just look at the usual logs we collect on our servers.

The librarian in New York will never know how long it took the librarian in Tucson to call you back. The hotel staffer in Toronto has no idea how you found their number. That is, unless you tell him. And that's what we'll do: run a special series of tests from the perspective of the caller, ie the users, and report back results.

A DNS Observatory
It's tricky but not impossible to gather some statistics on user DNS latency without running benchmark software on their computers. One way works like this:
NB: You will get strange, even negative, numbers from this test. The deviation of individual data points can be greater than the phenomenon you are trying to measure. If you want to get accurate numbers for a specific user you'll need to run many tests over a period of time. But a single test per user works well enough in aggregate.

    <script>
    (function() {
      function dns_test() {
        // a random hostname under the wildcard, so the first lookup is uncached
        var random = Math.floor(Math.random()*(2147483647)).toString(36);
        var host = 'http://'+random+".dnstest.example.net";
        var img1 = new Image();
        var img2 = new Image();
        var img3 = new Image();
        var ts = null;
        var stats = {};

        // first image: DNS lookup + HTTP request
        img1.onload = function() {
          stats['dns'] = (new Date()).getTime() - ts;
          ts = (new Date()).getTime();
          img2.src = host + "/B.gif";
        };

        // second image: same host, so roughly the HTTP request alone
        img2.onload = function() {
          stats['http'] = (new Date()).getTime() - ts;
          stats.dns = stats.dns - stats.http; // the clever bit
          // third image: report both numbers back via the query string
          img3.src = host + '/dnstest.gif?dns='+stats.dns+'&http='+stats.http;
        };

        ts = (new Date()).getTime();
        img1.src = host + "/A.gif";
      }
      window.setTimeout(dns_test, 11337);
    })();
    </script>

Listing 1: A poor man's DNS observatory
Below is a graph of the distribution of uncached DNS lookup times from real users in the wild, collected by this script over one week. The sample was heavily skewed towards US broadband connections. The median was 146 milliseconds and the geometric mean was 163 milliseconds [2]. This is rather larger than the 20-120 milliseconds quoted in the Yahoo Performance Guidelines for a "typical" DNS lookup. Beware pithy numbers (even ours). The distribution is even more interesting than the averages. Twenty percent of users in our sample took more than 500 milliseconds just to resolve one hostname. Granted, these lookups were uncached. Assuming a 50% cache hit rate, that's still one out of ten users in this dataset laboring under crappy DNS performance. As of this writing that's a market as large as Safari, Chrome and Opera combined.
Figure 1: A histogram of uncached DNS request times
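If you would rather compute the summary numbers outside of R, here is a minimal Python sketch of the median, the geometric mean, and the share of slow lookups. The function names (`geometric_mean`, `summarize`) and the sample latencies are invented for illustration and are not data from the study; the 0 to 4,000 millisecond filter mirrors the one in the R script in the bonus footnote.

```python
import math
import statistics


def geometric_mean(values):
    """exp(mean(log(x))): better suited than the arithmetic mean for log-normal-ish data."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(latencies_ms, lo=0, hi=4000, slow_ms=500):
    """Drop non-positive and extreme readings, then summarize what is left."""
    kept = [v for v in latencies_ms if lo < v < hi]
    return {
        "n": len(kept),
        "median": statistics.median(kept),
        "geomean": geometric_mean(kept),
        # fraction of kept samples slower than slow_ms
        "share_slow": sum(v > slow_ms for v in kept) / len(kept),
    }


# Hypothetical readings in milliseconds, including one negative and one
# absurdly large value of the kind the NB above warns about.
sample = [40, 80, 150, 160, 300, 900, 2500, -12, 5200]
summary = summarize(sample)
print(summary)
```

The long tail drags the arithmetic mean well above both the median and the geometric mean, which is the reason for preferring the latter two here.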
The cause is unclear. It's possible that user network quality is just that bad. It could be physical distance. It could also be the DNS resolvers of ISPs at fault. It could be your DNS server. Or it could be something else. Or all of the above.

Remember that your mileage may vary. Not every combination of site and userbase will have a similar graph. Also remember that a lot of caching is going on at every level of the system. There's not a simple fixed cost to using alternate hosts for your images and scripts. The best strategy may well be to have one and only one "asset host" or CDN that does not share cookies with your main site.

If you run a commercial website, consider setting up with a dedicated DNS hosting provider that has presence on several continents. The DNS hosting service typically thrown in for free by domain registrars is not very good. For most sites, solid DNS hosting costs about USD $50 per year. It's worth the effort. Heck, set up with two different services for failover.

Try this at home

For privacy reasons we can't release the raw data we collected. But if you have a website with a fair amount of traffic, I strongly encourage you to run these DNS measurements for yourself. You can learn a lot by drilling down into the data.
This article is the second in a series and part of ongoing research on web app performance. If you have any suggestions or ideas to help improve the experiments, please leave a note in the comments. Next we hope to dig into more detail about user network performance data and how you can use it to improve your websites and applications.

Notes

[0] The dot "." at the end is not a typo. Though "com" and "net" are called "top-level" domains there is actually one more layer behind them called the root domain, designated by that trailing dot. The root domain is managed as a global public utility by dozens of internet service providers all over the world. Fun fact: the entire country of Sweden dropped off the 'net in October 2009 because a network operator forgot to include that last dot in a configuration file.

[1] I'm fudging here a bit. It's possible that during this test, everything up to .dnstest.example.net will be cached at the user's ISP. This is by design, to reduce load on the root and top-level domain servers. But the lookup should always at least do a request to the ISP's resolver and a request in turn to example.net's authoritative DNS server.

[2] These kinds of datasets tend to be log-normal, with long thin tails trailing from a large central spike. The "average" value, or arithmetic mean, would be misleading in this case so we won't discuss it.

[∞] Bonus footnote! Here is the code to generate a table from your webserver logs:

    # run a grep for "/dnstest.gif" and save to a file
    bzcat /your/apache/logs/access_log.*.txt.bz2 | grep dnstest.gif > /tmp/dnstest.log

    # perl magic to split the data into columns
    echo "A,B,C,D,dns,http" > latency.csv
    perl -lane 'if (/^([\d]+)\.([\d]+)\.([\d]+)\.([\d]+).+dns=(\d+)\&http=(\d+)/) { print "$1,$2,$3,$4,$5,$6" }' /tmp/dnstest.log >> latency.csv

Here are the R commands to generate graphs and poke around at the data. If you've never tried R, it is a wonderful open-source statistics suite. The best introduction on how to use it is here (PDF).

    ## R script for generating the histogram
    x <- read.csv("latency.csv", header=TRUE)

    # take only results greater than 0 and less than 4,000
    y <- subset(x, (x$dns > 0 & x$dns < 4000))

    # draw the histogram
    hist(y$dns, xlab="Milliseconds", main=NA, breaks=200, col="red", border="red", prob=FALSE)

    # rug() adds "tassels" to the bottom of the graph to show data point density.
    # rug() makes tassels. Get it? Yeah.
    # sample at most 5000 points; sample() fails if asked for more than exist
    rug(sample(y$dns, min(5000, length(y$dns))))

    ## bonus bonus: more interesting stats
    # Show only requests from Brazil. this filter is not strictly true
    # (ie, they have more subnets than 200/8) but it's true enough to play with.
    brazil = subset(y, (y$A==200))
    hist(brazil$dns, xlab="Milliseconds", main="Brazil (200.*)", breaks=200, col="green", border="green", prob=FALSE)
    rug(sample(brazil$dns, min(5000, length(brazil$dns))))

    ## other interesting subnets
    mit = subset(y, (y$A==18))
    att = subset(y, (y$A==12))

    # the "average" is not very useful with log-normal datasets
    #mean(y$dns)

    # median and geometric mean are more informative
    median(y$dns)
    exp(mean(log(y$dns)))

    # percentage of users over / under a specific point
    table(y$dns > 250) / length(y$dns)
    table(y$dns > 500) / length(y$dns)

Labels: dismal guide

YCombinator's RFS #5: An Accidental Case Study
21 November 2009
Update: check out ddotdash.
In the end the BabyBook was too cramped for long sessions. Both my code output and my hands suffered. The RFS's accompanying essay is correct that whatever platform hackers use to hack on eventually wins in the larger market. But it doesn't necessarily follow that because hackers have smartphones the next step is hacking directly on them. An equally plausible path is that our phones become portable homedirs that plug into any computer when we need to do real work.

Any human input surface smaller than a sheet of paper starts to bump against physical limitations that Moore's law can't fix. To do development on a mobile you have to be willing to give up both touch-typing and large screens. That's a big sacrifice, and it's hard to say what benefits I'd trade for them. Any innovation that improves the speed of reading or writing code on a small device should also work on a large device, unless it takes advantage of features or use cases unique to the small one. Otherwise it's not a net advantage to mobile hacking. I giggle at the thought of using the accelerometer to indent, smacking my code into line. But the list of things an iPhone can do that a laptop can't is short while the inverse is very long.

Was the OQO early, or wrong?
You know what? That sounds pretty neat. And we'll probably get there soon. Perhaps someone will make a dongle that does this for iPhone or Android. Or it could be the accessory that saves the FreeRunner. But this is not very different from today, except for the reduced utility of your computer when you are not at a properly-equipped desk.

So let's go back to code input and output. We use keyboards because we have 100 years of experience with them as speed-of-thought input devices. We use ever-larger screens in order to view ever-larger amounts of code and data in one glance, partly because moving our eyes is faster than moving our hands. Random access matters a lot. Multitouch gestures (pinch, zoom, scroll, etc) are interesting and so far underused. Using the "mouse" is less of a productivity hit on a phone because your fingers are already in position. Instead of representing code as long lists we spread out like so much wallpaper, maybe we can represent and navigate it like the directed multigraph it actually is. The Canon Cat and Archy might point the way here.

That leaves speed of input. Hinting and autocomplete might bring the mobile programmer back up to an acceptable wpm. Terser languages and WYSIWYG UI builders can help as well. But still I worry about fatigue. It's hard enough to work comfortably on the hardware we have. On the other hand, there's no reason we need to have a keyboard per se. Experienced telegraph operators could type up to 40 wpm with one finger using Morse code. Typing on virtual keyboards is slow because unlike keyboards or game controllers you have to carefully coordinate vision with motion. Instead, I wonder what one could do with two thumbs, an updated Morse, and a little practice.

YUI Tip: Using DataTable and TreeView together
17 October 2009
For this tip we will make a browser for web server logs. The TreeView will display file and folder paths, and the DataTable will display individual log lines. Clicking on a file or folder in the Tree will cause the DataTable to filter out all but that path.

Read the rest at Yahoo's UI blog.

A Dismal Guide to Bandwidth
07 October 2009
(Originally appeared on Yahoo's Developer Blog)

Web app developers spend most of our time not thinking about how data is actually transmitted through the bowels of the network stack. Abstractions at the application layer let us pretend that networks read and write whole messages as smooth streams of bytes. Generally this is a good thing. But knowing what's going on underneath is crucial to performance tuning and application design. The character of our users' internet connections is changing and some of the rules of thumb we rely on may need to be revised.

In reality, the Internet is more like a giant cascading multiplayer game of pachinko. You pour some balls in, they bounce around, lights flash and, usually, they come out in the right order on the other side of the world.

What we talk about, when we talk about bandwidth

It's common to talk about network connections solely in terms of "bandwidth". Users are segmented into the high-bandwidth who get the best experience, and low-bandwidth users in the backwoods. We hope some day everyone will be high-bandwidth and we won't have to worry about it anymore.

That mental shorthand served when users had reasonably consistent wired connections and their computers ran one application at a time. But it's like talking only about the top speed of a car or the MHz of a computer. Latency and asymmetry matter at least as much as the notional bits-per-second and I argue that they are becoming even more important. The quality of the "last mile" of network between users and the backbone is in some ways getting worse as people ditch their copper wires for shared wifi and mobile towers, and clog their uplinks with video chat. It's a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions.

A better mental model of bandwidth should include:
Packets, not bytes

The quantum of internet transmission is not the bit or the byte, it's the packet. Everything that happens on the 'net happens as discrete pachinko balls of regular sizes. A message of N bytes is chopped into ceil(N / 1460) packets [1] which are then sent willy-nilly. That means there is little to no difference between sending 1 byte or 1,000. It also means that sending 1,461 bytes is twice the work of sending 1,460: two packets have to be sent, received, reassembled, and acknowledged.

Packet #1 Payload
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
...........................................................
Packet #2 Payload
.
Listing 0: Byte 1,461 aka The Byte of Doom

Crossing the packet line in HTTP is very easy to do without knowing it. Suppose your application uses a third-party web analytics library which, like most analytics libraries, stores a big hunk of data about the user inside a long-lived cookie tied to your domain. Suppose you also stuff a little bit of data into the cookie too. This cookie data is thereafter echoed back to your web server upon each request. The boilerplate HTTP headers (Accept, User-agent, etc) sent by every modern browser take up a few hundred more bytes. Add in the actual URL, Referer header, query parameters... and you're dead. There is also the little-known fact that browsers split certain POST requests into at least two packets regardless of the size of the message.

One packet, more or less, who cares? For one, none of your fancy caching and CDNs can help the client send data upstream. TCP slow-start means that the client will wait for acknowledgement of the first packets before sending the rest. And as we'll see below, that extra packet can make a large difference in the responsiveness of your app when it's compounded by latency and narrow upstream connections.

Packet Latency

Packet latency is the time it takes a packet to wind through the wires and hops between points A and B. It is roughly a function of the physical distance (at 2/3 of the speed of light) plus the time the packet spends queued up inside various network devices along the way. A typical packet sent on the 'net backbone between San Francisco and New York will take about 60 milliseconds. But the latency of a user's last-mile internet connection can vary enormously [2]. Maybe it's a hot day and their router is running slowly. The EDGE mobile network has a best-case latency of 150msec and a real-world average of 500msec. There is a semi-famous rant from 1996 complaining about 100msec latency from substandard telephone modems. If only.

Packet loss

Packet loss manifests as packet latency. The odds are decent that a couple packets that made up the copy of this article you are reading got lost along the way. Maybe they had a collision, maybe they stopped to have a beer and forgot. The sending end then has to notice that a packet has not been acknowledged and re-transmit.

Wireless home networks are becoming the norm and they are unfortunately very susceptible to interference from devices sitting on the 2.4GHz band, like microwaves and baby monitors. They are also notorious for cross-vendor incompatibilities. Another dirty secret is that consumer-grade wifi devices you'll find in cafés and small offices don't do traffic shaping. All it takes is one user watching a video to flood the uplink.

Upstream < Downstream

Internet providers lie. That "6 Megabit" cable internet connection is actually 6mbps down and 1mbps up. The bandwidth reserved for upstream transmission is often 20% or less of the total available. This was an almost defensible thing to do until users started file sharing, VOIPing, video chatting, etc en masse. Even though users still pull more information down than they send up, the asymmetry of their connections means that the upstream is a chokepoint that will probably get worse for a long time.

A Dismal Testing Harness

Figure 0: It's popcorn for dinner tonight, my love. I'm doing science!

The ipfw tool on Mac and FreeBSD comes in handy for local testing. The commands below will approximate an iPhone on the EDGE network with a 350kbit/sec throttle, 5% packet loss rate and 500msecs latency. Use sudo ipfw flush to deactivate the rules when you are done.

$ sudo ipfw pipe 1 config bw 350kbit/s plr 0.05 delay 500ms
$ sudo ipfw add pipe 1 dst-port http

Here's another that will randomly drop half of all DNS requests. Have fun with that one.

$ sudo ipfw pipe 2 config plr 0.5
$ sudo ipfw add pipe 2 dst-port 53
To measure the effects of latency and packet loss I chose a highly-cached 130KB file from Yahoo's servers. I ran a script to download it as many times as possible in 5 minutes under various ipfw rules [3]. The "baseline" runs were the control with no ipfw restrictions or interference.
Figure 1: The effect of packet latency on download speed
Figure 2: Effect of packet loss on download speed
Figure 3: Extreme volatility of response times during packet loss.

So, latency sucks. Now what?

Yahoo's web performance guidelines are still the most complete resource around, and backed up by real-world data. The key advice is to reduce the number of HTTP requests, reduce the amount of data sent, and to order requests in ways that use the observed behavior of browsers to best effect. However there is a simplification which buckets users into high/low/mobile categories. This doesn't necessarily address poor-quality bandwidth across all classes of user. The user's connection quality is often very bad and getting worse, which changes the calculus of what techniques to employ. In particular we should also take into account that:
Notes
[1] ceil(N / 1460) is the same algorithm you use to figure out how many trips it takes to carry your laundry down the stairs. (ceil is geekspeak for rounding up.) Say you have 50 pounds of clothes and the basket holds 13 pounds. 50 / 13 = 3 remainder 11, so you need to make 4 trips. The bigger the basket the fewer the trips. So why not use huge packets? On private networks you might see configurations for "Jumbo frames". But in the wild you have to consider the cost of packet loss, typical message sizes, old or incompatible routers, etc.
That specific number (aka Maximum Segment Size) comes from the maximum packet size (aka Maximum Transmission Unit) of 1,500 octets (aka bytes) used by Ethernet v2 (RFC 894; RFC 1191 lists it among the common MTUs), minus 40 bytes for the IP and TCP headers: the source and destination addresses, flags, etc. IPv6, which has been coming any day now since the Clinton administration, will probably converge on an MSS of 1,220 or 1,440 in the wild. Point being, we're stuck with tiny packets for the rest of our lifetimes.
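The arithmetic in this note can be sketched in a few lines of Python. The names `MSS` and `packets_needed` are made up for illustration, and the sketch assumes the common case of 20-byte IP and 20-byte TCP headers with no options; real stacks may carry less payload per packet.

```python
import math

MTU = 1500          # Ethernet maximum transmission unit, in bytes
IP_HEADER = 20      # assumption: IPv4 header without options
TCP_HEADER = 20     # assumption: TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER   # 1460 bytes of payload per packet


def packets_needed(n_bytes, mss=MSS):
    """ceil(N / MSS): how many packets it takes to carry n_bytes of payload."""
    return math.ceil(n_bytes / mss)


for size in (1, 1000, 1460, 1461):
    print(size, "bytes ->", packets_needed(size), "packet(s)")

# The laundry example: 50 pounds of clothes, 13 pounds per basket.
print("laundry trips:", packets_needed(50, mss=13))
```

Byte 1,461 is the one that tips the message into a second packet, which is the whole point of Listing 0.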
[2] DNS can also cause latency. We tend to take hostname lookups for granted, but an ISP's DNS resolvers are often unloved. It once took me several years to convince BellSouth's customer service that one of their DNS resolvers was actually off the network. User DNS problems are doubly nasty because we as application developers can't control or even detect them.
[3] The script was single-threaded and used a new TCP connection for each request. A single restriction was used per run, ie X milliseconds latency or Y% packet loss. The wifi was a Linksys WRT54g at a distance of 5 meters, with standard firmware in 802.11g mode and WPA2 encryption. The uplink was a "6mbps" home cable connection about 50 miles and ten network hops away from the nearest Yahoo caching server, during off-peak hours.
[4] The Google mobile team recently put out an interesting fact: "On an iPhone 2.2 device, 200k of JavaScript held within a block comment adds 240ms during page load, whereas 200k of JavaScript that is parsed during page load added 2600 ms."
Image Credit: Pachinko by the_toe_stubber on Flickr.

Labels: dismal guide
carlos@bueno.org