Late in the (UTC) day on 24 February 2008, Pakistan Telecom (AS 17557) began advertising a small part of YouTube's (AS 36561) assigned network. This story is almost as old as BGP. Old hands will recognize this as, fundamentally, the same problem as the infamous AS 7007 incident of 1997, the more recent ConEd mistake of early 2006 and even TTNet's Christmas Eve gift of 2004.
Just before 18:48 UTC, Pakistan Telecom, in response to a government order to block access to YouTube (see news item), started advertising a route for 208.65.153.0/24 to its provider, PCCW (AS 3491). For those unfamiliar with BGP, this is a more specific route than the one used by YouTube (208.65.152.0/22), and therefore most routers would choose to send traffic to Pakistan Telecom for this slice of YouTube's network.
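To make the "more specific route wins" point concrete, here is a minimal sketch of longest-prefix matching in Python. The two prefixes and AS numbers come from the event itself; the function name `best_route` and the destination addresses are made up for illustration, and a real router's forwarding lookup is considerably more involved than this.

```python
import ipaddress

# Two competing routes, written as (prefix, origin AS).
# Prefixes and AS numbers are taken from the post.
ROUTES = [
    (ipaddress.ip_network("208.65.152.0/22"), 36561),  # YouTube's legitimate route
    (ipaddress.ip_network("208.65.153.0/24"), 17557),  # Pakistan Telecom's announcement
]


def best_route(destination, routes):
    """Return the (prefix, origin AS) covering destination with the longest
    prefix length, or None if no route covers it.

    best_route is a hypothetical helper, not any router's real API.
    """
    addr = ipaddress.ip_address(destination)
    covering = [route for route in routes if addr in route[0]]
    if not covering:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(covering, key=lambda route: route[0].prefixlen)


# Illustrative destination addresses (assumptions, not from the post).
for dest in ("208.65.153.10", "208.65.152.10"):
    prefix, origin = best_route(dest, ROUTES)
    print(f"{dest} -> {prefix} via AS {origin}")
```

An address inside 208.65.153.0/24 is covered by both routes, and the /24 is chosen; an address elsewhere in the /22 still follows YouTube's own route. That is why only a slice of YouTube's network was diverted.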
I became interested in this immediately as I was concerned that I wouldn't be able to spend my evening watching imbecilic videos of cats doing foolish things (even for a cat). Then, I started to examine our mountains of BGP data and quickly noticed that the correct AS path ("Will the real YouTube please stand up?") was getting restored to most of our peers.
The data points identified below are culled from over 250 peering sessions with 170 unique ASNs. While it is hard to describe exactly how widely this hijacked prefix was seen, we estimate that it was seen by a bit more than two-thirds of the Internet.
This table shows the timing of the event and how quickly the route propagated (this is actually a fairly normal propagation pattern). The ASNs counted below were mostly transit ASNs, which means that these routes were distributed broadly across the Internet. Almost all of the default free zone (DFZ) carried the hijacked route at least briefly.
18:47:00 | uninterrupted videos of exploding jello |
18:47:45 | first evidence of hijacked route propagating in Asia, AS path 3491 17557 |
18:48:00 | several big trans-Pacific providers carrying hijacked route (9 ASNs) |
18:48:30 | several DFZ providers now carrying the bad route (and 47 ASNs) |
18:49:00 | most of the DFZ now carrying the bad route (and 93 ASNs) |
18:49:30 | all providers who will carry the hijacked route have it (total 97 ASNs) |
20:07:25 | YouTube, AS 36561 advertises the /24 that has been hijacked to its providers |
20:07:30 | several DFZ providers stop carrying the erroneous route |
20:08:00 | many downstream providers also drop the bad route |
20:08:30 | and a total of 40 some-odd providers have stopped using the hijacked route |
20:18:43 | and now, two more specific /25 routes are first seen from 36561 |
20:19:37 | 25 more providers prefer the /25 routes from 36561 |
20:28:12 | peers of 36561 start seeing the routes that were advertised to transit at 20:07 |
20:50:59 | evidence of attempted prepending, AS path was 3491 17557 17557 |
20:59:39 | hijacked prefix is withdrawn by 3491, who disconnect 17557 |
21:00:00 | the world rejoices; Leeroy Jenkins online again. |
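The 20:18:43 entry is the same longest-prefix rule turned against the hijack. A rough sketch, assuming the two /25 routes were simply the two halves of the hijacked /24 (the timeline does not spell out the exact prefixes), and using a hypothetical helper `origin_for`:

```python
import ipaddress

hijacked = ipaddress.ip_network("208.65.153.0/24")

# Assumption: the two /25s are the two halves of the hijacked /24.
halves = list(hijacked.subnets(prefixlen_diff=1))

# Routing table after YouTube's countermeasure, as (prefix, origin AS).
table = [(hijacked, 17557)] + [(half, 36561) for half in halves]


def origin_for(address, routes):
    """Return the origin AS of the most specific route covering address.

    origin_for is a hypothetical helper for illustration only.
    """
    addr = ipaddress.ip_address(address)
    covering = [(prefix, origin) for prefix, origin in routes if addr in prefix]
    return max(covering, key=lambda entry: entry[0].prefixlen)[1]


print([str(half) for half in halves])
print(origin_for("208.65.153.200", table))  # illustrative address
```

Wherever the /25 routes were accepted, they beat the hijacked /24 for every address in the block. Many providers filter prefixes longer than /24, which may be why only some of them switched to the /25 routes.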
Since BGP relies on a transitive trust model, validation between customer and provider is important. In this case, PCCW (3491) did not validate Pakistan Telecom's (17557) advertisement for 208.65.153.0/24. By accepting this advertisement and readvertising it to its peers and providers, PCCW was propagating the wrong route. Those who saw this route from PCCW selected it since it was a more specific route. YouTube was advertising 208.65.152.0/22 before the event started and the /24 was a smaller (and more specific) advertisement. According to the usual BGP route selection process, the /24 was then chosen, effectively completing the hijack.
Because of the fast detection and reaction of the YouTube staff and cooperation with other providers, service for their (sub-) prefix was interrupted for about an hour and forty minutes for some lucky customers and, at most, a bit more than two hours. The exact duration of the outage depends on your vantage point on the Internet.
When these sorts of events occur, there is renewed interest in a variety of solutions to this problem. BGP is fundamental to provider relationships and will not be going away anytime soon. Cryptographic extensions to BGP have been suggested (Pretty Good BGP, Secure Origin BGP and S-BGP). These may be too taxing for router CPUs. Of course, after any sort of hijacking event (whether inadvertent or malicious), prefix and AS monitoring is suggested (e.g., the Internet Alert Registry, the Prefix Hijack Alert System, RIPE's MyASN and Renesys' Routing Intelligence).
Ultimately, though, the problem remains one of transitive trust. A provider can and should limit the advertisements it will accept from a customer. The mechanics can be arranged manually or can be configured using Routing Policy Specification Language (RPSL) to communicate the policy and drive configuration. In the case of Pakistan Telecom, they originate or transit fewer than 1000 prefixes.
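As a rough illustration of what "limit the advertisements it will accept from a customer" means, here is a minimal Python sketch of a per-customer prefix filter. Everything in it is an assumption for illustration: the allow-list contents (documentation prefixes standing in for a customer's registered address space), the `accept` function, and the /24 length cap. Real filters live in router configuration, often generated from RPSL objects, not in Python.

```python
import ipaddress

# Hypothetical allow-list keyed by customer AS. The prefixes are
# documentation ranges used as placeholders, not Pakistan Telecom's
# actual registered address space.
ALLOWED = {
    17557: [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ],
}


def accept(customer_as, announced, allowed=ALLOWED, max_len=24):
    """Accept an announcement only if it falls within a prefix registered
    for this customer and is no longer than max_len.

    accept and max_len are illustrative assumptions, not a real router API.
    """
    net = ipaddress.ip_network(announced)
    if net.prefixlen > max_len:
        return False
    return any(net.subnet_of(prefix) for prefix in allowed.get(customer_as, []))


print(accept(17557, "192.0.2.0/24"))     # within the customer's allow-list
print(accept(17557, "208.65.153.0/24"))  # YouTube's space: not on the list
```

With a filter along these lines on the PCCW side of the session, the 208.65.153.0/24 announcement would have been dropped at the first hop instead of reaching the rest of the Internet. For a customer with fewer than 1000 prefixes, a list of this kind is small enough to maintain.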
So, it's heartwarming to know that two things are still true. It is still trivially possible to hijack prefixes (whether maliciously or inadvertently). I can go to sleep knowing that my neighbors are happily watching their LOLCATS.
Comments
Wait: the order PDF mentions a specific video. It has been deleted due to a "terms of use violation": what was it?
Posted by: lamib | February 25, 2008 09:22 AM
You said that lucky folks only noticed a 30 minute outage. However, in the timeline you posted there is a 1hr20min gap between action and the first reaction. (18.49 to 20.07)
Could you clarify what fixed the bad route problem for any affected parties after about 30 minutes, and when was that countermeasure taken?
Posted by: xan | February 25, 2008 12:24 PM
I'm hearing rumblings that the block of YouTube had more to do with videos showing how the elections were rigged, and less to do with the "blasphemous" videos. The latter was simply an easy excuse to block the former.
Posted by: Trogdor | February 25, 2008 12:44 PM
Really nice post, and better than slogging through the NANOG hijack thread.
Posted by: ebw | February 25, 2008 01:53 PM
Thanks, xan!
My explanation did indeed say thirty minutes, though it should have said "about an hour and thirty minutes". (It was actually about 1h42m, but I say an hour and forty minutes in the corrected text above.)
I appreciate the correction!
-Martin
Posted by: Martin A. Brown | February 25, 2008 02:41 PM
Cryptographic BGP extensions can't help in this case. They only tell you who announced a route, not who is allowed to announce it.
The upstream provider (AS 3491) doesn't filter any routes. Just knowing who announced it helps nothing.
Posted by: Daniel | February 25, 2008 08:00 PM
So, in fact it was NOT Pakistan or the Pakistan Telecom Authority that blocked YouTube, but a technician at PCCW who did not verify the PTA's routing advertisement.
I'm no fan of political censorship, and think that trying to prevent people from seeing a cartoon is self-defeating and wrong.
But as a journalist, I am a fan of the truth, which is that PCCW caused the problem, which it can correct by implementing a manual verification procedure before complying with customer requests.
Is that right, Earl?
----------
Earl: If I light my neighbor's house on fire and burn it to the ground, do you place blame solely on the fire department for not seeing the smoke and putting out the fire in time? All providers need to be good net citizens, which includes not injecting garbage into the routing tables and also guarding against it from others - when possible. Both parties are responsible, but the source of "the fire" bears the greater responsibility.
Posted by: Nathaniel Forbes | February 25, 2008 09:20 PM
And here is a lolcat just for this occasion...
http://nicklevay.net/misc/bgpcat.jpg
Posted by: Rattle | February 26, 2008 12:42 AM
Here's a BGPlay link (using RIPE RIS data) that nicely shows the propagation dynamics for the /24.
http://www.ris.ripe.net/cgi-bin/bgplay.cgi?prefix=208.65.153.0/24&start=2008-02-24+18:46&end=2008-02-24+21:05
--
Simon.
Posted by: Simon Leinen | February 26, 2008 05:22 AM
Thank you for this detailed technical account. The mass media accounts have, as usual, been an unintelligible mishmash.
Posted by: Nick Barnes | February 26, 2008 08:45 AM
Nathaniel,
The point is that there were two technical errors. First, Pakistan Telecom was advertising a route they had only intended to blackhole. Second, PCCW didn't have prefix filters installed to limit the reach of this advertisement.
Also routing advertisements are usually subject to a series of checks--it's unfortunate that PCCW did not have prefix checks to prevent this entire situation.
-Martin
Posted by: Martin A. Brown | February 26, 2008 09:17 AM
Just a note: for those who want a lower-latency way to discuss this event, we started a few discussions over at Babbledog, Renesys's personalized social news project.
Babbledog supports live discussion without moderation or waiting for your posts to show up. Take a look at this discussion or search for related stories.
Posted by: todd underwood | February 26, 2008 11:03 AM
It's open again; I can view YouTube here in Islamabad.
Posted by: hina | March 1, 2008 06:17 AM
Hmmm... I'm living in China now and here, in Beijing, I often run into site blocking. To get around it, I'm using http://strongvpn.com. It's a VPN account with strong and reliable service. I no longer have to find a new proxy every day after the old one gets blocked.
Posted by: Lilu | March 6, 2008 06:34 PM