Hemorrhaging Competitive Information

With all the kafuffle in the news recently over a number of search engines handing users’ query data over to the feds, a certain little security company I know (read: work for) has been getting a fair bit of play, both online and off. It’s nice to see that people are starting to wake up to the risk to their personal privacy in the digital age – as more information goes online, it’ll only get easier to assemble a fairly comprehensive picture of a person’s habits from scattered sources.

One comment from Andrew Krcik, VP of Marketing at PGP, on the whole Google-versus-the-feds story that I found particularly interesting:

Companies are at risk too. Think about what employees search for in the course of the day. That information can reveal a lot about what is going on at a particular company — whether it is preparing for a product launch, or researching a new demographic, or preparing for a lawsuit.

Andrew’s comment was about how much search engines can learn about the everyday thinking of businesses from their employees’ search queries, but search engines aren’t the only ones with the opportunity to collect that information. Think about the path the average search query follows when someone executes a search:

  1. I enter a search term in my browser and click “Search!”
  2. My browser, via my operating system, creates an HTTP request
  3. My operating system passes the request through my network card to my Internet connection
  4. My ISP passes my request to the search engine, which returns a list of results
  5. I click a link in the list of results (repeating a lot of the steps above), and create another HTTP request for the URL I want to visit
  6. The web server at my ultimate destination registers my request and returns the page I requested

Look at all the trust links in that sequence of steps! Ask yourself: Do I trust my browser not to collect statistics on the sites I visit and report them back to whoever created the browser? Do I trust that my operating system, or even my network card, isn’t doing the same thing? What about my ISP – are they “monetizing” my traffic by selling details on my surfing habits? I mean, did you really read all the terms of service, warranty, and other legal bric-a-brac that accompanied your computer and its associated software and services?

I’m guessing you didn’t.

Putting aside Ken Thompson-style subversions of your own operating system for a moment, let’s assume you trusted your operating system, your hardware, your ISP, and even the search engine. There’s still one party that knows what you’re asking for: your final destination web site!

Every time you visit a web site, the web site records your IP address; in some cases, an IP address can be traced back to a particular organization. Imagine you’re a company and you notice a lot of hits coming from IP addresses owned by a particular company – what are they up to? With enough analysis, you might be able to eke out an idea of why they’re visiting. If, for example, they seem to be primarily visiting one section of your web site, you might conclude they’re preparing to launch a competitive product.
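To make that concrete, here is a minimal sketch of the kind of analysis I have in mind. It assumes the web server writes Common Log Format lines and that you already know which address block belongs to which organization; the `ORG_NETWORKS` table, the organization name, and the function names are all made up for illustration, and the addresses are reserved documentation ranges.

```python
import ipaddress
from collections import Counter

# Hypothetical table: organizations and the address blocks they own.
# (203.0.113.0/24 is a reserved documentation range, not a real company.)
ORG_NETWORKS = {
    "Example Rival Inc.": ipaddress.ip_network("203.0.113.0/24"),
}


def org_for(ip):
    """Return the organization that owns this IP, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for org, network in ORG_NETWORKS.items():
        if addr in network:
            return org
    return None


def section_hits(log_lines):
    """Count hits per (organization, top-level site section)."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 7:
            continue  # not a Common Log Format line
        ip, path = parts[0], parts[6]
        try:
            org = org_for(ip)
        except ValueError:
            continue  # first field was a hostname or garbage, not an IP
        if org is None:
            continue
        # "/products/widget.html?x=1" -> "/products"
        first_segment = path.split("?", 1)[0].lstrip("/").split("/", 1)[0]
        counts[(org, "/" + first_segment)] += 1
    return counts
```

Feeding this a day's worth of log lines and looking at `section_hits(lines).most_common()` would show which sections a known organization keeps coming back to. What that pattern actually means is still a guess, of course.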

Companies hemorrhage information in many other ways, and that’s where things get really interesting. Combining information sources can allow noise to be resolved into useful information. For example, many email clients and mail servers record the sender’s IP address in a header on outgoing email – a company that combined analysis of its email traffic with web site traffic analysis might be able to determine who specifically is reading its site. It’s not that hard to do – I, for example, am often able to tell which of my friends are visiting my web site because I know a lot of their home or work IP addresses from email they have sent me in the past. Tools could easily be incorporated into corporate email, phone, web, and other systems to build a more cohesive picture.
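Here is a rough sketch of that email-to-web correlation. It assumes the sender's address shows up either in an `X-Originating-IP` header (which only some webmail services add) or in the earliest `Received` header, and that you have the raw messages as strings; the function names are mine, and real headers are messier than this regular expression allows for.

```python
import re
from email import message_from_string

# Loose IPv4 matcher; good enough for a sketch, not a validator.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sender_ips(raw_message):
    """Return (sender, set of IPs) taken from the originating headers."""
    msg = message_from_string(raw_message)
    sender = msg.get("From", "unknown")
    received = msg.get_all("Received", [])
    # Received headers are prepended by each relay, so the last one
    # listed is the earliest hop, i.e. the one closest to the sender.
    candidates = msg.get_all("X-Originating-IP", []) + received[-1:]
    ips = set()
    for value in candidates:
        ips.update(IPV4.findall(value))
    return sender, ips


def build_index(raw_messages):
    """Map each IP address to the set of senders seen using it."""
    index = {}
    for raw in raw_messages:
        sender, ips = sender_ips(raw)
        for ip in ips:
            index.setdefault(ip, set()).add(sender)
    return index


def likely_visitors(index, visitor_ips):
    """For IPs seen in the web log, list the senders known to use them."""
    return {ip: sorted(index[ip]) for ip in visitor_ips if ip in index}
```

The match is only ever a hint: addresses get reassigned, whole offices sit behind a single address, and plenty of mail never exposes the sender's own IP at all.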

In some ways, I’m surprised companies aren’t turning to information-warfare techniques, either to camouflage their online activities and eliminate these subliminal channels of information leakage or to turn that leakage to their advantage (you could, for example, recognize a competitor when they surf your web site and choose to feed them misinformation). Given the relative overhead of such an undertaking, it’s unlikely it will happen – nevertheless, it’s important to recognize that the search engines aren’t the only ones who might be watching you.