The Machine in the Middle

July 12, 2020

When you meet someone and you shake hands and make eye contact, there's a human connection. In that moment, in that space and time, there are only two people and no one else.

When you connect to someone on social media, there's Facebook or Twitter in between you.

There's an amount of trust you hand over to them to facilitate your connections. Maybe this is good, maybe this is bad. They serve as a shield to protect you, but they also have their own agenda. I won't speculate beyond that for now, but you do pay for these services with your data and these services know you intimately.

I'm prompted to write to you today because of hyperlinks: those predominantly blue things you click that bring you from one webpage to the next.

When the web was first created, it was for the sharing of knowledge and information. Think about research papers that cite and reference other texts. When you wrote research papers in the past, you included a bibliography with all these references so that readers could follow the citations to their sources.

As someone reading a bibliography, you'd need to head to a library to check out the texts to validate the claims you've read. You're trusting the author was operating in good faith, but verifying their claims as you're forming your opinions. You're also depending on the library not to have altered the texts for any reason.

Hyperlinks simplify this process by allowing content hosted online to be linked from one webpage to the next. You can fact check information without leaving your couch. The trade-off is that you need to trust the webpage you are on and the webpage you are going to.

One of the most common types of attacks you should guard against is Phishing.

Say you've got a PayPal account. To use it, you visit paypal.com and log in with your email and password.

Imagine you got an email with a subject line “PayPal Alert: Insufficient Balance, Overdraft $489.05”. You panic a little. I don't even have that much in savings. Frantically, you open the email. You see the PayPal logo and the body of the email continues:

Dear faithful PayPal customer, we have taken proactive action on your behalf for your recent purchase of $489.05. Your bank alerted us you had insufficient funds. To avoid an overdraft fee, and because your account is in such good standing with us, we extended you a line of credit in the full amount of your purchase. However, you must reimburse us within 24 hours or apply for financing by visiting paypa1.com.

You ride a roller coaster of emotions. “This wasn't me, I'm so screwed, I need to get this fixed, but I'm a little flattered about how highly PayPal thinks of me.” You quickly click the link in the email. You're brought to a login page that has the PayPal logo and an email and password form.

You quickly fill out the email and password, and you're logged in! But you're also hacked. Anything beyond this point is to keep the ruse going longer so you don't realize what just happened. And maybe to milk you a little further.

Maybe they even show the $489.05 credit. Maybe there's a number you can call for customer support where they tell you they'll cancel the purchase because they trust you that it's fraudulent. However, they need to verify your identity. It's simple, just provide your Social Security Number.

Are you wondering how you got hacked? Let's compare those two links from above. Your account is with paypal.com, but the email links you to paypa1.com. Don't see the difference?

paypal.com paypa1.com

It's really easy to gloss over, but a lowercase L and the number 1 can look indistinguishable with the right font against an unsuspecting target. In this example, paypa1.com is owned by some hacker to Phish for your email and password. Statistically you use the same combination for more than just PayPal, maybe even for your email account itself. If that's true, that's ungood.
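The lowercase L versus number 1 swap can be caught mechanically. Here's a minimal sketch in Python, assuming a tiny hand-picked table of lookalike characters; the CONFUSABLES table and the skeleton and is_lookalike helpers are made up for illustration, and a real check would need a far larger table.

```python
# Hypothetical table: maps characters that are easy to misread onto one
# canonical character. This is a tiny illustrative sample, not a complete
# list of confusable characters.
CONFUSABLES = str.maketrans({"1": "l", "0": "o", "5": "s"})


def skeleton(domain: str) -> str:
    """Lowercase the domain and collapse lookalike characters."""
    return domain.lower().translate(CONFUSABLES)


def is_lookalike(candidate: str, trusted: str) -> bool:
    """True when the two domains differ but read the same at a glance."""
    different = candidate.lower() != trusted.lower()
    return different and skeleton(candidate) == skeleton(trusted)


print(is_lookalike("paypa1.com", "paypal.com"))  # True
print(is_lookalike("paypal.com", "paypal.com"))  # False
```

The idea is to compare what the domains look like rather than what they are: two different strings that collapse to the same skeleton deserve a second look.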

In reality, paypa1.com is owned by PayPal to protect you from exactly this attack, which is why I feel safe sharing this attack example with you. It's important to be on guard. Not all websites own the variations of domains that can easily trick someone into divulging their private information.

Let's go back to my social media example. This entire essay is actually a rant thinly veiled as an educational piece. I mean, I hope you're learning about how to be more secure, but I'm also a tad bit peeved at the state of the web. I want you to understand why.

When someone shares a link to say https://example.com on Twitter, that link gets re-written by Twitter to be, in this example, exactly: https://t.co/5Eifw2nBTo?amp=1. Twitter owns and controls the t.co domain. You can read about Twitter's stance on why they do this here.

To summarize, they do it to 1) let you share long hyperlinks without using up all your characters, 2) collect data on a per-link basis, and 3) protect you from attacks, including Phishing.

Those things are generally pretty good. I can't argue much with them. I mean, it's really easy to overlook a lowercase L and the number 1 after all. Good or bad though, we're still trusting Twitter as a machine in the middle. We'd be remiss if we didn't follow through here on what the worst-case scenario could be, so let's play this out.

For our example, when a visitor clicks on the hyperlink text https://example.com in our tweet, their browser will first be sent to https://t.co/5Eifw2nBTo?amp=1, and we trust they'll then be redirected to our website https://example.com. That might seem a little silly, but Twitter's benefits to this process are outlined above.

Since we're looking for the worst-case scenario here, we need to know everything that's going on with this t.co link. Namely, we need to ask the question, what is amp?
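If you want to pull that link apart yourself, Python's standard library can do it. This is a small sketch that only inspects the text of the URL; it makes no network request, so it can't tell us what t.co actually does with the parameter once the click arrives.

```python
from urllib.parse import urlsplit, parse_qs

link = "https://t.co/5Eifw2nBTo?amp=1"
parts = urlsplit(link)

host = parts.hostname              # the domain that actually receives the click
short_id = parts.path.lstrip("/")  # the identifier for the shortened link
params = parse_qs(parts.query)     # everything after the question mark

print(host)      # t.co
print(short_id)  # 5Eifw2nBTo
print(params)    # {'amp': ['1']}
```

So amp is a query parameter with the value 1, attached to a link on a domain we don't control.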

AMP stands for Accelerated Mobile Pages, which is a project by Google. This article is a pretty good write-up that covers the pros and the cons. The tl;dr is that Google is striving to make the web as fast as possible, especially on mobile phones that might have variable network conditions, like being super slow.

That sounds cool, so what's the catch? You just need to have your content hosted on Google's servers. So we'll mirror our webpage's content on https://example.com/amp. No big deal, right? Well... now that your webpage content is hosted by Google's servers, you're trusting them not to manipulate that content for any reason.

If by using AMP, you're giving up this level of control, why would anyone want to use AMP then?

Google uses an algorithm they call PageRank to determine how to sort the results for any given Google search query. Tons and tons of factors go into this algorithm, and each factor is weighted differently. It's Google's secret sauce; no one but the machine behind the curtain truly knows how it all works. One of those factors is how fast your webpage loads.

Everyone wants to be #1 on Google. So by making your webpage load faster, you'll be able to improve your search engine rankings. AMP is a really easy and cost-effective way to do this, especially if you're a small business. Google wants you to know AMP is not a requirement to rank highly on Google, because that could be a serious conflict of interest.

All that said, having the fastest possible webpage definitely won't hurt your rankings. Wink, wink.

We've got enough information now for a worst-case example. I'd like you to draw your own conclusions. Maybe this isn't so bad and maybe I'm just paranoid.

My caveat, though, is that this has been my profession for the past decade, and each of these steps is possible, if not trivial, to implement. The likelihood of Twitter or Google actually doing the things I will outline below is low.

However, I want you to take that into consideration with these two statements and how they relate to you. “I know you can do bad things and I trust you not to” and “I know you can do bad things and I probably couldn't tell even if you did.”

Let's imagine I'm an independent journalist. I'm currently working on an article titled, “How Big Tech is Doubleplus Ungood.” In this article, I discuss the power of companies such as Twitter and Google. I'm hosting the content myself at https://example.com with an AMP mirror at https://example.com/amp on Google's servers.

Hypothetically here, Twitter and Google are against the contents of my article because it makes them look bad. Let's imagine they're in cahoots together. Let's be honest, there's gotta be a room where it happens.

When anyone shares either the AMP or non-AMP version of my article on Twitter, any reader that clicks it will be routed through t.co. On the t.co website, a decision is made. Is there an AMP version of this article? If so, let's direct all traffic to the AMP article and therefore to Google's servers.
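To show how little code that decision takes, here's a rough sketch in Python. Everything in it is hypothetical: the SHORT_LINKS and AMP_MIRRORS tables and the resolve function are stand-ins I made up for the thought experiment, not a description of how t.co is actually built.

```python
from typing import Optional

# Hypothetical lookup tables standing in for whatever a shortener stores.
SHORT_LINKS = {"5Eifw2nBTo": "https://example.com"}
AMP_MIRRORS = {"https://example.com": "https://example.com/amp"}


def resolve(short_id: str, prefer_amp: bool) -> Optional[str]:
    """Return the URL a click on the short link gets redirected to."""
    original = SHORT_LINKS.get(short_id)
    if original is None:
        return None  # unknown short link, nowhere to send the reader
    if prefer_amp and original in AMP_MIRRORS:
        return AMP_MIRRORS[original]  # the middleman picks the mirror
    return original


print(resolve("5Eifw2nBTo", prefer_amp=False))  # https://example.com
print(resolve("5Eifw2nBTo", prefer_amp=True))   # https://example.com/amp
```

The reader clicked the same link both times. Where they land is decided entirely by whoever runs the redirect.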

On March 1st, 2012, Google updated some of their legal commitments to their users and reneged on their “Don't be evil” promise. This is outlined in more detail in this Gizmodo article. The big takeaway is this PR statement from Google:

Our new Privacy Policy makes clear that, if you're signed in, we may combine information you've provided from one service with information from other services. In short, we'll treat you as a single user across all our products, which will mean a simpler, more intuitive Google experience.

So on the AMP version of the article, Google decides to manipulate the content by dividing readers into two categories. People that know Big Tech is ungood and people that don't.

If people know Big Tech is ungood, nothing is really going to change their mind and any tampering will only give these readers more ammo in their fight against Big Tech.

For people that believe Big Tech is good, reading about how Big Tech is ungood might make them question Big Tech and join in the fight against Big Tech.

So Google will render two variations depending on the reader, the original version as the author intended and the redacted version with the title “How Big Tech is Doubleplus Good”. The reader doesn't know whether a switch has happened. They trust Google though.
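In code, the switch in this thought experiment could be as small as the sketch below. It's purely an illustration of the worst case I'm imagining: the Reader class and its trusts_big_tech flag are invented for this example and don't describe anything a real service is known to do.

```python
from dataclasses import dataclass

ORIGINAL_TITLE = "How Big Tech is Doubleplus Ungood"
REDACTED_TITLE = "How Big Tech is Doubleplus Good"


@dataclass
class Reader:
    # Hypothetical flag standing in for a profile a host might infer.
    trusts_big_tech: bool


def render_title(reader: Reader) -> str:
    """Serve the redacted title only to readers unlikely to notice."""
    return REDACTED_TITLE if reader.trusts_big_tech else ORIGINAL_TITLE


print(render_title(Reader(trusts_big_tech=False)))  # How Big Tech is Doubleplus Ungood
print(render_title(Reader(trusts_big_tech=True)))   # How Big Tech is Doubleplus Good
```

Both readers asked for the same URL, and neither has any way to see what the other was shown.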

Some of my technically minded readers might have a deeper understanding of AMP and would posit, “But that's not actually how AMP works.” To my understanding, AMP really is just some static pages cached on Google's servers. So sharding the content conditionally is not something they're doing today.

However, URLs don't care about implementation details. The web isn't a permanent and fixed entity. Webpages can change at any time for any reason; all that's required is control. Owning your own domain name is one way to establish your own control.

Handing part of that domain over to Google for the likes of AMP subverts your ownership and control. Maybe that's good for you and good for your business if you're trying to be #1 on Google. It might even give you a leg up against a security-minded competitor that won't make that compromise.

Maybe none of this will ever be an issue, but it's technically possible and that's what matters. Maybe Google will never take advantage of their control of AMP sites. Maybe Twitter will always direct users where they want to go with their t.co link shortening service. We just trust them not to.

And these are really only a couple of small examples, because trust is a widespread challenge on the web. With every app we download and every website we visit, we delegate a certain amount of trust over our digital, and sometimes physical, lives.

As a closing example, I'd like to call out Facebook's Ad Targeting. This is the system that allows an advertiser to target specific demographics for their ads.

Choose your audience based on age, gender, education, job title ... cities, communities and countries ... consumer behaviors such as prior purchases and device usage ... Add interests and hobbies of the people you want your ad to reach—from organic food to action movies.

This technology was built for and is in use today for advertising. What if it was used to manipulate the news and articles we're reading? Maybe Mark Zuckerberg has matured, but saying the people that trust him are “dumb fucks” has stuck with me as an insight for how Big Tech views each and every one of us.

We make digital handshakes every day. We hand over trust in small and big ways every hour we're awake. In those moments as you glance down at your black mirror, please remember: there's a machine in the middle, between you and the rest of the world.