Can we do for the cloud what the darkweb does for the web? Can we provide a way for entities to consume cloud resources anonymously? BTW, totally kidding about the “brought to you by Suits” thing. Please don’t sue me (unless Harvey gets to represent me!)
Let’s begin by looking at the case of a VPN, and expand to the cloud in general later. VPN’s are great! Connect anonymously, circumvent streaming windows to watch that new episode of Suits no matter where you are, use the Internet as the Internet instead of a series of nationally isolated networks, etc.
However, they do have one drawback. What if the VPN is lying to you? What if they actually do keep logs? Let’s look at what a VPN might know about you.
- Your IP Address (every time you connect)
- The content of your traffic, or at least its metadata (e.g. if the traffic is SSL-encrypted), for all sessions
- Your payment details (potentially tied to your account)
- Metadata on your account (time of login/logoff, traffic patterns/destinations, length of subscription, date of renewal purchases, etc.)
While you might not care if everyone knows you secretly root for Louis (Team Litt!), the stakes might be higher for others, say free speech activists, investigative journalists, or human rights workers. Can we do better?
Of course we can!
With some clever engineering we can turn that frown upside down and reach the following situation. Even if your VPN is lying and actually logs you:
- They will not know your IP
- They will not be able to tie your payment details to a session, nor to a user ID
- They will only know information on your traffic related to a single session, not across sessions
Ideally we’ll find the same advantages no matter the nature of the resource being provided. E.g., even for provisioning of compute VM’s or storage, we want the anonymity gains to persist.
This solution will also remember to feed your cat and take out the garbage twice a week 😉
We call the API that makes this possible the Generic Resource Provider (GRP) API.
Why should we care?
Before I describe the solution, let’s take a moment to appreciate what is at stake here. What does it mean to access resources in a cloud style?
Everyone knows the cloud is the future (or the present, depending on who you talk to), and there is an overwhelming trend to move workloads into the cloud. Broadly speaking, a cloud is simply a collection of physical resources that have been virtualized, which users can then purchase on a pro-rata basis.
According to the National Institute of Standards and Technology (NIST) definition, a cloud service offers on-demand self-service and broad network access, benefits from resource pooling, is rapidly elastic, and is metered.
As the cloud continues to swallow the world (along with the coming Internet of Things revolution, in which connected devices will outnumber humans by at least an order of magnitude), what does it mean to have free, unfettered access to cloud resources?
I believe we are entering a time where computing resources increasingly form a foundational part of the social infrastructure, like money or knowledge (try building a house without either). The ability to consume these resources and use them for creative acts of production, free from snooping or censorship, will form the next frontier of societal innovation.
For example – what will you build with the plethora of connected devices logging and transforming your environment? And why should someone else watch that, watch you, and control it?
So how do we do it?
The challenge is to maintain the five characteristics NIST laid out. Resource pooling (another term for economies of scale) and rapid elasticity can be captured by architecting our dark cloud on top of existing public clouds. In so doing we must make sure not to break the broad network availability, self-service, and metered aspects. If any of these five core characteristics breaks (or is unduly throttled), the viability of the dark cloud as a cloud diminishes.
We’ll accomplish this by introducing a middle layer between the User and Resource Provider, which we will call a Broker. If a User connects directly to the Resource Provider, she must place her trust in the Resource Provider and face the anonymity profile above. In our scenario trust is not eliminated, but the amount required shrinks to almost nothing.
Even if the Broker and Resource Provider both collect logs, the User is still in a better position.
Brokers gotta broke
Let’s walk through an example VPN scenario. Normally the User authenticates with the VPN by submitting a username / password directly. The VPN checks to see if these match a valid account and authorizes a connection.
In our case this scenario faces two significant changes. First, the User will authenticate through the Broker. Second, User, Broker, and VPN will all deploy over i2p.
Think of i2p like Tor, but better (er, different). i2p stands for “Invisible Internet Project.” You can think of it as a rewritten version of TCP/IP that runs on top of regular TCP/IP, but provides:
- End-to-end encryption
- Dissociation of router ID’s from IP’s (routable address != user’s identity)
- A TCP-like stream abstraction permitting vanilla TCP/IP applications to run on top of i2p with minimal rework (it also supports UDP)
For our purposes i2p hides IP addresses of all entities from each other while still guaranteeing traffic origin, i.e. the traffic comes from the entity it claims to come from (this is accomplished through signing). The problem of whether the entity that sent the traffic actually is the person/provider you think they are is not addressed by our design, and remains the same as in the status quo (trust a certificate authority or a self-signed cert).
We do note in passing that i2p might still provide an advantage over traditional SSL here, since the physical location of a provider can be hard to identify if they stay entirely within i2p. For example, let’s say The Democratic People’s Republic of Fakemerica passes the “DPRF Freedom Act,” granting their secret police the ability to compel Internet companies to hand over their SSL keys.
If the physical location of the provider is difficult to determine, then forced compromise of the server will be much harder.
As an aside, I sometimes wonder if the logical conclusion of the whole spying-on-the-entire-planet thing is that governments themselves move their secrets into i2p. Anything airgapped will become non-airgapped relatively quickly, and for all the NSA’s “all offense all the time” strategy, governments are clearly incompetent at defense (see: Office of Personnel Management hack, complete with SF-86 forms up to cabinet level!). Perhaps a web of mutual distrust is what will prevail?
Brokers Gotta Broke Part 2 (For Real This Time)
So how does the Broker play into all this? The key weak point in the traditional User <-> VPN scenario is the user ID, which correlates all user activity. What if a User could dynamically change their ID whenever they connected to a VPN, but the VPN could still get paid and ensure adequate resource metering (e.g. “five connections per paying customer”)? And what if we could do all this while not exposing anyone’s IP address to anyone?
Here’s how it works. First, User, VPN, and Broker all exist on i2p. All connections and endpoints referenced from this point on are i2p connections and endpoints, not IP addresses. The VPN exposes a connection API to the Broker. That API asks for the following two things:
- Username / Password
- Client-generated token (e.g. some SHA2 digest)
We let the client generate the token to avoid potential deanonymization as a result of token generation practices on the part of the Resource Provider (e.g. poor parameter selection coupled with hashing on a string of interest, say the connection metadata, would be bad). This is fine since tokens only need to be unique within the namespace of a given username/password combination.
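As a rough illustration of client-side token generation, here is a minimal Python sketch. It assumes the token is simply the SHA-256 digest of fresh random bytes; the function names and request field names are hypothetical and are not part of any published GRP specification.

```python
import hashlib
import secrets


def generate_client_token() -> str:
    """Return a fresh token: the SHA-256 digest of 32 random bytes.

    Only random bytes are hashed (never the username or connection
    metadata), so the token itself reveals nothing about the User.
    """
    random_bytes = secrets.token_bytes(32)
    return hashlib.sha256(random_bytes).hexdigest()


def build_connection_request(username: str, password: str) -> dict:
    """Assemble the two things the connection API asks for.

    The field names here are assumptions made for illustration.
    """
    return {
        "username": username,
        "password": password,
        "token": generate_client_token(),
    }
```

Because every call draws new randomness, a collision within one username/password namespace is vanishingly unlikely, which is all the uniqueness the scheme needs.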
The VPN then pins this token to the user ID, with some Time to Live (TTL). Token pinning also lets the VPN enforce a max connections per user ID policy. If the user is required to submit the token on every request, the token can also be used to monitor for abuse.
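Token pinning with a TTL and a connection cap could look something like the sketch below. This is one possible in-memory design, not the way any actual VPN does it; the class name, the default limit of five, and the one-hour TTL are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict


class TokenPinner:
    """Pins client tokens to a user ID, each with a time to live (TTL)."""

    def __init__(
        self,
        max_tokens_per_user: int = 5,       # assumed policy value
        ttl_seconds: float = 3600.0,        # assumed TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_tokens_per_user = max_tokens_per_user
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # user_id -> {token: expiry timestamp}
        self._pins: Dict[str, Dict[str, float]] = {}

    def _drop_expired(self, user_id: str) -> Dict[str, float]:
        """Forget tokens whose TTL has elapsed and return the live ones."""
        now = self._clock()
        live = {
            token: expiry
            for token, expiry in self._pins.get(user_id, {}).items()
            if expiry > now
        }
        self._pins[user_id] = live
        return live

    def pin(self, user_id: str, token: str) -> bool:
        """Pin a token; refuse duplicates and anything over the cap."""
        live = self._drop_expired(user_id)
        if token in live:
            return False  # tokens must be unique within a user ID
        if len(live) >= self.max_tokens_per_user:
            return False  # max connections per user ID reached
        live[token] = self._clock() + self.ttl_seconds
        return True

    def is_valid(self, user_id: str, token: str) -> bool:
        """Check a token submitted with a request."""
        return token in self._drop_expired(user_id)
```

Expired tokens are dropped lazily on each call, so a slot frees up as soon as an old token times out, and nothing about a session needs to outlive its TTL.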
While passing the token on every request sounds like it compromises anonymity, keep in mind that in the traditional VPN setup the user must already remain authenticated for the lifetime of their session with the VPN. The difference here is that the User becomes more agnostic to the question of whether the VPN persists logs past the lifetime of the session.
The trick is that instead of the User connecting directly to the VPN, we’ll have the Broker do this on her behalf. The Broker in turn exposes a similar connection API to the User, who sends along her id/password and token. The Broker authenticates the id/password against records of paid accounts. The problem of anonymizing the payment stream remains the same in our scenario as in the status quo and is not something we address.
Since the User is requesting a VPN Provider from the Broker, the Broker initiates a connection request to the VPN on her behalf. To do so, the Broker passes a username/password for an account it has purchased and maintains with the VPN. Once the VPN authenticates the Broker, the Broker passes the connection token back to the User. For maximum security the User can generate a new token on her own and replace the Broker-provided token with the new one after contacting the VPN (essentially do a token rollover so that the Broker does not know the token the User is using with the VPN).
This solution also works well for the Broker. If the Broker passed along the actual authentication credentials for the VPN accounts it maintains, nothing could stop the User from stealing those credentials. The token method also allows the Broker to enforce rate limiting (e.g. only five tokens per user ID) and some abuse tracking as well.
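To make the flow concrete, here is a toy in-process simulation of the User, Broker, and VPN exchange, including the optional token rollover. Everything in it is an assumption made for illustration: the class and method names are hypothetical, plain dictionaries stand in for account databases, direct method calls stand in for i2p connections, and the rollover is authorized simply by presenting the old token.

```python
import secrets
from typing import Dict, Optional


def new_token() -> str:
    """Client-side token: 32 random bytes, hex encoded."""
    return secrets.token_hex(32)


class VPN:
    """Resource Provider: knows Broker accounts and pinned tokens only."""

    def __init__(self, accounts: Dict[str, str]) -> None:
        self._accounts = accounts           # account name -> password
        self._pinned: Dict[str, str] = {}   # token -> account name

    def connect(self, username: str, password: str, token: str) -> bool:
        if self._accounts.get(username) != password:
            return False
        self._pinned[token] = username
        return True

    def rollover(self, old_token: str, fresh_token: str) -> bool:
        """Swap a pinned token for a new one chosen by the User."""
        account = self._pinned.pop(old_token, None)
        if account is None:
            return False
        self._pinned[fresh_token] = account
        return True

    def accepts(self, token: str) -> bool:
        return token in self._pinned


class Broker:
    """Authenticates paying Users, then connects to the VPN for them."""

    def __init__(self, paid_users: Dict[str, str], vpn: VPN,
                 vpn_username: str, vpn_password: str) -> None:
        self._paid_users = paid_users
        self._vpn = vpn
        self._vpn_username = vpn_username
        self._vpn_password = vpn_password

    def request_vpn(self, user_id: str, password: str,
                    token: str) -> Optional[str]:
        if self._paid_users.get(user_id) != password:
            return None
        # The Broker's own VPN credentials never leave the Broker.
        if not self._vpn.connect(self._vpn_username,
                                 self._vpn_password, token):
            return None
        return token
```

In this sketch the User would call `request_vpn` with her own token, then call `rollover` on the VPN directly with a second token from `new_token()`. After that the Broker only knows a token the VPN no longer accepts.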
But what about the logs?
Since User, Broker, and VPN Provider are all on i2p, they do not need to know each other’s IP’s. The User only receives an i2p endpoint from the Broker, and all three only know each other by i2p router ID, as opposed to IP Address.
Even if Broker and VPN Provider both decide to log, the User is still better off in our scenario. Since connection sessions are disaggregated among multiple VPN Providers, metadata like logon/logoff timing and traffic destinations are harder to correlate to a single user: even if a third party knows which VPN Providers a user ID from a Broker has used, no single VPN contains all the logging relevant to a given User, making data collection much harder. Additionally, since IP addresses are not known and a User is free to rotate her i2p router ID, the logs are rendered much less useful.
But wait, there’s more! What if the Broker is compromised and forced to serve honeypot VPN’s, VPN’s that look and feel legitimate but are actually monitored by an interested third party who is recording all traffic?
The User can implement a simple whitelist that only permits connections to certified VPN’s. Certification can happen either through whitelisting of specific i2p endpoints (not recommended since these can roll), whitelisting of specific keys or certificates (self-signed and certificate authority as two possible models), or some combination. If the Broker serves a Resource that fails the whitelist, the User can send a “Resource Rejected Retry” message and await service of a new resource. Users can avoid publishing their whitelists with the Broker in order to avoid contributing metadata to the Broker’s logs.
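A client-side whitelist check could be sketched as follows. This assumes certification by key: the User keeps a local set of SHA-256 fingerprints of trusted VPN public keys and compares each served resource against it. The function names, the retry limit, and the use of raw key bytes are illustrative assumptions.

```python
import hashlib
from typing import Iterable, List, Optional, Set, Tuple

RESOURCE_REJECTED_RETRY = "Resource Rejected Retry"


def fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of a provider's public key."""
    return hashlib.sha256(public_key).hexdigest()


def pick_certified_resource(
    served_keys: Iterable[bytes],
    whitelist: Set[str],
    max_attempts: int = 3,  # assumed retry limit
) -> Tuple[Optional[bytes], List[str]]:
    """Accept the first served resource whose key is whitelisted.

    Returns the accepted key (or None) and the messages the User would
    send to the Broker. The messages never include the whitelist itself,
    so the Broker's logs learn only that a resource was rejected.
    """
    messages: List[str] = []
    for attempt, key in enumerate(served_keys):
        if attempt >= max_attempts:
            break
        if fingerprint(key) in whitelist:
            return key, messages
        messages.append(RESOURCE_REJECTED_RETRY)
    return None, messages
```

A User who gets back `None` after several rejections has a useful signal in itself: the Broker may be serving nothing but uncertified resources, and it may be time to try a different Broker.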
And finally, security will improve markedly as an ecosystem develops. We already reaped a huge advantage by distributing the User’s sessions and connection metadata among multiple VPN Providers (courtesy of the Broker). Once there are multiple Brokers the User can distribute meta-session and meta-connection metadata over multiple Brokers. All this while still enabling metering of resources and abuse management through token pinning. Brokers will act as cloud resellers, offering slight markups on base public cloud prices of compute and storage resources (and VPN tunnels). Public clouds continue to meter the Brokers as they always have, with the addition of our Generic Resource Provider API. Brokers charge their Users as they wish, again utilizing the Generic Resource Provider API. Win-win-win.
What’s the catch?
Ah my critical-thinking reader friend, why did you have to ask this question? There are, of course, limitations.
In general, this model seems to fit ad-hoc and personal usage better than enterprise usage. Moving through i2p will incur non-trivial (and non-uniform) latency, as well as slower throughput. For some users these costs will be acceptable, and i2p does allow the user to manually declare the tunnel length it uses for connections, providing some tuning options here.
Particular workloads may also be challenging to engineer. First, persistent storage on the cloud seems difficult here, though Tahoe-LAFS over i2p might be one alternative. Second, code will require some rework to run over i2p, and there are performance gains to be had with re-engineering.
Nothing is a perfect, one-size-fits-all solution. And no Dark Cloud will spring forth fully formed. Nonetheless, we believe the Generic Resource Provider architecture outlined above is a robust starting point. Doubtless there are more weak points to discover and more workloads to consider, which, with some swag liberally borrowed from the indomitable Donna Paulsen, we happily leave for future work 😉
(No really, we know we haven’t thought of everything, this gif is just too awesome not to share! Hope you enjoyed the read!)