There is wide interest in solving private data in AT: within the team, among users, and in the dev community. I figure it might help to share my thoughts as this work progresses.
As proposals are shared, our goals should be to narrow down the knowns and unknowns, build clear requirements, and understand the tradeoffs before something gets shipped.
Background
AT is designed for large scale open applications. It accomplishes this by separating the network into two primary roles:
The PDS, which is a personal server for users – hosting their data and their account, and
The Application Servers – which aggregate data from all of the users’ servers to form applications.
Applications sign into users’ PDSes to publish records. The records then replicate out to listening apps so they can react to the changes. The flow looks roughly like this:
This is a multi-party transaction. When the user logs in via OAuth, the application is handed the URL of the PDS and a token granting access. Writes are sent to the PDS via HTTP, and the PDS returns a 200 to confirm a successful commit. Listening applications are then notified of the update over sockets.
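To make the sequence concrete, here is a minimal in-memory sketch of the write path. Everything in it is hypothetical and for illustration only. `InMemoryPds`, `createRecord`, `subscribe`, and the `CommitEvent` shape are not AT Protocol APIs. The OAuth token is reduced to a plain lookup, and in-process callbacks stand in for both the HTTP response and the socket notification.

```typescript
// Hypothetical model of the AT write path: an app holding a token writes a
// record to the user's PDS, gets a 200 on commit, and listeners are notified.
// None of these names are real AT Protocol APIs.

interface CommitEvent {
  seq: number;         // monotonically increasing per-PDS sequence number
  repo: string;        // the user's DID
  collection: string;  // e.g. "app.bsky.feed.post"
  rkey: string;        // record key within the collection
  record: unknown;     // the record body
}

type Listener = (evt: CommitEvent) => void;

class InMemoryPds {
  private seq = 0;
  private records = new Map<string, unknown>();
  private listeners: Listener[] = [];
  private tokens: Map<string, string>; // access token -> DID (stand-in for OAuth)

  constructor(tokens: Map<string, string>) {
    this.tokens = tokens;
  }

  // Stand-in for the socket subscription that listening apps open.
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Stand-in for the HTTP write; returns an HTTP-like status code.
  createRecord(token: string, collection: string, rkey: string, record: unknown): number {
    const repo = this.tokens.get(token);
    if (repo === undefined) return 401; // token not recognized
    this.records.set(`${repo}/${collection}/${rkey}`, record);
    const evt: CommitEvent = { seq: ++this.seq, repo, collection, rkey, record };
    // Notify listeners synchronously, before returning, for simplicity.
    for (const listener of this.listeners) listener(evt);
    return 200; // commit succeeded
  }
}

// Usage: one app writes, another app listens.
const pds = new InMemoryPds(new Map([["token-abc", "did:example:alice"]]));
const seen: CommitEvent[] = [];
pds.subscribe((evt) => seen.push(evt));
const status = pds.createRecord("token-abc", "app.bsky.feed.post", "3k1", { text: "hello" });
console.log(status, seen.length);
```

The writer and the listener never talk to each other directly. The PDS is the only party both of them connect to.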
Applications can subscribe directly to users’ PDSes to receive updates, if they want. In practice, because user data is signed (and so can be verified no matter who delivers it), the network uses “relays” to rebroadcast a firehose of updates from a wide set of users.
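A relay can be pictured as a fan-in. It consumes the commit streams of many PDSes and re-emits them as one ordered firehose. The sketch below is hypothetical: `Relay`, `ingest`, and the event shapes are illustrative names. It also omits the signature checks that signed data makes possible. It only shows why one subscription to a relay can replace many subscriptions to individual PDSes.

```typescript
// Hypothetical relay: merges commit events from many PDSes into one firehose.
// Signature checks are omitted; in AT, signed records are what let a third
// party like this rebroadcast data without being trusted.

interface PdsEvent {
  pdsHost: string;  // which PDS the event came from
  repo: string;     // the user's DID
  path: string;     // "collection/rkey"
}

interface FirehoseEvent extends PdsEvent {
  relaySeq: number; // the relay's own global sequence number
}

class Relay {
  private relaySeq = 0;
  private subscribers: Array<(evt: FirehoseEvent) => void> = [];

  subscribe(fn: (evt: FirehoseEvent) => void): void {
    this.subscribers.push(fn);
  }

  // Called for each event received from any upstream PDS stream.
  ingest(evt: PdsEvent): void {
    const out: FirehoseEvent = { ...evt, relaySeq: ++this.relaySeq };
    for (const fn of this.subscribers) fn(out);
  }
}

// Usage: an app makes one subscription and sees activity from two PDSes.
const relay = new Relay();
const received: FirehoseEvent[] = [];
relay.subscribe((evt) => received.push(evt));
relay.ingest({ pdsHost: "pds-a.example", repo: "did:example:alice", path: "app.bsky.feed.post/1" });
relay.ingest({ pdsHost: "pds-b.example", repo: "did:example:bob", path: "app.bsky.feed.post/2" });
console.log(received.map((e) => `${e.relaySeq}:${e.pdsHost}`));
```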
The private data problem
Private data is a key requirement of AT, and it is not yet supported. The discussion is now: how do we introduce flows for personal-private data (e.g. preferences, bookmarks, drafts) and shared-private data (e.g. private posts)?
The model described above is pull-based and broadcast-oriented. It takes advantage of signed public data to widely store-and-forward user records in aggregations. This is what enables AT to operate at very high scales; broadcast is cheap. However, it has no facilities for selective sharing of data.
The PDS layer – again, the personal server of the user – is also very cheap to operate by design. Because it contains the users’ possessions (signing keys and primary data) it’s important to ensure self-hosting is affordable. We do not want to complicate PDS operation if it’s not necessary.
How smart can the PDS be? AT is oriented toward social applications, but it is generic in its design. Since a PDS may host a wide variety of application data, it’s not tenable to expect moderation or administration of application-specific behavior except in very simple forms (e.g. automated scanning). This is similar to an operating system’s relationship to the content of applications; the OS can bake in specific knowledge of certain file formats or application behaviors, but most behavior is defined within the applications.
It’s also important to consider how applications will interact with private data. Once granted access, they’re going to need the same replication model that AT uses for public data, so that the private information can be integrated into their backends.
This sets up our initial requirements for private data in AT: we want to handle personal-private and shared-private data; we do not want to increase the operational or resource costs of the PDS; we don’t want to sacrifice generality; and we need applications to sync the private data.
Solving personal-private storage
The “personal-private” storage is unshared private data. It’s useful for state like preferences, bookmarks, and drafts. It could also store documents, notes, pictures, TODOs, and any other kind of data a user might want to keep in their personal server.
I personally believe private storage should mirror the public storage system. Some indicator in the URI should designate that it is private, and APIs to read & write should clearly distinguish between public and private storage, but otherwise private records & collections should behave the same way as their public counterparts.
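One way to picture "some indicator in the URI" is a distinct path segment. That segment routes a record to a separate private store, while everything else (collections, record keys, read and write operations) stays the same. The `private` segment, `parseUri`, and the `RepoStore` class below are hypothetical, because no private URI syntax has been specified yet. This sketch only illustrates the mirroring idea.

```typescript
// Hypothetical sketch: private records mirror public ones, distinguished only
// by a marker in the URI. The "private" segment is an illustrative assumption,
// not AT Protocol syntax.

type Visibility = "public" | "private";

interface ParsedUri {
  did: string;
  visibility: Visibility;
  collection: string;
  rkey: string;
}

// Accepts at://<did>/<collection>/<rkey> (public) or
// at://<did>/private/<collection>/<rkey> (hypothetical private form).
function parseUri(uri: string): ParsedUri {
  const prefix = "at://";
  if (!uri.startsWith(prefix)) throw new Error(`not an at:// URI: ${uri}`);
  const parts = uri.slice(prefix.length).split("/");
  if (parts.length === 3) {
    const [did, collection, rkey] = parts;
    return { did, visibility: "public", collection, rkey };
  }
  if (parts.length === 4 && parts[1] === "private") {
    const [did, , collection, rkey] = parts;
    return { did, visibility: "private", collection, rkey };
  }
  throw new Error(`unrecognized URI shape: ${uri}`);
}

// Same API for both stores; only the backing map differs.
class RepoStore {
  private stores: Record<Visibility, Map<string, unknown>> = {
    public: new Map(),
    private: new Map(),
  };

  put(uri: string, record: unknown): void {
    const p = parseUri(uri);
    this.stores[p.visibility].set(`${p.did}/${p.collection}/${p.rkey}`, record);
  }

  get(uri: string): unknown {
    const p = parseUri(uri);
    return this.stores[p.visibility].get(`${p.did}/${p.collection}/${p.rkey}`);
  }
}

// Usage: a private bookmark is not visible at the matching public URI.
const store = new RepoStore();
store.put("at://did:example:alice/private/app.example.bookmark/1", { url: "https://example.com" });
console.log(store.get("at://did:example:alice/app.example.bookmark/1")); // undefined
```

Keeping the same `put`/`get` shape for both spaces reflects the goal of private records behaving like their public counterparts. Only the routing differs.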
With the introduction of OAuth and auth scopes, applications have a clear mechanism to gain access-grants for private data. Once this has occurred, an application can effect reads & writes on the granted records, and open a replication stream directly with the PDS to listen to updates to all granted private records. This can be multiplexed to cover all users with grants.
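The access check implied by a grant can be sketched as a filter. A private update is forwarded to the application only if the user who owns it granted a scope covering its collection. Running that filter over updates from many users is what lets one stream be multiplexed across every user with a grant. The scope format (`private:<collection>`, with `*` as a wildcard), the `Grant` type, and `filterStream` are hypothetical illustrations, not the AT OAuth scope syntax.

```typescript
// Hypothetical grant model: each user grants an app scopes like
// "private:app.example.bookmark" or "private:*". Not the real AT scope syntax.

interface Grant {
  did: string;       // the user who issued the grant
  scopes: string[];  // hypothetical scope strings
}

interface PrivateUpdate {
  did: string;
  collection: string;
  rkey: string;
}

function scopeCovers(scope: string, collection: string): boolean {
  if (!scope.startsWith("private:")) return false;
  const target = scope.slice("private:".length);
  return target === "*" || target === collection;
}

// One stream of updates, multiplexed across all users with grants: an update
// is forwarded only if its user granted a scope covering its collection.
function filterStream(grants: Grant[], updates: PrivateUpdate[]): PrivateUpdate[] {
  const byDid = new Map(grants.map((g) => [g.did, g.scopes] as const));
  return updates.filter((u) => {
    const scopes = byDid.get(u.did);
    return scopes !== undefined && scopes.some((s) => scopeCovers(s, u.collection));
  });
}

// Usage: Alice granted bookmarks only, Bob granted everything, Carol nothing.
const grants: Grant[] = [
  { did: "did:example:alice", scopes: ["private:app.example.bookmark"] },
  { did: "did:example:bob", scopes: ["private:*"] },
];
const delivered = filterStream(grants, [
  { did: "did:example:alice", collection: "app.example.bookmark", rkey: "1" },
  { did: "did:example:alice", collection: "app.example.draft", rkey: "2" },
  { did: "did:example:bob", collection: "app.example.draft", rkey: "3" },
  { did: "did:example:carol", collection: "app.example.bookmark", rkey: "4" },
]);
// delivered contains only rkeys "1" and "3"
console.log(delivered.map((u) => u.rkey));
```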
As granted private records are synced, applications can apply the same kind of business logic they use for users’ public data. The most significant difference is that relays are not available to aggregate the firehose of activity; the application needs to establish replication streams with each of its users’ PDSes.
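Without a relay, the number of replication streams an application must hold is the number of distinct PDS hosts among its users, not the number of users. This assumes each stream is multiplexed across all granting users on that host. The sketch below computes that plan. `planStreams` and `UserHome` are hypothetical, and resolving each DID to its PDS host is taken as already done.

```typescript
// Hypothetical: given each user's PDS host (already resolved from their DID),
// group users so the app opens one replication stream per host.

interface UserHome {
  did: string;
  pdsHost: string;
}

function planStreams(users: UserHome[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const { did, pdsHost } of users) {
    const dids = plan.get(pdsHost);
    if (dids) dids.push(did);
    else plan.set(pdsHost, [did]);
  }
  return plan;
}

// Usage: three users on two hosts need two streams, not three.
const plan = planStreams([
  { did: "did:example:alice", pdsHost: "pds-a.example" },
  { did: "did:example:bob", pdsHost: "pds-a.example" },
  { did: "did:example:carol", pdsHost: "selfhosted.example" },
]);
console.log(plan.size); // 2
```

In the worst case every user self-hosts, and the application holds one stream per user. That is the scaling cost of not having a relay in the path.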
It’s possible that I’m missing some other requirements for personal-private storage, but the task seems fairly straightforward. The access control model is not complicated, no signing model is needed, and the scaling properties are obvious.
Solving shared-private storage
The “shared-private” storage is for data which is multi-user but non-public. Some common use-cases include posts, user lists, videos, documents, DMs, and basically any other kind of artefact or experience in social or productivity software.
This part of the system is more complex. At this stage, I’m somewhat skeptical that a single approach can solve shared-private data. Here are some of the “rubrics” by which I’m judging the proposals I’ve seen so far:
How well does it handle scale?
Can it handle hundreds of participants? Thousands?
Does it introduce an excessive amount of duplicated data? (e.g. a mail system might)
How many connections are needed between participants, and if handshakes are needed to prove access, how complex are they?
What is the metadata leakage?
Is it apparent to outside parties that the exchange is occurring? Which outside parties?
What kind of data is leaked? Can the participants be enumerated from the outside? Is message existence or timing visible?
What are the security guarantees?
Are we attempting to provide end-to-end encryption, or a system which could support E2E?
How many parties are included in the communication? Like email, are there applications/providers which will have visibility into most private communication? Is this avoidable in an open system?
If key material is leaked or access is inappropriately granted, how wide is the data exposure? (This is a particular concern for any scheme which broadcasts ciphertext via public records.)
How many use-cases can the system handle?
Are there notable absences in the supported use-cases, such as private accounts or large-scale private groups?
Relatedly, will end-users be surprised to discover that the “private data” that protocol nerds have been celebrating doesn’t actually give them the feature they’d expect? How complete does the system need to be before we say a word about it to the wider public?
How straightforward is the end-user administration?
Will they be able to clearly administer access, either by the applications or some other PDS-level interface?
Does the sharing scheme introduce new challenges, such as inbox spam due to a push-messaging scheme?
How accessible is the developer experience and API surface?
Are the shared-private systems easy to understand and build with?
Do application developers need to learn yet another system, or does it feel like an intuitive extension to AT’s existing primitives?
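The scale questions lend themselves to quick arithmetic. The sketch below compares two simplified, hypothetical ways of delivering one message to N participants. In a mail-style push, every participant's server stores its own copy. In host-and-fetch, one server stores the message and every other participant's application fetches it. It also counts the pairwise connections a full mesh would need. The model and its names are illustrative only. They ignore encryption and metadata entirely and do not describe any specific proposal.

```typescript
// Hypothetical cost model for delivering one message to n participants.
// Illustrative only: ignores encryption, metadata, retries, and batching.

interface DeliveryCost {
  storedCopies: number; // how many servers hold a copy of the message
  connections: number;  // connections opened to deliver it once
}

// Mail-style push: the sender's server pushes a copy to each other
// participant's server; every server (including the sender's) stores one.
function pushCost(n: number): DeliveryCost {
  return { storedCopies: n, connections: n - 1 };
}

// Host-and-fetch: one server stores the message; every other participant
// fetches it from that server.
function fetchCost(n: number): DeliveryCost {
  return { storedCopies: 1, connections: n - 1 };
}

// Full mesh among participants' servers (e.g. for pairwise handshakes).
function meshConnections(n: number): number {
  return (n * (n - 1)) / 2;
}

for (const n of [2, 100, 1000]) {
  console.log(n, pushCost(n).storedCopies, fetchCost(n).storedCopies, meshConnections(n));
}
```

At 1,000 participants, push stores 1,000 copies of every message, and a full mesh needs 499,500 pairwise connections. That quadratic growth is what the questions about duplicated data and connection counts are probing.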
Checkpoint
I'll reiterate the broad goals: we want to handle personal-private and shared-private data; we do not want to increase the operational or resource costs of the PDS; we don’t want to sacrifice generality; and we need applications to sync the private data.
Personal-private data appears to be straightforward. By creating a new private dataspace and then leveraging the new auth permission scopes, we should have a clear path forward for introducing unshared private data.
Shared-private storage is somewhat more complex. I've listed my current working rubric for solutions above, and I'm open to other requirements that we might need to consider.
It may seem daunting, but I'm very optimistic about the discussions happening in the community group and within the Bluesky team. In a follow-up post, I'll talk about some common shapes and properties I've seen in recent proposals.
Cheers.