Continuing from my previous leaflet on private data, I want to explore three broad schemes for shared-private storage that I’ve seen emerge in various proposals.
As a reminder, the shared-private storage is for data which is multi-user but non-public. Some common use-cases include posts, user lists, videos, documents, DMs, and basically any other kind of artefact or experience in social or productivity software.
It’s not yet clear to me which of these schemes is the right approach. It’s not even clear to me whether we’ll need just one, all three, or something not included here. The goal is not to be comprehensive or conclusive right now; I just want to share the general ideas.
Shared “arenas”
An important nuance of shared-private storage is that the initial record being shared is not a comprehensive picture of the system. It is not like email, where the exchange consists purely of “e-mail message” objects sent back & forth. A scheme for private threads must include a solution for the original post record, the replies, likes, reposts (if allowed) and so on.
For this leaflet, I’ll use the term arenas to describe the collection of records involved in some shared-private experience. An arena might be a DM conversation, a private thread, a group with multiple private discussion threads, and so on. It’s a non-public exchange composed of records from multiple authors.
Estimating scale
Scale considerations factor heavily when all of the behaviors of these arenas are considered. Every time you create a new private exchange with its own set of recipients, you need to create a new arena. Some ways the number of arenas can grow quickly include:
Private threads which are addressed only to followed users might require 1 arena per user. For example, Alice needs her own “Private Threads Arena” which includes all of her followed users, while Bob needs his own Private Threads Arena with his followed users.
Private threads which are addressed to an arbitrary recipient list could lead to 1 arena per thread. If Alice creates a private thread for her, Bob, and Carla, then that’s a different arena than a separate private thread for her and Bob.
This explosion of arenas means that the active resource cost of a given arena needs to be near-zero.
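To make the arena-per-recipient-list case concrete, here is a minimal sketch. It assumes an arena is identified by its member set, which is my assumption for illustration and not something any proposal specifies; the `arena_key` function and the handles are hypothetical.

```python
import hashlib


def arena_key(members: set[str]) -> str:
    """Derive a stable identifier from the member set (hypothetical scheme)."""
    # Sort so the key does not depend on the order members were listed in.
    canonical = "\n".join(sorted(members))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical handles, for illustration only.
thread_1 = arena_key({"alice.example", "bob.example", "carla.example"})
thread_2 = arena_key({"alice.example", "bob.example"})
thread_3 = arena_key({"bob.example", "alice.example"})

print(thread_1 != thread_2)  # different recipient lists, different arenas
print(thread_2 == thread_3)  # same members in another order, same arena
```

Under this assumption, every distinct recipient list a user ever addresses produces another arena, which is why the idle cost per arena matters so much.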
Conversely, arenas which are oriented toward persistent groups such as DM conversations or discussion groups are less likely to explode the number of arenas, but are more likely to involve much larger numbers of users. The sharing model needs to be prepared for these large-scale private groups as well.
Hosted arena scheme
The “hosted-arena” scheme specifies a server which hosts shared-private data on behalf of users. Access is mediated by the host server, and can consequently be revoked.
The hosted-arena scheme enables highly dynamic access rules and simplifies coordination. However, the host canonically decides which records are a part of the arena, and so it needs to be trusted not to drop messages from participants.
The general expectation is that hosted arenas will be contacted by applications which wish to display the arena. This means, for instance, that Bluesky would contact the host for the content of the arena, much like it does for feed generators.
Hosted arenas cannot be trusted to faithfully represent user activity, and so any activity they host must be served in its original authenticated (aka signed) form. The reason for this is fairly intuitive: the incentive to misrepresent user activity is extremely high. This unfortunately means that we can’t just call out to an API to get computed views; the arena host needs to provide a bucket of signed records which are then reconstructed by a viewing application.
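A rough sketch of what "reconstructed by a viewing application" could look like follows. Everything here is hypothetical: the `SignedRecord` shape, the key table, and especially the signature, where HMAC stands in for the public-key signatures a real system would use (Python's standard library has no asymmetric signing).

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedRecord:
    author: str
    body: dict
    sig: str


def _payload(author: str, body: dict) -> bytes:
    # Canonical serialization so signer and verifier hash the same bytes.
    return json.dumps({"author": author, "body": body}, sort_keys=True).encode("utf-8")


def sign(author: str, body: dict, key: bytes) -> SignedRecord:
    # STAND-IN: HMAC is symmetric. A real scheme would sign with the author's private key.
    sig = hmac.new(key, _payload(author, body), hashlib.sha256).hexdigest()
    return SignedRecord(author, body, sig)


def verify(rec: SignedRecord, keys: dict[str, bytes]) -> bool:
    key = keys.get(rec.author)
    if key is None:
        return False
    expected = hmac.new(key, _payload(rec.author, rec.body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, rec.sig)


def build_view(bucket: list[SignedRecord], keys: dict[str, bytes]) -> list[dict]:
    """Keep only records whose signature checks out, then shape them for display."""
    return [{"author": r.author, **r.body} for r in bucket if verify(r, keys)]


keys = {"alice.example": b"alice-key", "bob.example": b"bob-key"}
post = sign("alice.example", {"text": "hello"}, keys["alice.example"])
reply = sign("bob.example", {"text": "hi alice"}, keys["bob.example"])
# A dishonest host alters Bob's reply but cannot produce a matching signature.
forged = SignedRecord("bob.example", {"text": "I agree to everything"}, reply.sig)

view = build_view([post, forged], keys)
print(view)
```

Note that this only catches tampering. The viewing application has no way to notice a record the host silently left out of the bucket.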
Mail scheme
The “mail” scheme uses email- or activitypub-style mailing semantics. The unit of data is a message with a “send” verb with a specified list of recipients.
Mail is transactional, unlike most data in AT, and its local state is detached from its remote state. Put another way, if you edit or delete a mailed record locally, those modifications are not reflected among previous recipients. Mailed objects are immutable post-transaction. (If you’re thinking, Why not make them mutable and sync the changes? then look at the next scheme.)
Mail schemes have fixed access rules (the recipients) though “mailing list” style forwarding bots can be used to condense recipient lists into a single controlled group. Revocation of a mailed message is not possible.
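The detachment between local and remote state can be sketched in a few lines. This is an in-memory toy, not a protocol: `Mailbox` and `send` are hypothetical names, and delivery is just a deep copy into each recipient's inbox.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    inbox: list[dict] = field(default_factory=list)


def send(record: dict, recipients: list[str], mailboxes: dict[str, Mailbox]) -> None:
    """Deliver an independent snapshot of the record to each recipient."""
    for name in recipients:
        mailboxes[name].inbox.append(copy.deepcopy(record))


mailboxes = {"bob.example": Mailbox(), "carla.example": Mailbox()}
local = {"text": "draft plan v1"}
send(local, ["bob.example", "carla.example"], mailboxes)

# Editing the local copy after sending does not reach the recipients.
local["text"] = "draft plan v2"

print(mailboxes["bob.example"].inbox[0]["text"])
print(local["text"])
```

Once the snapshot is delivered, the sender has no handle on it, which is also why revocation is off the table in this scheme.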
Synced-arena scheme
The “synced-arena” scheme uses sync channels to automatically propagate records and record-updates among multiple servers. Access is established using a list of granted viewers – either in the records or in some metadata – enabling servers to act on behalf of the viewing users to sync the records.
Any synced-arena scheme is going to function almost as a private relay. It will sync records along with authenticity proofs, and it will depend on store-and-forward semantics. Since there is likely a known member set, it should be possible to use gossip-protocol semantics with vector clocks to distribute load among the participants.
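For readers who haven't used vector clocks, here is a minimal sketch of the bookkeeping involved. The node names are hypothetical, and this only shows the clock logic (tick, merge, ordering), not the gossip transport or record sync around it.

```python
VectorClock = dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the node's own counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: what a node knows after hearing from a peer."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a is causally earlier than b."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither clock is causally before the other and they differ."""
    nodes = a.keys() | b.keys()
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not same and not happened_before(a, b) and not happened_before(b, a)


# Two servers each write a record without having heard from the other.
pds_a = tick({}, "pds-a")
pds_b = tick({}, "pds-b")
print(concurrent(pds_a, pds_b))

# pds-a receives pds-b's update, merges, then writes again.
synced = tick(merge(pds_a, pds_b), "pds-a")
print(happened_before(pds_b, synced))
```

The useful property is that any participant can tell whether a peer is missing updates just by comparing clocks, so records can be forwarded by whoever has them rather than by one central host.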
Who implements the schemes?
Broadly speaking, these schemes could be implemented by the PDS, by the Application Server, or a hybrid.
If implemented by the PDS, these schemes would be exposed entirely as APIs on the PDS which the Applications would use. If implemented by the Apps, the PDS would likely expose some utilities to facilitate the work – particularly for signing – and then it would be up to the Apps to engage in these schemes with each other.
Both of these approaches have tradeoffs. If implemented in the PDS, it’s going to be more rigid and could end up increasing the costs of PDS operation. If implemented in the Application Servers, it’s going to be more work for app developers.
Hybrid models are interesting to consider. For instance, the mail scheme could be handled as a kind of extension to personal-private data. If a record is written with the $recipients metadata, the PDS could automatically fire off the record to the applications being used by the recipients.
For instance, if Bob is using Bluesky and Blacksky, then a private message addressed to Bob would deliver automatically to both of those apps. This would put the outbox tasks – which are fairly cheap – into the PDS’s hands, but the inbox tasks – which are costly due to spam, moderation, and aggregation – into the Apps’ hands.
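A sketch of the outbox half of that hybrid might look like the following. The `APPS_BY_USER` registry and the `fan_out` function are hypothetical; how a PDS would actually learn which apps a recipient uses is an open question this sketch does not answer.

```python
from collections import defaultdict

# Hypothetical registry: which apps each user has enabled.
APPS_BY_USER = {
    "bob.example": ["bluesky", "blacksky"],
    "carla.example": ["bluesky"],
}


def fan_out(record: dict, apps_by_user: dict[str, list[str]]) -> dict[str, list[dict]]:
    """Return {app: [deliveries]} for a record carrying $recipients metadata."""
    deliveries = defaultdict(list)
    for recipient in record.get("$recipients", []):
        for app in apps_by_user.get(recipient, []):
            deliveries[app].append({"to": recipient, "record": record})
    return dict(deliveries)


message = {"text": "lunch?", "$recipients": ["bob.example"]}
outbox = fan_out(message, APPS_BY_USER)
print(sorted(outbox))
```

The PDS side stays this simple because it only routes. Deciding whether a delivery is spam, and how to aggregate it into a view, is left to each receiving app.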
Unsigned, signed, and structure-signed
Within these schemes, there are questions about whether the data is signed. Your options are no, yes, and yes within a cryptographic set structure (such as the Merkle Search Tree used in user repositories). Each of these has different properties.
Unsigned – 🚫 Replicable 🚫 Live
Don’t sign the records. If you communicate directly with the owning user’s PDS, this can work (by dint of source authority). This scheme only extends one “hop” of trust; you can’t replicate the records across multiple hosts like you can with public data in AT. You also get no guarantees about the continued accuracy of the record (liveness) since the authoring user has no way to assert it was deleted.
Signed – ✅ Replicable 🚫 Live
Sign the records as individual objects. This enables the records to be shared around multiple hops (replicable) but it means you can’t reliably assert that a signed record has been modified or deleted (liveness). It’s also hard to handle key rotations in this situation since every signed record has to be re-signed.
For what it’s worth, the hosted arena scheme interferes with liveness too because the host could drop updates.
Structure-signed – ✅ Replicable ✅ Live
Put the records into a cryptographic structure such as a Merkle Search Tree and then sign the structure root. This is the equivalent of creating new data repositories for an arena. This enables multi-hop replication, clear revocation, and low-cost key rotation. The main downside is that you have to share the entire structure within an arena.
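To illustrate why a signed root gives you liveness, here is a minimal sketch. It uses a plain binary Merkle tree over sorted keys, which is a simplification I'm swapping in for the Merkle Search Tree; the record keys are hypothetical, and the signing step itself is omitted.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: dict[str, str]) -> str:
    """Root hash over records, ordered by key so the result is deterministic."""
    level = [_h(b"leaf:" + f"{k}={v}".encode("utf-8")) for k, v in sorted(records.items())]
    if not level:
        return _h(b"empty").hex()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [_h(b"node:" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


arena = {"post/1": "hello", "reply/1": "hi", "like/1": "post/1"}
root_before = merkle_root(arena)

# Deleting a record changes the root, so the newly signed root asserts the deletion.
del arena["reply/1"]
root_after = merkle_root(arena)

print(root_before != root_after)
```

Only the root needs a signature, so rotating keys means re-signing one value rather than every record. The cost is the one already noted: a verifier needs the whole structure (or a proof path into it) to check any single record.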
Checkpoint
In this leaflet, I defined "arenas" as the collection of records involved in some shared-private experience. I then described three general schemes for implementing arenas: hosted, mail, and synced. I don't consider that exhaustive, but they are three common proposals. I also made some general observations on which roles are implementers of the schemes -- PDS or Application -- and on the signing models that might be used.
At this stage, I don't have any strong preference for any of these schemes. I think all three are appealing for different reasons, and perhaps for different use-cases. It's possible that all three might be needed, or that a secret fourth (or fifth!) might be out there. I do hope this helps map the possibility space a little further.
Cheers.