So over the past couple of days I moved my PDS hosting from a server on UpCloud to my self-hosted Kubernetes cluster, Phoebe. It wasn't for anything cost-related (UpCloud are actually really cost-effective, and I still run a couple of things on their stack, including the backup storage for Phoebe in their Object Storage offering); it's more that I have a crippling Kubernetes addiction and I love self-hosting stuff.

However, there were a couple of gotchas involved in building out my setup. So this post partly documents how I host it, and partly covers some of the weird issues I ran into along the way.


So the main problems with just going "fuck it, let's run my PDS in Kube" boil down to these three:

  • How do I run it? (there's no official Helm chart for the PDS)

  • How do I handle routing? (K8s is a little painful for wildcard ingress/gateway stuff)

  • How do I cleanly migrate my data? (moving my data from the VPS that hosted it into Kube)

Now, I solved all of those, but we're gonna start by answering question one: how.

Like I said, the Bluesky team don't publish a Helm chart for the PDS (yet; I may try and submit a PR), but what they do publish is a Docker image. Because of this, we can make use of a wonderful generic deploy-anything Helm chart called app-template, which lets us set up a very quick deployment for the PDS and keeps things like rolling out updates and coordinating Services and Gateway HTTPRoutes simple.

First off, let's define our main controller and container. A controller in app-template is basically the workload itself (a Deployment by default), and its name is the reference the chart (and Kubernetes) use to wire things together, like which Services map to which set of containers.

# values.yaml
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/bjw-s-labs/helm-charts/app-template-4.4.0/charts/other/app-template/values.schema.json
controllers:
  pds:
    containers:
      main:
        image:
          repo: ghcr.io/bluesky-social/pds
          tag: 0.4.188
        env:
          PDS_HOSTNAME: pds.<domain>
          PDS_PORT: &port 2583

          PDS_DATA_DIRECTORY: /tmp/data
          PDS_BLOBSTORE_DISK_LOCATION: /tmp/data/blocks
          PDS_BLOB_UPLOAD_LIMIT: '52428800'

          PDS_EMAIL_FROM_ADDRESS: pds@<domain>

          PDS_DID_PLC_URL: https://plc.directory
          PDS_BSKY_APP_VIEW_URL: https://api.pop1.bsky.app
          PDS_BSKY_APP_VIEW_DID: did:web:api.bsky.app
          PDS_REPORT_SERVICE_URL: https://mod.bsky.app
          PDS_REPORT_SERVICE_DID: did:plc:ar7c4by46qjdydhdevvrndac
          PDS_CRAWLERS: https://bsky.network

          LOG_ENABLED: true
          TZ: Europe/London
        envFrom:
          - secretRef:
              name: atproto-pds-secret
        probes:
          liveness: &probes
            enabled: true
            custom: true
            spec:
              httpGet:
                path: /xrpc/_health
                port: *port
              initialDelaySeconds: 0
              periodSeconds: 10
              timeoutSeconds: 1
              failureThreshold: 3
          readiness: *probes
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities: { drop: ["ALL"] }

defaultPodOptions:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch

Now, this configuration does a few things, but if you've worked with Deployments in Kubernetes before, it should look vaguely familiar. All this is doing currently is:

  • Configuring a Deployment for our PDS

  • Telling it to use the ghcr.io/bluesky-social/pds image with version 0.4.188

  • Giving it environment variables (inline for the non-secret values, and from a Kubernetes Secret called atproto-pds-secret for the sensitive ones; there's a sketch of creating that Secret just after this list)

  • Giving it some healthchecks to run so it can automatically replace our container if it stops responding

  • Applying some security options to our container so that it runs as non-root and can't escalate itself _to_ root, as well as dropping Linux capabilities it doesn't need (in the PDS' case, that's _every_ capability)
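
One thing the chart doesn't create for us is the atproto-pds-secret that envFrom reads. The exact keys depend on your setup, but as a rough sketch, assuming the same secret values the official installer generates (PDS_JWT_SECRET, PDS_ADMIN_PASSWORD and PDS_PLC_ROTATION_KEY_K256_PRIVATE_KEY_HEX), you could create it with something like:

kubectl create namespace atproto-pds
kubectl create secret generic atproto-pds-secret --namespace atproto-pds \
  --from-literal=PDS_JWT_SECRET="$(openssl rand --hex 16)" \
  --from-literal=PDS_ADMIN_PASSWORD="$(openssl rand --hex 16)" \
  --from-literal=PDS_PLC_ROTATION_KEY_K256_PRIVATE_KEY_HEX="$(openssl ecparam --name secp256k1 --genkey --noout --outform DER | tail --bytes=+8 | head --bytes=32 | xxd --plain --cols 32)"

If you manage secrets another way (External Secrets, SOPS, etc.), the same keys apply; you just need a Secret named atproto-pds-secret to exist in the namespace before the pod starts.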

We can install this to our cluster by running the following:

helm repo add bjw-s https://bjw-s-labs.github.io/helm-charts
helm upgrade --install atproto-pds bjw-s/app-template --namespace atproto-pds --create-namespace -f ./values.yaml

Now we should see an atproto-pds pod starting up in our cluster. Awesome!

 kubectl get pods -n atproto-pds
NAME                                       READY   STATUS    RESTARTS      AGE
atproto-pds-64cf9cf984-965jr               1/1     Running   0             1m

However, we've got no way to _reach_ our PDS. So let's go ahead and add a Service to our Helm values, too.


# values.yaml
---
controllers:
  ...

defaultPodOptions:
  ...

service:
  pds:
    controller: pds
    ports:
      http:
        port: *port

(note: the *port here is a YAML alias referencing the PDS_PORT: &port 2583 anchor we set earlier, so the Service uses the same port number. Anchors and aliases are a built-in YAML feature for reusing values.)

Now we've added a Service, let's upgrade our Helm release again:

helm upgrade atproto-pds bjw-s/app-template --namespace atproto-pds -f ./values.yaml

Awesome, now you should see a service!

 kubectl get svc -n atproto-pds
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
atproto-pds               ClusterIP      10.107.114.16    <none>        2583/TCP                     1m

Let's try connecting to it to get a response. First, let's set up a port-forward to it.

 kubectl port-forward service/atproto-pds -n atproto-pds 8080:http
Forwarding from 127.0.0.1:8080 -> 2583
Forwarding from [::1]:8080 -> 2583

Now, if we open localhost:8080 in our browser, we should see this:

[Screenshot: the default response from an ATProto PDS, saying "This is an AT Protocol Personal Data Server (aka, an atproto PDS)" and linking to the GitHub repo and the ATProto documentation.]
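
If you'd rather check from the terminal, you can hit the same /xrpc/_health endpoint we pointed the probes at; it should return a small JSON body that includes the PDS version, something like:

 curl -s http://localhost:8080/xrpc/_health
{"version":"0.4.188"}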

Cool. Now we've done that, let's close our port forward (hit Ctrl+C in the terminal you started the port-forward command in) and look into persistence.


Currently, if the PDS container restarts or is deleted, all our data is gone with it! Not very handy. To remedy this, let's add a Persistent Volume Claim (PVC) and a volume mount to our setup.

# values.yaml
---
controllers:
  pds:
    containers:
      main:
        ...
        env:
          ...

          PDS_DATA_DIRECTORY: /data
          PDS_BLOBSTORE_DISK_LOCATION: /data/blocks

          ...
        ...

persistence:
  data:
    enabled: true
    type: persistentVolumeClaim
    accessMode: ReadWriteOnce
    size: 5Gi

Now, thanks to the way that app-template works, because we've defined a persistence item called data, it will automatically be mounted at /data inside the container, so we don't have to manually tell it to do that.
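
If you'd rather be explicit about where it gets mounted (or mount it somewhere other than /data), app-template also lets you spell the mount out with globalMounts. This is just a sketch of the explicit equivalent:

persistence:
  data:
    enabled: true
    type: persistentVolumeClaim
    accessMode: ReadWriteOnce
    size: 5Gi
    globalMounts:
      - path: /data   # where the volume appears inside the container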

We've also told the container to use /data and /data/blocks as its data directory and blob storage location respectively by setting PDS_DATA_DIRECTORY and PDS_BLOBSTORE_DISK_LOCATION on the container's environment variable set.

The volume is 5 GiB in size (size: 5Gi), but you can make it bigger if you want.

Now, when we re-apply the chart, we should have a volume set up and our container should now have a /data directory mounted.
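
Re-applying is the same helm upgrade as before. Afterwards you can check the claim was created and bound; the PVC name is derived from the release and persistence key names, so yours may differ slightly (the output below is illustrative):

 kubectl get pvc -n atproto-pds
NAME               STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
atproto-pds-data   Bound    pvc-<uuid>     5Gi        RWO            <your-class>   1m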


Next, let's set up routing/ingress.

---
...
# If you're using Gateway API like me:
route:
  pds:
    annotations:
      external-dns.alpha.kubernetes.io/cloudflare-proxied: "false"
    hostnames:
      - pds.<domain>
      - '*.pds.<domain>'
    parentRefs:
      - name: external
        namespace: network
        sectionName: https
    rules:
      - backendRefs:
          - identifier: pds
            port: *port

This sets up a Gateway API HTTPRoute, which routes traffic arriving at the Gateway from outside the cluster to our PDS Service.

Replace the reference in parentRefs with whatever is relevant to your setup.

The annotation tells External DNS to create the Cloudflare DNS records as DNS-only rather than proxied (if you're not using External DNS with the Cloudflare provider/webhook/etc., just skip that bit).

If you're not using Gateway API, you can use Ingress instead, though I _strongly_ recommend switching to Gateway API with Envoy Gateway or similar.
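
For reference, a rough sketch of the Ingress equivalent in app-template's values is below. The className and the TLS secret name are placeholders for whatever your cluster actually uses, and I haven't run this variant myself:

# values.yaml
ingress:
  pds:
    className: nginx   # placeholder: whatever IngressClass you run
    hosts:
      - host: pds.<domain>
        paths:
          - path: /
            service:
              identifier: pds
              port: http
      - host: '*.pds.<domain>'
        paths:
          - path: /
            service:
              identifier: pds
              port: http
    tls:
      - secretName: pds-tls   # placeholder: a cert covering both hostnames
        hosts:
          - pds.<domain>
          - '*.pds.<domain>'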

The only exercise for the reader from here is making sure your TLS certs cover pds.<domain> and (crucially) *.pds.<domain>. The wildcard is the more annoying of the two, and it's the reason I turned off proxying through Cloudflare: as far as I can tell, Cloudflare's default edge certificates only cover one level of subdomain, so TLS termination breaks for *.pds.<domain> when it's proxied.
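
If you happen to use cert-manager for those certs, note that a wildcard can only be issued via a DNS-01 solver. A minimal Certificate, assuming you already have a ClusterIssuer with DNS-01 configured (letsencrypt-dns here is a placeholder name), might look like:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: pds-tls
  namespace: atproto-pds
spec:
  secretName: pds-tls          # referenced by your Gateway listener or Ingress TLS block
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-dns      # placeholder; must use a DNS-01 solver for the wildcard
  dnsNames:
    - pds.<domain>
    - '*.pds.<domain>'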

Now, finally, we can redeploy the Helm chart and you should have a usable PDS running entirely within Kubernetes!

helm upgrade atproto-pds bjw-s/app-template --namespace atproto-pds -f ./values.yaml

At this point, you should be able to hit pds.<domain> in your browser and be greeted with the same welcome page we saw before. From here, you can use it the same way as if you'd used the installer script; you just need to figure out how to pass it things like the admin password so you can generate invite codes.
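
For example, invite codes come from the PDS admin API, which uses HTTP basic auth with the username admin and your PDS_ADMIN_PASSWORD (the value we put in atproto-pds-secret). Something like this should hand you back a code:

curl -s -X POST "https://pds.<domain>/xrpc/com.atproto.server.createInviteCode" \
  -u "admin:<your PDS_ADMIN_PASSWORD>" \
  -H "Content-Type: application/json" \
  -d '{"useCount": 1}'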


Hopefully this ends up being useful for folks; I know it took me a while to get it all working nicely. If you want to see how my setup works in particular, the Flux HelmRelease resource I built is here.

If you enjoyed this, consider supporting me on ko-fi and maybe at some point I'll be able to afford a new node for my on-prem cluster.