Portage + ORAS: Using a Docker Registry for Gentoo Packages

You didn't think I had a Type R or '86 did you?

Portage is Gentoo’s source-based package manager, commonly invoked with the emerge tool. The Prefix project allows it to be used by users of other distributions and Unix-like operating systems in a “virtualenv”-type fashion. In Portage, packages are described in ebuilds, which are composed of shell functions understandable by many Linux users. This makes it a good fit for sharing packages across distributions and users working on a project.

To compile packages from source, Portage requires being able to fetch archives of the software, and requires those archives to have stable hashes. This works fine for public source code: projects often provide archives for releases, and the Gentoo project can mirror archives generated automatically by public forges. This requirement poses problems for private packages, however, as the archives generated by forges are not guaranteed to be public.

After being stuck manually copying archives between machines for a year, I finally decided to tackle this problem this past weekend. I could mirror the archives to a web server, like the Gentoo project does. However, there was no web server already available, and I was not interested in setting up infrastructure just for Portage, sorting out the permissions to allow multiple people to upload archives, or managing backups of the archives.

There was one bit of infrastructure already set up with these attributes: a Docker (or OCI) registry. While we tend to think of these registries as a place to distribute container images and their layers, and they are, beneath the surface they are content-addressable stores that can be used to distribute other types of artifacts too. The Notary and cosign projects distribute signatures of containers by storing them in the registry, and Helm charts can be distributed via registries as well.

Could I use a registry as a source archive mirror? Yes! The ORAS[1] project provides a CLI and libraries for uploading and downloading artifacts to and from these registries. For example, to upload and download the file “hello-world.txt”, we can use the oras command to “push” and “pull” with an image tag, much like using docker or podman.

oras push example.com/hello:v1 hello-world.txt
oras pull example.com/hello:v1

Take the ebuild for direnv v2.34.0 as an example. It has two source archives: one is an archive of the source code, and the other is an archive of the Go module dependencies, following the current recommendations from the Gentoo Wiki. After we have downloaded and generated these two archives, we can upload them to a registry with a variation on the above “push” command.

oras push --artifact-type application/vnd.example.archives.v1 \
    example.com/distfiles/direnv:2.34.0 \
    direnv-2.34.0.tar.gz:application/x-compressed-tar \
    direnv-2.34.0-vendor.tar.xz:application/x-compressed-tar

This creates one tag, “example.com/distfiles/direnv:2.34.0”, with two layers for the two archives we uploaded. We can later pull these archives by pulling the tag.

oras pull example.com/distfiles/direnv:2.34.0

At this point you might be wondering how this helps with ebuilds. Each tarball is now in the content-addressable store, so we can use its location in the ebuild. To get those locations we can fetch the manifest at the tag.

$ oras manifest fetch example.com/distfiles/direnv:2.34.0 --pretty
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.example.archives.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/x-compressed-tar",
      "digest": "sha256:3d7067e71500e95d69eac86a271a6b6fc3f2f2817ba0e9a589524bf3e73e007c",
      "size": 94449,
      "annotations": {
        "org.opencontainers.image.title": "direnv-2.34.0.tar.gz"
      }
    },
    {
      "mediaType": "application/x-compressed-tar",
      "digest": "sha256:b86942d442f7e2a92a86dcad9b36d9316948e990592f65314137235df6c43293",
      "size": 327916,
      "annotations": {
        "org.opencontainers.image.title": "direnv-2.34.0-vendor.tar.xz"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-09-04T21:08:22Z"
  }
}

Each source artifact was uploaded as a layer, and can be identified with the “org.opencontainers.image.title” annotation automatically added by ORAS during our push. We can then use the digest as part of the URL in the ebuild.

SRC_URI="https://example.com/v2/distfiles/direnv/blobs/sha256:3d7067e71500e95d69eac86a271a6b6fc3f2f2817ba0e9a589524bf3e73e007c -> ${P}.tar.gz"
SRC_URI+=" https://example.com/v2/distfiles/direnv/blobs/sha256:b86942d442f7e2a92a86dcad9b36d9316948e990592f65314137235df6c43293 -> ${P}-vendor.tar.xz"
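
The URLs above follow the registry's blob endpoint pattern, /v2/<repository>/blobs/<digest>. As a rough sketch of how an image reference and a layer digest map onto that pattern, the snippet below builds the URL with plain bash parameter expansion. The blob_url helper is hypothetical, and it assumes the reference has the simple host/repository:tag form used in this post and that the registry is served over https.

```shell
#!/usr/bin/env bash
# Sketch: build a registry blob URL from an image reference and a digest.
# blob_url is a hypothetical helper, not part of ORAS or Portage.
# Assumes references of the form host[:port]/repository[:tag].

blob_url() {
    local ref="$1" digest="$2"
    local registry="${ref%%/*}"   # host (and optional port) before the first "/"
    local repo="${ref#*/}"        # repository path, possibly followed by ":tag"
    repo="${repo%:*}"             # drop the tag, if there is one
    printf 'https://%s/v2/%s/blobs/%s\n' "$registry" "$repo" "$digest"
}

blob_url example.com/distfiles/direnv:2.34.0 \
    sha256:3d7067e71500e95d69eac86a271a6b6fc3f2f2817ba0e9a589524bf3e73e007c
```

Run as-is, this should print the first URL used in the SRC_URI above.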

While this would work, it’s not very convenient to update for new versions. We can do better by leveraging Portage’s support for customizing fetch commands with FETCHCOMMAND. Portage ships with commands for http(s), FTP, SSH, SFTP, and rsync.

$ portageq envvar FETCHCOMMAND_RSYNC
rsync -LtvP "${URI}" "${DISTDIR}/${FILE}"

By following the format of FETCHCOMMAND_${protocol}, we can add support for our own protocols by setting a command in Portage’s configuration.

I’ve added a new fetch command, FETCHCOMMAND_ORAS, defining a custom “oras” protocol. It uses the ORAS CLI to fetch the manifest referenced by a tag, selects the digest of the requested file, and downloads the blob to the prescribed location.[2]

FETCHCOMMAND_ORAS="bash -c \"x=\\\${0#oras://}; image=\\\${x%/*}; blob=\\\${x##*/}; oras blob fetch \\\$image@\\\$(oras manifest fetch \\\$image | jq -r --arg blob \\\$blob '.layers|map(select(.annotations[\\\"org.opencontainers.image.title\\\"]==\\\$blob))[0].digest') -o \\\$1 \" \${URI} \${DISTDIR}/\${FILE}"
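
The layers of escaping make that one-liner hard to read, so here is the same logic unpacked as a sketch. The function names (oras_image, oras_blob, oras_fetch) are made up for illustration. The check for a missing layer is an addition that the one-liner does not have. Like the original, it assumes oras and jq are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: the steps of FETCHCOMMAND_ORAS written out as functions.
# All function names here are hypothetical.

# Image reference: strip the scheme, then the trailing "/<file name>".
oras_image() {
    local x="${1#oras://}"
    printf '%s\n' "${x%/*}"
}

# Requested file name: everything after the last "/".
oras_blob() {
    local x="${1#oras://}"
    printf '%s\n' "${x##*/}"
}

# Fetch the file named in URI ($1) to the destination path ($2).
oras_fetch() {
    local image blob digest
    image="$(oras_image "$1")"
    blob="$(oras_blob "$1")"
    # Find the layer whose title annotation matches the file name.
    digest="$(oras manifest fetch "$image" |
        jq -r --arg blob "$blob" \
            '.layers | map(select(.annotations["org.opencontainers.image.title"] == $blob))[0].digest')"
    # jq prints "null" when no layer carries the requested title (added check).
    if [ -z "$digest" ] || [ "$digest" = "null" ]; then
        echo "no layer titled $blob in $image" >&2
        return 1
    fi
    oras blob fetch "$image@$digest" -o "$2"
}

uri='oras://example.com/distfiles/direnv:2.34.0/direnv-2.34.0.tar.gz'
echo "image: $(oras_image "$uri")"
echo "blob:  $(oras_blob "$uri")"
```

One option, if the escaping becomes a maintenance burden, would be to save something like this as an executable script that ends by calling oras_fetch with its two arguments, and have FETCHCOMMAND_ORAS invoke that script with \${URI} and \${DISTDIR}/\${FILE}.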

Then we can reference this new protocol in our ebuild, reducing the amount of toil needed to update or add a new package.

SRC_URI="oras://example.com/distfiles/${PN}:${PV}/${P}.tar.gz"
SRC_URI+=" oras://example.com/distfiles/${PN}:${PV}/${P}-vendor.tar.xz"
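
To see what Portage ends up requesting, the template can be expanded outside of an ebuild. This is only an illustration: in a real ebuild Portage sets PN, PV, and P itself, and src_uri_for is a hypothetical helper that stands in for that.

```shell
#!/usr/bin/env bash
# Sketch: expand the SRC_URI template outside of Portage.
# In a real ebuild PN, PV, and P are provided by Portage;
# src_uri_for is a hypothetical helper that simulates them.

src_uri_for() {
    local PN="$1" PV="$2"
    local P="${PN}-${PV}"
    printf '%s\n' \
        "oras://example.com/distfiles/${PN}:${PV}/${P}.tar.gz" \
        "oras://example.com/distfiles/${PN}:${PV}/${P}-vendor.tar.xz"
}

src_uri_for direnv 2.34.0
```

Because the tag and file names are derived from the package name and version, a version bump only needs a new tag pushed to the registry. The SRC_URI lines themselves stay the same.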

For an end user, once this is configured, Portage transparently fetches artifacts with ORAS.

$ emerge -1v app-shells/direnv
>>> Emerging (1 of 1) app-shells/direnv-2.34.0::terinjokes
>>> Downloading 'oras://example.com/distfiles/direnv:2.34.0/direnv-2.34.0.tar.gz'
✓ Downloaded  application/octet-stream               92.2/92.2 kB 100.00%     0s
  └─ sha256:3d7067e71500e95d69eac86a271a6b6fc3f2f2817ba0e9a589524bf3e73e007c
 * direnv-2.34.0.tar.gz BLAKE2B SHA512 size ;-) ...                                                                               [ ok ]
>>> Downloading 'oras://example.com/distfiles/direnv:2.34.0/direnv-2.34.0-vendor.tar.xz'
✓ Downloaded  application/octet-stream                 320/320 kB 100.00%     0s
  └─ sha256:b86942d442f7e2a92a86dcad9b36d9316948e990592f65314137235df6c43293
 * direnv-2.34.0-vendor.tar.xz BLAKE2B SHA512 size ;-) ...                                                                        [ ok ]

  1. OCI Registry As Storage

  2. Hopefully in the future oras pull will support selecting specific file names, avoiding the subshell to fetch the manifest and parse with jq.