object path support for s3 sync

Evgenii Alekseev 2023-09-06 15:06:36 +03:00 committed by Evgeniy Alekseev
parent f22c10c70d
commit 22de00f053
8 changed files with 66 additions and 19 deletions

View File

@ -309,7 +309,7 @@ This feature requires GitHub key creation (see below). Section name must be eith
* ``repository`` - GitHub repository name, string, required. Repository must be created before any action and must have active branch (e.g. with readme).
* ``timeout`` - HTTP request timeout in seconds, int, optional, default is ``30``.
-* ``use_full_release_name`` - if set to ``yes``, the release will contain both repository name and architecture, and only architecture otherwise, boolean, optional, default ``no``.
+* ``use_full_release_name`` - if set to ``yes``, the release will contain both repository name and architecture, and only architecture otherwise, boolean, optional, default ``no`` (legacy behavior).
* ``username`` - GitHub authorization user, string, required. Basically the same as ``owner``.

``remote-service`` type
@ -338,5 +338,6 @@ Requires ``boto3`` library to be installed. Section name must be either ``s3`` (
* ``access_key`` - AWS access key ID, string, required.
* ``bucket`` - bucket name (e.g. ``bucket``), string, required.
* ``chunk_size`` - chunk size for calculating entity tags, int, optional, default 8 * 1024 * 1024.
+* ``object_path`` - path prefix for stored objects, string, optional. If not set, the prefix from the repository tree will be used (see the example after this list).
* ``region`` - bucket region (e.g. ``eu-central-1``), string, required.
* ``secret_key`` - AWS secret access key, string, required.

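For reference, a minimal sketch of an ``s3`` section using the new option, mirroring the configuration examples elsewhere in the docs; every value below is a placeholder:

```ini
[s3]
access_key = ...
secret_key = ...
bucket = repository-bucket
region = eu-central-1
# optional: override the default "<repository name>/<architecture>" prefix
object_path = archlinux/aur-clone/x86_64
```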
View File

@ -642,6 +642,23 @@ How to sync to S3
region = eu-central-1
secret_key = ...

+S3 with SSL
+"""""""""""
+
+In order to configure S3 on a custom domain with SSL (and some other features, like redirects), CloudFront should be used.
+
+#. Configure S3 as described above.
+#. In the bucket properties, enable static website hosting with hosting type "Host a static website" (see the sketch after this list).
+#. Go to AWS Certificate Manager and create a public certificate for your domain. Validate the domain as suggested.
+#. Go to CloudFront and create a distribution. The following settings are required:
+
+   * For the origin domain, choose the S3 bucket.
+   * Tick "Use website endpoint".
+   * Disable caching.
+   * Select the issued certificate.
+
+#. Point the DNS record to the CloudFront distribution address.
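Step 2 can also be scripted with ``boto3`` (the library this upload target already requires) instead of the console. A minimal sketch, assuming a placeholder bucket name and the default index document; the certificate and distribution creation are left to the console steps above:

```python
import boto3

# enable "Host a static website" on the bucket (step 2 above);
# bucket name, region and index document are placeholders
client = boto3.client("s3", region_name="eu-central-1")
client.put_bucket_website(
    Bucket="repository-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
    },
)
```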
How to sync to Github releases
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@ -43,6 +43,9 @@ In order to migrate to new filesystem tree the following actions are required:
#.
   Edit the configuration in case anything points to the old path, e.g. HTML report generation, so that it points to the repository specific directory instead, e.g. ``/var/lib/ahriman/repository/x86_64`` to ``/var/lib/ahriman/repository/aur-clone/x86_64``.
+#.
+   Make sure to update remote synchronization services, if any. Almost all of them rely on the current repository tree by default, so you need to either set up redirects or configure them to synchronize to the old locations (e.g. the ``object_path`` option for S3 synchronization; see the example after these steps).
#.
   Enable and start services again. The unit template parameter should include both repository architecture and name, dash separated, e.g. ``x86_64-aur-clone``:

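As an illustration of the synchronization step above: if the S3 target should keep publishing to the old, pre-migration location, the new ``object_path`` option can pin the prefix to the legacy layout (section name and value are illustrative):

```ini
[s3]
# keep uploading to the legacy flat prefix instead of aur-clone/x86_64
object_path = x86_64
```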
View File

@ -32,6 +32,7 @@ Whereas old tree is still supported it is highly recommended to migrate to the n
* stop and disable all services;
* run service-tree-migrate as ahriman user;
* edit configuration to avoid pointing to the old paths;
+* update synchronization services in order to support new paths (or set up redirects);
* enable web and timer services again by using x86_64-aur-clone suffix, where x86_64 is your architecture and
  aur-clone is repository name;

View File

@ -19,9 +19,8 @@
#
from __future__ import annotations

-import functools
from collections.abc import Iterable
+from functools import partial

from ahriman.core.exceptions import PartitionError
from ahriman.core.util import minmax, partition
@ -243,7 +242,7 @@ class Tree:
        unprocessed = self.leaves[:]
        while unprocessed:
            # additional workaround with partial in order to hide cell-var-from-loop pylint warning
-            predicate = functools.partial(Leaf.is_root, packages=unprocessed)
+            predicate = partial(Leaf.is_root, packages=unprocessed)
            new_level, unprocessed = partition(unprocessed, predicate)
            unsorted.append(new_level)
@ -253,7 +252,7 @@ class Tree:
            next_level = unsorted[next_num]
            # change lists inside the collection
-            predicate = functools.partial(Leaf.is_dependency, packages=next_level)
+            predicate = partial(Leaf.is_dependency, packages=next_level)
            unsorted[current_num], to_be_moved = partition(current_level, predicate)
            unsorted[next_num].extend(to_be_moved)
@ -280,12 +279,12 @@ class Tree:
            while True:  # python doesn't allow to use walrus operator to unpack tuples
                # get packages which depend on packages in chunk
-                predicate = functools.partial(Leaf.is_root, packages=chunk)
+                predicate = partial(Leaf.is_root, packages=chunk)
                unprocessed, new_dependent = partition(unprocessed, predicate)
                chunk.extend(new_dependent)

                # get packages which are dependency of packages in chunk
-                predicate = functools.partial(Leaf.is_dependency, packages=chunk)
+                predicate = partial(Leaf.is_dependency, packages=chunk)
                new_dependencies, unprocessed = partition(unprocessed, predicate)
                chunk.extend(new_dependencies)

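For context on the pattern above: ``partition`` splits a list by a predicate, and ``partial`` binds the loop-dependent argument so the cell-var-from-loop warning mentioned in the comment does not apply. A minimal self-contained sketch of the idea; this is not the actual ``ahriman.core.util.partition`` implementation:

```python
from functools import partial


def partition(source, predicate):
    """Split source into (matching, not matching) by predicate."""
    matching, not_matching = [], []
    for item in source:
        (matching if predicate(item) else not_matching).append(item)
    return matching, not_matching


def is_divisible(item, divisor):
    return item % divisor == 0


# partial binds the loop variable, mirroring how Leaf.is_root and
# Leaf.is_dependency are bound to the current package set in the code above
for divisor in (2, 3):
    predicate = partial(is_divisible, divisor=divisor)
    divisible, rest = partition(range(10), predicate)
    print(divisor, divisible, rest)
```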
View File

@ -38,7 +38,7 @@ class S3(Upload):
    Attributes
        bucket(Any): boto3 S3 bucket object
        chunk_size(int): chunk size for calculating checksums
-        remote_root(Path): relative path to which packages will be uploaded
+        object_path(Path): relative path to which packages will be uploaded
    """

    def __init__(self, repository_id: RepositoryId, configuration: Configuration, section: str) -> None:
@ -54,8 +54,12 @@
        self.bucket = self.get_bucket(configuration, section)
        self.chunk_size = configuration.getint(section, "chunk_size", fallback=8 * 1024 * 1024)

-        paths = configuration.repository_paths
-        self.remote_root = paths.repository.relative_to(paths.root / "repository")
+        if (object_path := configuration.get(section, "object_path", fallback=None)) is not None:
+            # we need to avoid path conversion here, hence the string
+            self.object_path = Path(object_path)
+        else:
+            paths = configuration.repository_paths
+            self.object_path = paths.repository.relative_to(paths.root / "repository")

    @staticmethod
    def calculate_etag(path: Path, chunk_size: int) -> str:
@ -132,7 +136,7 @@
                continue

            local_path = path / local_file
-            remote_path = self.remote_root / local_file.name
+            remote_path = self.object_path / local_file.name

            (mime, _) = mimetypes.guess_type(local_path)
            extra_args = {"ContentType": mime} if mime is not None else None
@ -160,8 +164,8 @@
        Returns:
            dict[Path, Any]: map of path object to the remote s3 object
        """
-        objects = self.bucket.objects.filter(Prefix=str(self.remote_root))
-        return {Path(item.key).relative_to(self.remote_root): item for item in objects}
+        objects = self.bucket.objects.filter(Prefix=str(self.object_path))
+        return {Path(item.key).relative_to(self.object_path): item for item in objects}

    def sync(self, path: Path, built_packages: list[Package]) -> None:
        """

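The constructor change above boils down to: use ``object_path`` verbatim when it is configured, otherwise derive the prefix from the repository tree. A standalone sketch of that decision (the function name and paths below are illustrative, not part of the module):

```python
from pathlib import Path


def resolve_object_path(configured: str | None, root: Path, repository: Path) -> Path:
    """Return the S3 key prefix under which packages are uploaded."""
    if configured is not None:
        # take the configured string as is, without any path normalization
        return Path(configured)
    # e.g. /var/lib/ahriman/repository/aur-clone/x86_64 -> aur-clone/x86_64
    return repository.relative_to(root / "repository")


root = Path("/var/lib/ahriman")
repository = root / "repository" / "aur-clone" / "x86_64"
print(resolve_object_path(None, root, repository))             # aur-clone/x86_64
print(resolve_object_path("mirror/custom", root, repository))  # mirror/custom
```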
View File

@ -57,7 +57,6 @@ class UploadTrigger(Trigger):
                },
                "password": {
                    "type": "string",
-                    "required": True,
                },
                "repository": {
                    "type": "string",
@ -131,6 +130,9 @@ class UploadTrigger(Trigger):
                    "coerce": "integer",
                    "min": 0,
                },
+                "object_path": {
+                    "type": "string",
+                },
                "region": {
                    "type": "string",
                    "required": True,

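The schema entries above follow cerberus conventions (the project's configuration validator appears to be built on top of that library); a trimmed-down, standalone check of the new key could look like this, with plain ``cerberus.Validator`` standing in for the project's own validator:

```python
from cerberus import Validator

# simplified extract of the s3 upload schema shown above
schema = {
    "object_path": {
        "type": "string",
    },
    "region": {
        "type": "string",
        "required": True,
    },
}

validator = Validator(schema)
print(validator.validate({"region": "eu-central-1", "object_path": "aur-clone/x86_64"}))  # True
print(validator.validate({"object_path": 42}), validator.errors)  # False: missing region, wrong type
```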
View File

@ -3,12 +3,32 @@ from pytest_mock import MockerFixture
from typing import Any
from unittest.mock import MagicMock, call as MockCall

+from ahriman.core.configuration import Configuration
from ahriman.core.upload.s3 import S3
+from ahriman.models.repository_paths import RepositoryPaths

_chunk_size = 8 * 1024 * 1024


+def test_object_path(configuration: Configuration, mocker: MockerFixture) -> None:
+    """
+    must correctly read object path
+    """
+    _, repository_id = configuration.check_loaded()
+
+    # new-style tree
+    assert S3(repository_id, configuration, "customs3").object_path == Path("aur-clone/x86_64")
+
+    # legacy tree
+    mocker.patch.object(RepositoryPaths, "_suffix", Path("x86_64"))
+    assert S3(repository_id, configuration, "customs3").object_path == Path("x86_64")
+
+    # user defined prefix
+    configuration.set_option("customs3", "object_path", "local")
+    assert S3(repository_id, configuration, "customs3").object_path == Path("local")


def test_calculate_etag_big(resource_path_root: Path) -> None:
    """
    must calculate checksum for path which is more than one chunk
@ -68,12 +88,12 @@ def test_files_upload(s3: S3, s3_remote_objects: list[Any], mocker: MockerFixtur
    upload_mock.upload_file.assert_has_calls(
        [
            MockCall(
-                Filename=str(root / s3.remote_root / "b"),
-                Key=f"{s3.remote_root}/b",
+                Filename=str(root / s3.object_path / "b"),
+                Key=f"{s3.object_path}/b",
                ExtraArgs={"ContentType": "text/html"}),
            MockCall(
-                Filename=str(root / s3.remote_root / "d"),
-                Key=f"{s3.remote_root}/d",
+                Filename=str(root / s3.object_path / "d"),
+                Key=f"{s3.object_path}/d",
                ExtraArgs=None),
        ],
        any_order=True)
@ -92,7 +112,7 @@ def test_get_remote_objects(s3: S3, s3_remote_objects: list[Any]) -> None:
    """
    must generate list of remote objects by calling boto3 function
    """
-    expected = {Path(item.key).relative_to(s3.remote_root): item for item in s3_remote_objects}
+    expected = {Path(item.key).relative_to(s3.object_path): item for item in s3_remote_objects}

    s3.bucket = MagicMock()
    s3.bucket.objects.filter.return_value = s3_remote_objects