docs: docs review

2025-12-16 04:03:42 +00:00 · 2023-12-19 16:07:04 +02:00
parent c130f52163
commit a872ecfc23
21 changed files with 144 additions and 97 deletions
--- a/docs/architecture.rst
+++ b/docs/architecture.rst
@ -6,7 +6,7 @@ Package structure

 Packages have strict rules of importing:

-* ``ahriman.application`` package must not be used anywhere except for itself.
+* ``ahriman.application`` package must not be used outside of this package.
 * ``ahriman.core`` and ``ahriman.models`` packages don't have any import restriction. Actually we would like to totally restrict importing of ``core`` package from ``models``, but it is impossible at the moment.
 * ``ahriman.web`` package is allowed to be imported from ``ahriman.application`` (web handler only, only ``ahriman.web.web`` methods). It also must not be imported globally, only local import is allowed. 

@ -19,10 +19,12 @@ Full dependency diagram:
 ``ahriman.application`` package
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This package contains application (aka executable) related classes and everything for it. It also contains package called ``ahriman.application.handlers`` in which all available subcommands are described as separated classes derived from base ``ahriman.application.handlers.Handler`` class.
+This package contains application (aka executable) related classes and everything for it. It also contains a package called ``ahriman.application.handlers`` in which all available subcommands are described as separate classes derived from the base ``ahriman.application.handlers.Handler`` class.

 ``ahriman.application.application.Application`` (god class) is used for any interaction from parsers with repository. It is divided into multiple traits by functions (package related and repository related) in the same package.

+``ahriman.application.application.workers`` package contains specific wrappers for local and remote build processes.
+
 ``ahriman.application.ahriman`` contains only command line parses and executes specified ``Handler`` on success, ``ahriman.application.lock.Lock`` is additional class which provides file-based lock and also performs some common checks.

 ``ahriman.core`` package
@ -31,13 +33,13 @@ This package contains application (aka executable) related classes and everythin
 This package contains everything required for the most of application actions and it is separated into several packages:

 * ``ahriman.core.alpm`` package controls pacman related functions. It provides wrappers for ``pyalpm`` library and safe calls for repository tools (``repo-add`` and ``repo-remove``). Also this package contains ``ahriman.core.alpm.remote`` package which provides wrapper for remote sources (e.g. AUR RPC and official repositories RPC).
-* ``ahriman.core.auth`` package provides classes for authorization methods used by web mostly. Base class is ``ahriman.core.auth.Auth`` which must be called by ``load`` method.
+* ``ahriman.core.auth`` package provides classes for authorization methods used by web mostly. Base class is ``ahriman.core.auth.Auth`` which must be instantiated by ``load`` method.
 * ``ahriman.core.build_tools`` is a package which provides wrapper for ``devtools`` commands.
 * ``ahriman.core.configuration`` contains extension for standard ``configparser`` library and some validation related classes.
-* ``ahriman.core.database`` is everything including data and schema migrations for database.
+* ``ahriman.core.database`` is everything for database, including data and schema migrations.
 * ``ahriman.core.formatters`` package provides ``Printer`` sub-classes for printing data (e.g. package properties) to stdout which are used by some handlers.
 * ``ahriman.core.gitremote`` is a package with remote PKGBUILD triggers. Should not be called directly.
-* ``ahriman.core.http`` package provides HTTP clients which can be later used by other classes.
+* ``ahriman.core.http`` package provides HTTP clients which can be used later by other classes.
 * ``ahriman.core.log`` is a log utils package. It includes logger loader class, custom HTTP based logger and access logger for HTTP services with additional filters.
 * ``ahriman.core.report`` is a package with reporting triggers. Should not be called directly.
 * ``ahriman.core.repository`` contains several traits and base repository (``ahriman.core.repository.Repository`` class) implementation.
@ -105,7 +107,7 @@ For historical reasons and in order to keep backward compatibility some subcomma
 Filesystem tree
 ---------------

-The application supports two types of trees, one is for the legacy configuration (when there were no repository name explicit configuration available) and another one is the new-style tree. This document describes only new-style tree in order to avoid deprecated structures.
+The application supports two types of trees: one is for the legacy configuration (when there was no explicit repository name configuration available) and the other one is the new-style tree. This document describes only the new-style tree in order to avoid deprecated structures.

 Having default root as ``/var/lib/ahriman`` (differs from container though), the directory structure is the following:

@ -141,7 +143,7 @@ There are multiple subdirectories, some of them are commons for any repository,

 * ``cache`` is a directory with locally stored PKGBUILD's and VCS packages. It is common for all repositories and architectures.
 * ``chroot/{repository}`` is a chroot directory for ``devtools``. It is specific for each repository, but shared for different architectures inside (the ``devtools`` handles architectures automatically).
-* ``packages/{repository}/{architecture}`` is a directory with prebuilt packages. When package is built, first it will be uploaded to this directory and later will be handled by update process. It is architecture and repository specific.
+* ``packages/{repository}/{architecture}`` is a directory with prebuilt packages. When a package is built, first it will be uploaded to this directory and later will be handled by update process. It is architecture and repository specific.
 * ``pacman/{repository}/{architecture}`` is the repository and architecture specific caches for pacman's databases.
 * ``repository/{repository}/{architecture}`` is a repository packages directory.

@ -155,16 +157,16 @@ The service uses SQLite database in order to store some internal info.
 Database instance
 ^^^^^^^^^^^^^^^^^

-All methods related to specific part of database (basically operations per table) are split into different traits located inside ``ahriman.core.database.operations`` package. The base trait ``ahriman.core.database.operations.Operations`` also provides generic methods for database access (e.g. row converters and transactional support).
+All methods related to a specific part of the database (basically operations per table) are split into different traits located inside the ``ahriman.core.database.operations`` package. The base trait ``ahriman.core.database.operations.Operations`` also provides generic methods for database access (e.g. row converters and transactional support).

 The ``ahriman.core.database.SQLite`` class itself derives from all of these traits and implements methods for initialization, including migrations.

 Schema and data migrations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

-The schema migration are applied according to current ``pragma user_info`` values, located at ``ahriman.core.database.migrations`` package and named as ``m000_migration_name.py`` (the preceding ``m`` is required in order to import migration content for tests). Additional class ``ahriman.core.database.migrations.Migrations`` reads all migrations automatically and applies them in alphabetical order.
+The schema migrations are applied according to current ``pragma user_info`` values, located at ``ahriman.core.database.migrations`` package and named as ``m000_migration_name.py`` (the preceding ``m`` is required in order to import migration content for tests). Additional class ``ahriman.core.database.migrations.Migrations`` reads all migrations automatically and applies them in alphabetical order.

-These migrations can also contain data migrations. Though the recommended way is to migrate data directly from SQL requests, sometimes it is required to have external data (like packages list) in order to set correct data. To do so, special method ``migrate_data`` is used.
+These migrations can also contain data migrations. Though the recommended way is to migrate data directly from SQL queries, sometimes it is required to have external data (like packages list) in order to set correct data. To do so, special method ``migrate_data`` is used.

 Type conversions
 ^^^^^^^^^^^^^^^^
@ -180,15 +182,15 @@ By default package build operations are performed with ``PACKAGER`` which is spe

 * If packager is not set, it reads environment variables (e.g. ``SUDO_USER`` and ``USER``), otherwise it uses value from command line.
 * It checks users for the specified username and tries to extract packager variable from it.
-* If packager value has been found, it will be passed as ``PACKAGER`` system variable (sudo configuration required).
+* If packager value has been found, it will be passed as ``PACKAGER`` system variable (additional sudo configuration might be required).

 Add new packages or rebuild existing
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Idea is to copy package to the directory from which it will be handled at the next update run. Different variants are supported:
+The idea is to add the package to a build queue, from which it will be handled automatically during the next update run. Different variants are supported:

-* If supplied argument is file then application moves the file to the directory with built packages. Same rule applies for directory, but in this case it copies every package-like file from the specified directory.
-* If supplied argument is directory and there is ``PKGBUILD`` file there it will be treated as local package. In this case it will queue this package to build and copy source files (``PKGBUILD`` and ``.SRCINFO``) to caches.
+* If supplied argument is file, then application moves the file to the directory with built packages. Same rule applies for directory, but in this case it copies every package-like file from the specified directory.
+* If supplied argument is directory and there is ``PKGBUILD`` file there, it will be treated as local package. In this case it will queue this package to build and copy source files (``PKGBUILD`` and ``.SRCINFO``) to caches.
 * If supplied argument is not file then application tries to lookup for the specified name in AUR and clones it into the directory with manual updates. This scenario can also handle package dependencies which are missing in repositories.

 This logic can be overwritten by specifying the ``source`` parameter, which is partially useful if you would like to add package from AUR, but there is local directory cloned from AUR. Also official repositories calls are hidden behind explicit source definition.
@ -206,7 +208,7 @@ This flow removes package from filesystem, updates repository database and also
 Update packages
 ^^^^^^^^^^^^^^^

-This feature is divided into to stages: check AUR for updates and run rebuild for required packages. Whereas check does not do anything except for check itself, update flow is the following:
+This feature is divided into the following stages: check AUR for updates and run rebuild for required packages. Whereas the check does not do anything except for the check itself, the update flow is the following:

 #. Process every built package first. Those packages are usually added manually.
 #. Run sync and report methods.
@ -222,13 +224,33 @@ This feature is divided into to stages: check AUR for updates and run rebuild fo

 After any step any package data is being removed.

+If workers are configured, the build process itself will be delegated to the remote instances. Packages will be partitioned into chunks according to the number of configured workers.
+
+Distributed builds
+^^^^^^^^^^^^^^^^^^
+
+This feature consists of two parts:
+
+* Upload built packages to the node.
+* Delegate package building to separate nodes.
+
+The upload process is performed via special API endpoint, which is disabled by default, and is performed in several steps:
+
+#. Upload package to temporary file.
+#. Copy content from temporary file to the built package directory with dot (``.``) prefix.
+#. Rename copied file, removing preceding dot.
+
+After a successful upload, the update process must be called as usual in order to copy built packages to the main repository tree.
+
+The delegation, on the other hand, uses the upload feature, but in addition it also calls external services in order to trigger the build process. The packages are split into chunks based on the number of configured workers and on package dependencies.
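The three upload steps listed above can be sketched roughly as follows. This is only an illustration, not the service's actual code: the function name ``save_uploaded_package`` and its parameters are hypothetical, and the sketch assumes the whole package content is already available as bytes.

```python
import shutil
import tempfile
from pathlib import Path


def save_uploaded_package(content: bytes, filename: str, packages_dir: Path) -> Path:
    """Store uploaded content so that readers never see a partially written file.

    Hypothetical helper, the names are assumptions made for this sketch.
    """
    packages_dir.mkdir(parents=True, exist_ok=True)
    hidden = packages_dir / f".{filename}"
    target = packages_dir / filename

    # step 1: upload package to a temporary file
    with tempfile.NamedTemporaryFile() as temporary:
        temporary.write(content)
        temporary.flush()
        temporary.seek(0)
        # step 2: copy content to the built packages directory with dot prefix
        with hidden.open("wb") as destination:
            shutil.copyfileobj(temporary, destination)

    # step 3: rename the copied file, removing the preceding dot
    return hidden.replace(target)
```

The final rename happens inside one directory, so a process scanning the directory for package files sees either no file or the complete one, provided it ignores dot-prefixed names.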
+
 pkgrel bump rules
 ^^^^^^^^^^^^^^^^^

 The application is able to automatically bump package release (``pkgrel``) during build process if there is duplicate version in repository. The version will be incremented as following:

 #. Get version of the remote package.
-#. Get version of the local package if any.
+#. Get version of the local package if available.
 #. If local version is not set, proceed with remote one.
 #. If local version is set and epoch or package version (``pkgver``) are different, proceed with remote version.
 #. If local version is set and remote version is newer than local one, proceed with remote.
@ -247,7 +269,7 @@ Configuration
 Enumerations
 ^^^^^^^^^^^^

-All enumerations are derived from ``str`` and ``enum.Enum``. Integer enumerations in general are not allowed, because most of operations require conversions from string variable. Derivation from string class is required to make json conversions implicitly (e.g. during calling ``json.dumps`` methods).
+All enumerations are derived from ``enum.StrEnum``. Integer enumerations in general are not allowed, because most operations require conversions from a string variable. Derivation from a string based enumeration is required to make JSON conversions implicit (e.g. when calling ``json.dumps``).

 In addition, some enumerations provide ``from_option`` class methods in order to allow some flexibility while reading configuration options.
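A minimal sketch of why string based enumerations serialize implicitly. The ``BuildStatus`` class and its members are hypothetical and exist only for this example; the fallback branch mirrors the older ``str`` plus ``enum.Enum`` derivation for interpreters older than Python 3.11.

```python
import json

try:
    from enum import StrEnum  # available since Python 3.11
except ImportError:
    from enum import Enum

    class StrEnum(str, Enum):
        """Fallback equivalent to deriving from str and enum.Enum."""


class BuildStatus(StrEnum):
    """Hypothetical enumeration, not part of the application."""

    PENDING = "pending"
    SUCCESS = "success"


# members are str instances, so no custom JSON encoder is needed
payload = json.dumps({"status": BuildStatus.SUCCESS})

# conversion from a plain string (e.g. a configuration value) is a lookup by value
restored = BuildStatus(json.loads(payload)["status"])
```

An integer enumeration would need an explicit conversion step when reading string values such as configuration options.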

@ -319,9 +341,9 @@ There are several supported synchronization providers, currently they are ``rsyn

 ``rsync`` provider does not have any specific logic except for running external rsync application with configured arguments. The service does not handle SSH configuration, thus it has to be configured before running application manually.

-``s3`` provider uses ``boto3`` package and implements sync feature. The files are stored in architecture directory (e.g. if bucket is ``repository``, packages will be stored in ``repository/aur-clone/x86_64`` for the ``aur-clone`` repository ``x86_64`` architecture), bucket must be created before any action and API key must have permissions to write to the bucket. No external configuration required. In order to upload only changed files the service compares calculated hashes with the Amazon ETags, used realization is described `here <https://teppen.io/2018/10/23/aws_s3_verify_etags/>`__.
+``s3`` provider uses the ``boto3`` package and implements the sync feature. The files are stored in an architecture specific directory (e.g. if the bucket is ``repository``, packages will be stored in ``repository/aur-clone/x86_64`` for the ``aur-clone`` repository and ``x86_64`` architecture). The bucket must be created before any action, and the API key must have permissions to write to the bucket. No external configuration is required. In order to upload only changed files, the service compares calculated hashes with the Amazon ETags; the implementation used is described `here <https://teppen.io/2018/10/23/aws_s3_verify_etags/>`__.

-``github`` provider authenticates through basic auth, API key with repository write permissions is required. There will be created a release with the name of the architecture in case if it does not exist; files will be uploaded to the release assets. It also stores array of files and their MD5 checksums in release body in order to upload only changed ones. According to the Github API in case if there is already uploaded asset with the same name (e.g. database files), asset will be removed first.
+``github`` provider authenticates through basic auth; an API key with repository write permissions is required. A release named after the architecture will be created if it does not exist, and files will be uploaded to the release assets. The provider also stores the list of files and their MD5 checksums in the release body in order to upload only changed ones. According to the GitHub API, if an asset with the same name has already been uploaded (e.g. database files), the asset will be removed first.
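The ETag comparison mentioned for the ``s3`` provider can be illustrated with the commonly documented scheme for unencrypted objects: a single-part upload has the plain MD5 of the content as its ETag, while a multipart upload has the MD5 of the concatenated per-part MD5 digests followed by the number of parts. The sketch below is an assumption-laden illustration, not the provider's code: the function name ``calculate_etag`` and the 8 MiB default chunk size are chosen for this example and must match the part size actually used for the upload.

```python
import hashlib
from pathlib import Path


def calculate_etag(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Calculate an S3-style ETag for a local file.

    Hypothetical helper; assumes that files which fit into one chunk are
    uploaded in a single request and that objects are not KMS-encrypted.
    """
    digests = []
    with path.open("rb") as source:
        # hash the file part by part, the same way it would be uploaded
        while chunk := source.read(chunk_size):
            digests.append(hashlib.md5(chunk).digest())

    if not digests:  # empty file
        return hashlib.md5(b"").hexdigest()
    if len(digests) == 1:  # single-part upload: plain MD5 of the content
        return digests[0].hex()

    # multipart upload: MD5 over concatenated part digests plus part count
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

A file whose calculated value equals the ETag reported by the bucket can be skipped during sync.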

 Additional features
 ^^^^^^^^^^^^^^^^^^^
@ -340,7 +362,7 @@ Web application
 Web application requires the following python packages to be installed:

 * Core part requires ``aiohttp`` (application itself), ``aiohttp_jinja2`` and ``Jinja2`` (HTML generation from templates).
-* Additional web features also require ``aiohttp-apispec`` (autogenerated documentation), ``aiohttp_cors`` (CORS support, required by documentation)
+* Additional web features also require ``aiohttp-apispec`` (autogenerated documentation), ``aiohttp_cors`` (CORS support, required by documentation).
 * In addition, ``aiohttp_debugtoolbar`` is required for debug panel. Please note that this option does not work together with authorization and basically must not be used in production.
 * In addition, authorization feature requires ``aiohttp_security``, ``aiohttp_session`` and ``cryptography``.
 * In addition to base authorization dependencies, OAuth2 also requires ``aioauth-client`` library.
@ -363,7 +385,7 @@ Web views

 All web views are defined in separated package and derived from ``ahriman.web.views.base.Base`` class which provides typed interfaces for web application. 

-REST API supports both form and JSON data, but the last one is recommended. 
+REST API supports only JSON data.

 Different APIs are separated into different packages:

@ -377,12 +399,12 @@ The views are also divided by supporting API versions (e.g. ``v1``, ``v2``).
 Templating
 ^^^^^^^^^^

-Package provides base jinja templates which can be overridden by settings. Vanilla templates are actively using bootstrap library.
+Package provides base jinja templates which can be overridden by settings. Vanilla templates actively use bootstrap library.

 Requests and scopes
 ^^^^^^^^^^^^^^^^^^^

-Service provides optional authorization which can be turned on in settings. In order to control user access there are two levels of authorization - read-only (only GET-like requests) and write (anything) which are provided by each web view directly.
+Service provides optional authorization which can be turned on in settings. In order to control user access there are two levels of authorization: read-only (only GET-like requests) and write (anything). The level required for each request is set by the corresponding web view directly.

 If this feature is configured any request will be prohibited without authentication. In addition, configuration flag ``auth.allow_read_only`` can be used in order to allow read-only operations - reading index page and packages - without authorization.

@ -393,4 +415,4 @@ External calls

 Web application provides external calls to control main service. It spawns child process with specific arguments and waits for its termination. This feature must be used either with authorization or in safe (i.e. when status page is not available world-wide) environment.

-For most actions it also extracts user from authentication (if provided) and passes it to underlying process.
+For most actions it also extracts user from authentication (if provided) and passes it to the underlying process.