Ivan Tikhonov's Blog

A collection of post-its I couldn't have found on the internet


Advanced validators in argparse

argparse --help

At some point in a project's lifecycle, scripts tend to be reused across different contexts. In my experience, without a consistent interface, people often start to directly modify the source code for each new context, which makes synchronization and updates painfully slow and error-prone.

Also, I love a good CLI: one that knows what it wants, how it wants it, and communicates it clearly. So I want to record a few tricks I have picked up after spending some time "in the field".

argparse --validate

I think it is common knowledge that argparse is a Python library for convenient handling of CLI arguments. It is also common knowledge that each argument supports custom validation through the type parameter. What is less commonly emphasized is that type accepts any callable that takes a single string and returns a transformed value.

This opens an opportunity to implement custom validation logic at the CLI boundary, instead of manually post-processing arguments after parsing.
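To make the "any callable" part concrete, here is a minimal sketch. The positive_int converter and the --retries option are hypothetical names made up for illustration; only ArgumentParser, add_argument, and ArgumentTypeError come from argparse itself.

```python
import argparse


def positive_int(value: str) -> int:
    """Hypothetical converter: parse the raw string and require value > 0."""
    try:
        number = int(value)
    except ValueError:
        # ArgumentTypeError lets us choose the message argparse reports.
        raise argparse.ArgumentTypeError(f"not an integer: {value!r}")
    if number <= 0:
        raise argparse.ArgumentTypeError(f"must be positive, got {number}")
    return number


parser = argparse.ArgumentParser()
parser.add_argument("--retries", type=positive_int, default=3)

# The list stands in for sys.argv[1:].
args = parser.parse_args(["--retries", "5"])

# The value arrives already converted, so no post-processing is needed.
print(args.retries + 1)
```

The callable both validates and transforms: whatever it returns is what ends up in the parsed namespace.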

For example, we can define a simple validator that rejects certain input patterns:

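from argparse import ArgumentTypeError


def validator(value: str) -> str:
    stoplist = {
        "not", "skip", "off", "don't", "dont", "no", "false"
    }
    # Keep letters and apostrophes; everything else becomes a space,
    # so the input still splits into separate words.
    cleaned = "".join(c if c.isalpha() or c == "'" else " " for c in value)
    words = cleaned.lower().split()
    if any(word in stoplist for word in words):
        raise ArgumentTypeError(
            "In Soviet Union you don't turn off validation, "
            "but validation turns you off."
        )
    return value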

This function can then be attached directly to an argument definition:

parser.add_argument(
    "--validate",
    type=validator,
    help="Caution! Validated parameter! Do not skip (wink)",
)
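To see what a rejection looks like from the outside, here is a small end-to-end sketch. It uses a condensed stand-in for the validator (shorter stoplist, shorter message, both my simplification) and assumes Python 3.9+ for exit_on_error; the prog name "demo" is made up.

```python
import argparse


def validator(value: str) -> str:
    """Condensed stand-in for the validator above (simplified on purpose)."""
    stoplist = {"not", "skip", "off", "no", "false"}
    if any(word in stoplist for word in value.lower().split()):
        raise argparse.ArgumentTypeError("validation cannot be turned off")
    return value


# exit_on_error=False raises ArgumentError instead of exiting the process,
# which makes the failure easy to observe in a script or a test.
parser = argparse.ArgumentParser(prog="demo", exit_on_error=False)
parser.add_argument("--validate", type=validator)

accepted = parser.parse_args(["--validate", "yes please"])
print(accepted.validate)

rejected = None
try:
    parser.parse_args(["--validate", "skip"])
except argparse.ArgumentError as err:
    # argparse prefixes the message with the name of the offending argument.
    rejected = str(err)
    print(rejected)
```

With the default exit_on_error=True, argparse instead prints the usage line plus this message to stderr and exits with status 2, which is usually what you want in a real tool.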

The idea is that validation becomes part of the interface itself, rather than an afterthought inside the application, making everything cleaner and simpler.

argparse --input

Another feature of argparse that often goes unnoticed is the built-in FileType.

It looks trivial: it converts a string path into an already-open file object during argument parsing. But that removes so much boilerplate in everyday CLI tools, and is so easy to add, that I am genuinely surprised it is not more popular.

It is perfect for simple cases, but it got me wondering: why is there no equivalent for directories?
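A minimal sketch of FileType in use. The --input option name and the throwaway sample file are assumptions added so the snippet runs anywhere.

```python
import argparse
import tempfile
from pathlib import Path

parser = argparse.ArgumentParser()
# FileType("r") opens the given path for reading while parsing;
# a lone "-" is mapped to sys.stdin instead.
parser.add_argument("--input", type=argparse.FileType("r", encoding="utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    # Create a small file to parse against.
    sample = Path(tmp) / "sample.txt"
    sample.write_text("hello\n", encoding="utf-8")

    args = parser.parse_args(["--input", str(sample)])

    # args.input is an open text file; closing it is the caller's job.
    with args.input as handle:
        content = handle.read()

print(content)
```

One caveat, as far as I know: because the file is opened during parsing and closing is left to the caller, newer Python releases (3.14 onward, if I recall correctly) mark FileType as deprecated, so check the documentation for the version you target.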
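For directories, the simplest stopgap is a plain function passed as type. A minimal sketch, where existing_dir and --workdir are hypothetical names rather than anything argparse provides:

```python
import argparse
import tempfile
from pathlib import Path


def existing_dir(value: str) -> Path:
    """Hypothetical helper: accept only paths that point to a directory."""
    path = Path(value)
    if not path.is_dir():
        raise argparse.ArgumentTypeError(f"not a directory: {value}")
    return path


parser = argparse.ArgumentParser()
parser.add_argument("--workdir", type=existing_dir)

# A temporary directory stands in for a real working directory.
with tempfile.TemporaryDirectory() as tmp:
    args = parser.parse_args(["--workdir", tmp])
    workdir = args.workdir
    found = workdir.is_dir()

print(type(workdir).__name__, found)
```

It works, but every new requirement (must not exist yet, may be a symlink, may be "-") needs another one-off function.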

argparse --path

Once you start thinking about file inputs, it becomes clear that FileType only solves a very limited set of problems. And I am not the only one who saw this! In fact, there was already a proposal to extend this idea further in the standard library:

stdlib-sig discussion: PathType proposal

The idea behind the patch is very simple: extend the functionality to directories, symlinks, and even standard input. I found it a very interesting idea with a slightly aged implementation; pathlib would look much better here than os.

from argparse import ArgumentTypeError
from pathlib import Path
from typing import Literal

Kind = Literal["file", "directory", "symlink"]


class PathType:
    def __init__(
        self,
        must_exist: bool | None = True,
        kind: Kind | tuple[Kind, ...] | None = None,
        allow_stdio: bool = False,
        resolve: bool = False,
    ):
        """
        Argparse type for validating filesystem path arguments.

        Converts a string input into a pathlib.Path object or stdin sentinel.
        Supports existence checks, path kind filtering, and path resolution.

        Args:
            must_exist (bool | None): Controls existence validation.
                True requires the path to exist.
                False requires the path not to exist.
                None disables existence checking.

            kind (Kind | tuple[Kind, ...] | None): Allowed path types. Valid
                values are "file", "directory", and "symlink". If None, no type
                filtering is applied.

            allow_stdio (bool): If True, "-" is treated as stdin and returned
                unchanged.

            resolve (bool): If True, resolves the path before validation.

        Returns:
            Path | str: Validated filesystem path or stdin sentinel "-".

        Raises:
            ArgumentTypeError: If validation fails.

        """
        self.must_exist = must_exist
        self.allow_stdio = allow_stdio
        self.resolve = resolve

        if isinstance(kind, str):
            kind = (kind,)
        self.kind = frozenset(kind) if kind is not None else None

    def __call__(self, value: str) -> Path | str:
        # "-" is the conventional stdin placeholder; pass it through untouched.
        if value == "-" and self.allow_stdio:
            return value

        path = Path(value)
        # resolve() follows symlinks, so record the symlink status first;
        # otherwise kind="symlink" could never match when resolve=True.
        is_symlink = path.is_symlink()
        if self.resolve:
            path = path.resolve()

        if self.must_exist is True and not path.exists():
            raise ArgumentTypeError(f"Path does not exist: {path}")

        if self.must_exist is False and path.exists():
            raise ArgumentTypeError(f"Path already exists: {path}")

        # Kind filtering only applies to paths that exist.
        if self.kind is not None and path.exists():
            if not (
                ("file" in self.kind and path.is_file())
                or ("directory" in self.kind and path.is_dir())
                or ("symlink" in self.kind and is_symlink)
            ):
                allowed = ", ".join(sorted(self.kind))
                raise ArgumentTypeError(
                    f"Path does not match allowed kinds ({allowed}): {path}"
                )

        return path

A little copying is better than a little dependency. (Rob Pike)