Notes on running pytest in parallel

Some notes on running tests in parallel with pytest, the Python testing tool.

pytest-xdist
Managing session-scoped fixtures with pytest-xdist
pytest-parallel
PID and PPID in each configuration
Managing session-scoped fixtures with pytest-parallel
Relationship with pytest-benchmark
Summary

pytest-xdist

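pytest is a testing framework for Python, but by default it runs tests one at a time.

Running them in parallel requires a separate plugin. One maintained by the pytest team is pytest-xdist.

Installing it adds the -n (--numprocesses) option, which runs the tests across the given number of worker processes.

With -n auto, it creates as many processes as there are CPUs available in the environment.

If this option is already set in a configuration file and you want the usual single-process run, pass -n 0.

-n 1 also runs the tests in a single process, but pytest actually spawns one child process under the main pytest process and runs the tests there, so strictly speaking it is not the same as -n 0.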

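Managing session-scoped fixtures with pytest-xdist

When xdist runs tests in multiple processes, it creates the specified number of processes and, in effect, runs a separate session in each of them.

This means a session-scoped fixture is executed once per process. (This assumes the fixture is actually requested in every process: either it has autouse=True, or each process runs at least one test that uses it.)

Consider the following test file.

It has an autouse=True fixture called setup (one that always runs even if no test requests it) and four tests. (Nothing is actually tested; the file exists only to observe the fixture.)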

test_session_fixture.py

import sys
import pytest



@pytest.fixture(scope="session", autouse=True)
def setup(worker_id):
    sys.stdout = sys.stderr
    print("setup", worker_id)
    yield
    print("teardown", worker_id)



def test_1():
    print("test_1")


def test_2():
    print("test_2")


def test_3():
    print("test_3")

def test_4():
    print("test_3")  # prints "test_3", not "test_4"; the outputs below reflect this

The setup fixture takes worker_id as an argument. This fixture becomes available once pytest-xdist is installed: it returns the string master when pytest-xdist is not active, and a per-process ID such as gw0 or gw1 when it is.
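For code that cannot request the worker_id fixture, pytest-xdist also exports the worker ID to its worker processes in the PYTEST_XDIST_WORKER environment variable. The following is a minimal sketch of reading it. The helper name current_worker_id is made up for illustration, and the "master" fallback simply mirrors what the worker_id fixture returns.

```python
import os


def current_worker_id(environ=None):
    """Return the xdist worker ID ("gw0", "gw1", ...) or "master".

    Hypothetical helper: reads PYTEST_XDIST_WORKER, which pytest-xdist
    sets inside its worker processes, and falls back to "master".
    """
    env = os.environ if environ is None else environ
    return env.get("PYTEST_XDIST_WORKER", "master")


# Explicit mappings are passed here so the example does not depend on
# how this snippet itself is launched.
print(current_worker_id({}))                              # master
print(current_worker_id({"PYTEST_XDIST_WORKER": "gw1"}))  # gw1
```

Called without arguments inside a test or fixture, it reads the real environment of the current process.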

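sys.stdout = sys.stderr is there because, once xdist runs in multiple processes, standard output from the tests is no longer shown even with -s ¹.

It also makes the print output easy to separate from pytest's own standard output.

Running it normally: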

$ pytest -s ./test_session_fixture.py >/dev/null
setup master
test_1
test_2
test_3
test_3
teardown master

As you can see, setup is called only once for the whole run.

Now with two processes:

$ pytest -n 2 -s ./test_session_fixture.py >/dev/null
setup gw1
test_3
setup gw0
test_1
test_3
teardown gw1
test_2
teardown gw0

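This time there are two worker_id values, gw0 and gw1, and setup is called once in each.

A session-scoped fixture is sometimes used simply to avoid the waste of building the same thing repeatedly, in which case running it once per process is merely redundant. But some setup, such as preparing external data, must not run more than once.

In that case the behavior above is a problem, and setup can be rewritten as follows.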

test_session_fixture_lock.py

import sys
import pytest
from filelock import FileLock


@pytest.fixture(scope="session", autouse=True)
def setup(worker_id, tmp_path_factory):
    sys.stdout = sys.stderr
    print("setup", worker_id)
    if worker_id == "master":
        yield
        print("teardown", worker_id)
        return

    root_tmp_dir = tmp_path_factory.getbasetemp().parent
    fn = root_tmp_dir / "check"
    with FileLock(root_tmp_dir / "setup.lock"):
        if fn.is_file():
            print("setup was done", worker_id)
        else:
            print("first setup", worker_id)
            fn.touch()
    yield
    print("teardown", worker_id)

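Reference: "Making session-scoped fixtures execute only once" (pytest-xdist documentation)

When pytest-xdist is not active (worker_id is master), the behavior is exactly as before. When it is active, the behavior depends on whether a shared file called check exists: the first process to arrive creates the file, and filelock is used to make this preparation step mutually exclusive. filelock is a separate package; if it is not already in your environment, install it with pip install filelock.

With this: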

$ pytest -n 2 -s ./test_session_fixture_lock.py >/dev/null
setup gw1
setup gw0
first setup gw1
test_3
test_3
teardown gw1
setup was done gw0
test_1
test_2
teardown gw0

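Here first setup runs in gw1 and setup was done runs in gw0.

With something like -n 4, more processes report setup was done.

There is no simple built-in way to pass objects directly in memory between the processes, so if you want to generate some data and share it, one approach is to have the first process write it to fn and have the later processes read it from there.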
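The write-once, read-afterwards idea can be sketched as a small helper. This is only an illustration under assumptions: the names load_or_create and make_data are made up, JSON is an arbitrary choice of file format, and the lock argument stands in for something like the FileLock used in the fixture above.

```python
import json
import tempfile
from contextlib import nullcontext
from pathlib import Path


def load_or_create(data_file, produce, lock=None):
    """Return shared JSON data, calling produce() only if data_file is missing.

    `lock` may be any context manager; with xdist it would be something
    like FileLock(str(data_file) + ".lock") so only one worker produces.
    """
    data_file = Path(data_file)
    with (lock if lock is not None else nullcontext()):
        if data_file.is_file():
            # An earlier caller already produced the data: just read it.
            return json.loads(data_file.read_text())
        data = produce()
        data_file.write_text(json.dumps(data))
        return data


# Simulate two workers calling the helper one after the other.
calls = []


def make_data():
    calls.append("produced")
    return {"token": 42}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_create(Path(tmp) / "data.json", make_data)
    second = load_or_create(Path(tmp) / "data.json", make_data)

print(first == second, len(calls))  # True 1
```

Inside the setup fixture, the file would live under root_tmp_dir and the helper would be called with a real lock, so that every worker receives the same data while only one of them generates it.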

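The shared directory for these files is obtained with:

root_tmp_dir = tmp_path_factory.getbasetemp().parent

tmp_path_factory is a built-in pytest fixture, and getbasetemp() returns the temporary directory used by that process.

In a normal single-process run this is a directory like $TMPDIR/pytest-of-$USER/pytest-<N>, where N is incremented on every run so that each run uses a different directory.

When xdist runs in multiple processes, each process is assigned a directory one level deeper, such as $TMPDIR/pytest-of-$USER/pytest-<N>/popen-gw0.

The part up to pytest-<N> is common to all processes, so getbasetemp().parent gives this shared directory, which can hold data files and lock files.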
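The directory layout described above can be captured in a tiny helper. This is a sketch only: shared_root is a hypothetical name, and treating the base directory itself as the shared one in the single-process case is an assumption made for this example (the fixture above does not use a shared directory at all when worker_id is master).

```python
from pathlib import Path


def shared_root(basetemp, worker_id):
    """Return the directory common to all workers of one pytest run.

    Hypothetical helper. Under xdist, basetemp looks like
    .../pytest-<N>/popen-gw0, so the shared part is its parent.
    In a single-process run ("master") basetemp is already unique to the run.
    """
    basetemp = Path(basetemp)
    return basetemp if worker_id == "master" else basetemp.parent


# Paths taken from the example runs in this article.
print(shared_root("/tmp/pytest-of-user/pytest-362/popen-gw0", "gw0"))
print(shared_root("/tmp/pytest-of-user/pytest-362/popen-gw1", "gw1"))
print(shared_root("/tmp/pytest-of-user/pytest-360", "master"))
```

Both worker paths resolve to the same pytest-362 directory, which is what makes it usable for the lock file.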

pytest-parallel

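pytest-parallel is a plugin similar to pytest-xdist.

Here the number of processes (worker nodes) is set with an option like --workers 4.

One caveat: if you install only pytest-parallel and try to run it, it fails because the py library is missing:

INTERNALERROR> AttributeError: module 'py' has no attribute 'log'

This is the error it reports.

pytest-benchmark: AttributeError: module ‘py’ has no attribute ‘io’ · Issue #10420 · pytest-dev/pytest

The issue above says the problem appears when pytest-benchmark is used together with it, but on Python 3.7 and later the same kind of error occurs even without pytest-benchmark.

Installing py separately with pip install py works around it.

Unlike pytest-xdist, pytest-parallel does show sys.stdout output with -s.

The biggest difference is --tests-per-worker: with --tests-per-worker 2, for example, that many tests run concurrently as threads within the same process. This makes it possible to check things like whether the code under test is thread-safe.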
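As an illustration of what such a multi-threaded run can exercise, here is a small sketch of shared state that is safe under concurrent use only because of a lock. It uses plain threading to stand in for tests running concurrently; the SafeCounter class and the hammer function are made up for this example and are not part of pytest-parallel.

```python
import threading


class SafeCounter:
    """Counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read-modify-write below could interleave
        # between threads and lose updates.
        with self._lock:
            self.value += 1


def hammer(counter, n_threads=8, n_increments=1000):
    """Increment the counter from several threads at once."""
    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = hammer(SafeCounter())
print(total)  # 8000
```

A test that asserts on shared state like this is the kind of check that only becomes meaningful when several tests really do run in the same process at the same time.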

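On the other hand, whether in multi-process or multi-thread mode, every test runs as its own independent session, so a session-scoped fixture with autouse=True is executed for every test.

There is also no fixture like xdist's worker_id, so there is no way to tell from a fixture whether the run is multi-process or multi-threaded ².

In addition, tmp_path_factory creates a directory of the form $TMPDIR/pytest-of-$USER/pytest-<N> for each test, so taking parent as in the xdist case gives $TMPDIR/pytest-of-$USER, which is the same directory on every run.

So a slightly different approach is needed.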

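PID and PPID in each configuration

To obtain something shared by the whole pytest run, one option with pytest-parallel is to use the PID of the pytest process itself.

To check the PIDs, prepare the following script.

Note that this assumes pytest, pytest-xdist, and pytest-parallel are all installed in the environment.

Without pytest-xdist the worker_id fixture does not exist, and the script fails with an error.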

test_pid.py

import os
import sys
import pytest


@pytest.fixture(scope="session", autouse=True)
def setup(tmp_path_factory, worker_id):
    sys.stdout = sys.stderr
    print(tmp_path_factory.getbasetemp(), worker_id)
    print("pid", os.getpid(), worker_id)
    print("ppid", os.getppid(), worker_id)


def test_1():
    print("test_1")


def test_2():
    print("test_2")

def test_3():
    print("test_3")

def test_4():
    print("test_4")

First, a normal single-process run. It is run twice to compare the PPID.

$ pytest -s test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-360 master
pid 10292 master
ppid 16116 master
test_1
test_2
test_3
test_4
$ pytest -s test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-361 master
pid 10317 master
ppid 16116 master
test_1
test_2
test_3
test_4

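The ppid is the same in both runs; it is the PID of the shell that ran the command. The PID of pytest itself is the first value, the one obtained with os.getpid().

Also, tmp_path_factory.getbasetemp() has moved to a directory whose number is one higher.

Next, pytest-xdist: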

$ pytest -s -n 2 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-362/popen-gw1 gw1
pid 10561 gw1
ppid 10557 gw1
/tmp/pytest-of-user/pytest-362/popen-gw0 gw0
pid 10558 gw0
ppid 10557 gw0
test_3
test_1
test_4
test_2
$ pytest -s -n 2 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-363/popen-gw1 gw1
pid 10598 gw1
ppid 10594 gw1
/tmp/pytest-of-user/pytest-363/popen-gw0 gw0
pid 10595 gw0
ppid 10594 gw0
test_3
test_1
test_4
test_2

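This time setup is called twice, and gw1 and gw0 share the same ppid. The value changes between pytest runs, however.

This is because the processes that run the tests (the ones shown as pid) are children of the pytest process, and the ppid values, 10557 and 10594, are the PIDs of the pytest process in each run.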
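The same parent/child relationship can be reproduced without pytest. The sketch below starts one child Python process and compares the PPID it reports with the PID of the process that started it. The function name ppid_seen_by_child is made up for this example, and it assumes a setup where sys.executable is the interpreter itself rather than a launcher that starts a second process.

```python
import os
import subprocess
import sys


def ppid_seen_by_child():
    """Start a child Python process and return the PPID it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


parent_pid = os.getpid()          # plays the role of the pytest process
child_view = ppid_seen_by_child() # plays the role of a worker such as gw0
print(parent_pid, child_view)
```

Every child started from the same parent reports the same PPID, which is why gw0 and gw1 agree within one run and differ between runs.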

Now let's try -n 0 and -n 1.

With -n 0:

$ pytest -s -n 0 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-364 master
pid 10748 master
ppid 16116 master
test_1
test_2
test_3
test_4
$ pytest -s -n 0 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-365 master
pid 10773 master
ppid 16116 master
test_1
test_2
test_3
test_4

Just as in the normal case without the -n option, ppid is the shell's PID and pid is the PID of pytest. The temporary directory is also pytest-<N>.

With -n 1, on the other hand:

$ pytest -s -n 1 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-366/popen-gw0 gw0
pid 11027 gw0
ppid 11026 gw0
test_1
test_2
test_3
test_4
$ pytest -s -n 1 test_pid.py >/dev/null
/tmp/pytest-of-user/pytest-367/popen-gw0 gw0
pid 11058 gw0
ppid 11057 gw0
test_1
test_2
test_3
test_4

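Here the ppid changes between runs as well, showing that, as in the multi-process case, pytest spawns a child process and runs the tests inside the process shown as pid. The temporary directory is also one level deeper, at popen-gw0.

Now for the main topic, pytest-parallel.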

$ pytest -s --workers 2 test_pid.py >/dev/null
...
/tmp/pytest-of-user/pytest-368 master
pid 11299 master
ppid 11284 master
test_1
/tmp/pytest-of-user/pytest-369 master
pid 11300 master
ppid 11284 master
/tmp/pytest-of-user/pytest-368 master
pid 11299 master
ppid 11284 master
test_3
/tmp/pytest-of-user/pytest-368 master
pid 11299 master
ppid 11284 master
test_2
test_4
$ pytest -s --workers 2 test_pid.py >/dev/null
...
/tmp/pytest-of-user/pytest-370 master
pid 11360 master
ppid 11345 master
/tmp/pytest-of-user/pytest-371 master
pid 11361 master
ppid 11345 master
test_1
test_2
/tmp/pytest-of-user/pytest-370 master
pid 11360 master
ppid 11345 master
/tmp/pytest-of-user/pytest-371 master
pid 11361 master
ppid 11345 master
test_3
test_4

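First, although the number of workers is 2, setup is called four times. Within each pytest run the ppid is the same for all four calls: 11284 in the first run and 11345 in the second. Calls with the same pid also share the same temporary directory, but those directories sit at the pytest-<N> level, the same level as in a normal pytest run.

The pid, on the other hand, takes two values in each run: 11299 and 11300 in the first, 11360 and 11361 in the second. Calls with the same value ran in the same process. (There are four tests and two processes, but the tests finish instantly, so in the first run the timing happened to split them 3:1.)

Looking at ppid, it is common within a single pytest run and changes on the next run, so it looks usable as a key for shared data files and lock files.

With --workers 1, as with xdist, a child process is created under the pytest process and the tests run inside it.

However, with --workers 0 pytest-parallel creates no child process and simply waits forever, so that value cannot be used.

Now the multi-threaded case: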

$ pytest -s --tests-per-worker 2 test_pid.py >/dev/null
...
/tmp/pytest-of-user/pytest-372 master
pid 11832 master
ppid 11817 master
/tmp/pytest-of-user/pytest-373 master
pid 11832 master
ppid 11817 master
test_1
test_2
/tmp/pytest-of-user/pytest-373 master
pid 11832 master
ppid 11817 master
/tmp/pytest-of-user/pytest-373 master
pid 11832 master
ppid 11817 master
test_3
test_4
$ pytest -s --tests-per-worker 2 test_pid.py >/dev/null
...
/tmp/pytest-of-user/pytest-374 master
pid 11885 master
ppid 11871 master
test_2
/tmp/pytest-of-user/pytest-375 master
pid 11885 master
ppid 11871 master
test_1
/tmp/pytest-of-user/pytest-375 master
pid 11885 master
ppid 11871 master
test_3
/tmp/pytest-of-user/pytest-375 master
pid 11885 master
ppid 11871 master
test_4

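In this case no workers are specified, so it is a single process and every pid is the same. There are, however, two temporary directories in each pytest run, which appear to correspond to the different threads. Also, since the ppid changes on the second pytest run, the tests are evidently executed in a child process of pytest here as well.

The ppid can be used to create shared files in this case too.

Managing session-scoped fixtures with pytest-parallel

As above, create a directory named after the value of os.getppid() and place the lock file and other files there.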

test_parallel.py

import os
import sys
import pytest
from filelock import FileLock


@pytest.fixture(scope="session", autouse=True)
def setup(tmp_path_factory):
    sys.stdout = sys.stderr

    root_tmp_dir = tmp_path_factory.getbasetemp().parent / str(os.getppid())
    root_tmp_dir.mkdir(exist_ok=True)
    fn = root_tmp_dir / "check"
    print(fn)
    with FileLock(root_tmp_dir / "setup.lock"):
        if fn.is_file():
            print("setup was done")
        else:
            print("first setup")
            fn.touch()
    yield
    print("teardown")



def test_1():
    print("test_1")


def test_2():
    print("test_2")


def test_3():
    print("test_3")

def test_4():
    print("test_3")  # prints "test_3", not "test_4"; the output below reflects this

Running this gives:

$ pytest -s --workers 2 ./test_parallel.py >/dev/null
...
/tmp/pytest-of-user/14629/check
first setup
/tmp/pytest-of-user/14629/check
setup was done
test_1
teardown
test_2
teardown
/tmp/pytest-of-user/14629/check
setup was done
test_3
teardown
/tmp/pytest-of-user/14629/check
setup was done
test_3
teardown

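setup is called four times, once per test, but first setup happens only once.

In this case the lock file and the other files are placed in a directory like $TMPDIR/pytest-of-$USER/<PPID>/, where the PPID is the PID of the main pytest process.

Because this PPID changes on every pytest run, the same technique as in the xdist section works.

However, PIDs are reused, so if pytest is run many times and the temporary directory is rarely cleaned up, a PPID may well collide with an earlier one. You may need to run rm -rf $TMPDIR/pytest-of-$USER by hand from time to time.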
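A narrower alternative to deleting the whole directory is to remove only the PID-named directories whose process no longer exists. The following is a rough sketch under several assumptions: it is POSIX-only, the function names pid_is_alive and remove_stale_run_dirs are made up, and it cannot help when an old PID has already been reused by some other live process.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def pid_is_alive(pid):
    """Best-effort check (POSIX only) whether a process with this PID exists."""
    if os.name != "posix":
        # On Windows os.kill() would terminate the process, so refuse.
        raise NotImplementedError("this sketch assumes a POSIX system")
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_run_dirs(base):
    """Delete numeric-named subdirectories of base whose PID is not running."""
    removed = []
    for entry in Path(base).iterdir():
        if entry.is_dir() and entry.name.isdigit() and not pid_is_alive(int(entry.name)):
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


# Demo: one directory named after a live PID, one after a finished process.
finished = subprocess.Popen([sys.executable, "-c", "pass"])
finished.wait()
with tempfile.TemporaryDirectory() as base:
    (Path(base) / str(os.getpid())).mkdir()
    (Path(base) / str(finished.pid)).mkdir()
    removed = remove_stale_run_dirs(base)
    survivors = sorted(p.name for p in Path(base).iterdir())

print(removed, survivors)
```

In the fixture, such a cleanup would have to run while holding a lock of its own, since several tests enter setup at nearly the same time. The directory of the current run survives because its PPID still refers to a running process.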

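Rather than doing it by hand each time, you could do it at shell startup, for example in .bashrc.

Also, if this test file is run without --workers or --tests-per-worker, the PPID becomes the PID of the shell that ran the command, which is the same every time. (So unless $TMPDIR/pytest-of-$USER/<shell's PID>/check is deleted, the setup work is never performed on later runs.)

So with a script like this, even a single-process run has to go through pytest-parallel, by passing --workers 1 (or --tests-per-worker 1).

pytest-xdist and pytest-parallel should not be used together in the first place, but if this script is run with something like -n 2, check ends up under directories such as popen-gw0, so setup is performed in every process.

So using fixtures with a scope larger than function, such as session scope, is even more cumbersome with pytest-parallel than with pytest-xdist.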

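Relationship with pytest-benchmark

pytest-benchmark is a plugin that benchmarks the execution speed of functions and the like inside pytest, but benchmark tests are disabled when pytest-xdist or pytest-parallel is active.

With pytest-xdist installed, you may want ordinary test runs to use multiple processes without extra arguments, which is typically done with a setting like addopts = "-n auto" in pytest.ini or pyproject.toml.

Benchmarks would then be disabled, so disable multi-processing just for the benchmark run: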

$ pytest -n 0 tests/test_benchmark.py

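Disabling multi-processing this way lets the benchmarks run.

pytest-parallel, however, does not accept --workers 0, as noted above, so once it is enabled in a configuration file with something like --workers auto, there is no way to turn it off. (--workers 1 uses a single process, but pytest-parallel itself is still active and the benchmarks do not run.)

Therefore, to use pytest-benchmark and pytest-parallel together, multi-process (or multi-thread) options cannot be put in the configuration file.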