Fix flaky test_server_export_remote by polling for cache population (!355) · Merge requests · onecommons / unfurl

Open Adam requested to merge copilot/make-test-tolerant-to-cache-timing into testing Nov 29, 2025

Created by: Copilot

Add wait_for_status helper function near top of tests/test_server.py to poll until expected status code or timeout
Replace direct assertion assert res.status_code == 304 in test_server_export_remote for the application-blueprint test with call to wait_for_status
Also replace the direct assertion for the dashboard cache hit test (the loop at "cache hit for" case)
Add comment explaining the change is to make test robust against async cache population
Run code review and CodeQL checker
Address code review feedback (use requests.RequestException instead of bare Exception)

Original prompt

Problem:

A test in CI (tests/test_server.py::test_server_export_remote) intermittently fails under Python 3.14 because the test expects an immediate cached response (HTTP 304) after the server populates its cache, but the server populates its cache asynchronously, leading to races. Recent changes made server startup more robust but exposed this timing flake: the test currently does a single request with If-None-Match and asserts status_code == 304, but sometimes receives 200.

Goal:

Make the test tolerant to the asynchronous cache population by adding a small helper that polls the export endpoint for a short timeout until it observes the expected status code (304). On timeout, the helper should fail the test with diagnostic information including the last response headers to aid debugging.

Files to change:

tests/test_server.py (ref ee68f046)

Detailed changes to apply:

Add a helper function near the top of tests/test_server.py (with existing imports) called wait_for_status(url, params=None, headers=None, expected=304, timeout=10.0, poll_interval=0.25). The helper should:

poll requests.get(url, params=params, headers=headers, timeout=2.0) until response.status_code == expected or timeout elapses

on success, return the Response object

on timeout, call pytest.fail() with a message containing the last status code and last response headers if any, or indicate no successful response seen

Replace the direct assertion that expects 304 in test_server_export_remote with a call to wait_for_status(..., expected=304, timeout=15.0). Use the same URL, params, and headers currently used by the test.

Add a brief comment explaining this is to make the test robust against async cache population in CI.

Exact code to add (suggested implementation):
import time

def wait_for_status(url, params=None, headers=None, expected=304, timeout=10.0, poll_interval=0.25):
    """Poll `url` until it returns `expected` status or `timeout` elapses.

    On timeout, fail the test with diagnostic information including last response headers.
    """
    deadline = time.time() + timeout
    last_res = None
    while time.time() < deadline:
        try:
            last_res = requests.get(url, params=params, headers=headers, timeout=2.0)
        except Exception:
            last_res = None
            time.sleep(poll_interval)
            continue
        if last_res.status_code == expected:
            return last_res
        time.sleep(poll_interval)

    if last_res is None:
        pytest.fail(f"Timed out waiting for status {expected} from {url}: no successful response seen within {timeout}s")
    else:
        pytest.fail(
            f"cache expected {expected} for {url} after {timeout}s, last status {last_res.status_code}, headers: {dict(last_res.headers)}"
        )
And replace this block in test_server_export_remote:
res = requests.get(
    f"http://{HOST}:{port}/export",
    params={
        "auth_project": "onecommons/project-templates/application-blueprint",
        "latest_commit": last_commit,  # enable caching but just get the latest in the cache
        "format": "blueprint",
    },
    headers={
      "If-None-Match": etag,
    }
)
assert res.status_code == 304
with:
res = wait_for_status(
    f"http://{HOST}:{port}/export",
    params={
        "auth_project": "onecommons/project-templates/application-blueprint",
        "latest_commit": last_commit,  # enable caching but just get the latest in the cache
        "format": "blueprint",
    },
    headers={"If-None-Match": etag},
    expected=304,
    timeout=15.0,
)
Why this helps:

Avoids flakiness caused by asynchronous cache population and CI timing differences (exacerbated under Python 3.14).

Provides diagnostics (ETag and headers) on failure to aid further debugging.

Please create a PR that applies these changes against branch copilot/fix-server-test-connection-error (head of PR #352) targeting main branch using ref ee68f046 as context.

This pull request was created as a result of the following prompt from Copilot chat.

Problem:

A test in CI (tests/test_server.py::test_server_export_remote) intermittently fails under Python 3.14 because the test expects an immediate cached response (HTTP 304) after the server populates its cache, but the server populates its cache asynchronously, leading to races. Recent changes made server startup more robust but exposed this timing flake: the test currently does a single request with If-None-Match and asserts status_code == 304, but sometimes receives 200.

Goal:

Make the test tolerant to the asynchronous cache population by adding a small helper that polls the export endpoint for a short timeout until it observes the expected status code (304). On timeout, the helper should fail the test with diagnostic information including the last response headers to aid debugging.

Files to change:

tests/test_server.py (ref ee68f046)

Detailed changes to apply:

Add a helper function near the top of tests/test_server.py (with existing imports) called wait_for_status(url, params=None, headers=None, expected=304, timeout=10.0, poll_interval=0.25). The helper should:

poll requests.get(url, params=params, headers=headers, timeout=2.0) until response.status_code == expected or timeout elapses

on success, return the Response object

on timeout, call pytest.fail() with a message containing the last status code and last response headers if any, or indicate no successful response seen

Replace the direct assertion that expects 304 in test_server_export_remote with a call to wait_for_status(..., expected=304, timeout=15.0). Use the same URL, params, and headers currently used by the test.

Add a brief comment explaining this is to make the test robust against async cache population in CI.

Exact code to add (suggested implementation):
import time

def wait_for_status(url, params=None, headers=None, expected=304, timeout=10.0, poll_interval=0.25):
    """Poll `url` until it returns `expected` status or `timeout` elapses.

    On timeout, fail the test with diagnostic information including last response headers.
    """
    deadline = time.time() + timeout
    last_res = None
    while time.time() < deadline:
        try:
            last_res = requests.get(url, params=params, headers=headers, timeout=2.0)
        except Exception:
            last_res = None
            time.sleep(poll_interval)
            continue
        if last_res.status_code == expected:
            return last_res
        time.sleep(poll_interval)

    if last_res is None:
        pytest.fail(f"Timed out waiting for status {expected} from {url}: no successful response seen within {timeout}s")
    else:
        pytest.fail(
            f"cache expected {expected} for {url} after {timeout}s, last status {last_res.status_code}, headers: {dict(last_res.headers)}"
        )
And replace this block in test_server_export_remote:
res = requests.get(
    f"http://{HOST}:{port}/export",
    params={
        "auth_project": "onecommons/project-templates/application-blueprint",
        "latest_commit": last_commit,  # enable caching but just get the latest in the cache
        "format": "blueprint",
    },
    headers={
      "If-None-Match": etag,
    }
)
assert res.status_code == 304
with:
res = wait_for_status(
    f"http://{HOST}:{port}/export",
    params={
        "auth_project": "onecommons/project-templates/application-blueprint",
        "latest_commit": last_commit,  # enable caching but just get the latest in the cache
        "format": "blueprint",
    },
    headers={"If-None-Match": etag},
    expected=304,
    timeout=15.0,
)
Why this helps:

Avoids flakiness caused by asynchronous cache population and CI timing differences (exacerbated under Python 3.14).

Provides diagnostics (ETag and headers) on failure to aid further debugging.

Please create a PR that applies these changes against branch copilot/fix-server-test-connection-error (head of PR #352) targeting main branch using ref ee68f046 as context.

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.