Fix test_server.py for Python 3.14: robust IPv4 binding and error surfacing

Open Adam requested to merge copilot/fix-server-test-connection-error into testing Nov 29, 2025

Created by: Copilot

  • Add a wait_for_status helper near the top of tests/test_server.py that polls until the expected status code is returned or a timeout elapses
  • In test_server_export_remote, replace the direct assertion assert res.status_code == 304 for the application-blueprint test with a call to wait_for_status
  • Replace the direct assertion in the dashboard cache-hit test as well (the "cache hit for" case in the loop)
  • Add a comment explaining that the change makes the test robust against asynchronous cache population

Combined changes from both PR #352 (robust IPv4 binding and error surfacing) and PR #355 (wait_for_status helper for async cache timing).

Rebased cleanly against testing branch.

Changes from PR #352:

  • Explicit IPv4 binding: Added HOST = "127.0.0.1" constant to avoid getaddrinfo resolution differences across platforms
  • Error queue for child tracebacks: The new serve_server wrapper forwards exception tracebacks to a multiprocessing.Queue, so the parent process can surface the actual reason for a server crash
  • Robust connection polling: Replaced the urllib-based health check with socket.create_connection against both the IPv4 and IPv6 loopback addresses, and increased the timeout from 2s to 12s
  • Updated all Process creation sites: The fixture, set_up_deployment, test_server_export_local, and test_server_export_remote now use get_context() and attach the error queue to the Process object
  • Updated all request URLs: Use the HOST constant instead of a hardcoded host literal
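
To illustrate why an explicit IPv4 literal is more predictable than a hostname, the sketch below lists the address families that getaddrinfo returns for a host. The helper name families_for is illustrative and not part of the PR. The result for "localhost" depends on the machine's resolver configuration, so no particular output is assumed for it.

```python
import socket


def families_for(host, port=80):
    """Return the ordered, de-duplicated address families getaddrinfo yields for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    families = []
    for family, _type, _proto, _canonname, _sockaddr in infos:
        if family not in families:
            families.append(family)
    return families


if __name__ == "__main__":
    # A numeric IPv4 literal can only resolve to AF_INET.
    print("127.0.0.1 ->", families_for("127.0.0.1"))
    # "localhost" may yield AF_INET, AF_INET6, or both, in a platform-dependent order.
    print("localhost ->", families_for("localhost"))
```

If "localhost" resolves to an IPv6 address first while the server listens only on IPv4 (or the reverse), a client that tries a single address can see a connection refusal even though the server is up.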

Changes from PR #355:

  • wait_for_status helper: Polls HTTP endpoint until expected status code or timeout elapses
  • Async cache timing tolerance: Replaced direct assert res.status_code == 304 assertions with wait_for_status(expected=304, timeout=15.0) calls
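
A minimal sketch of the polling idea behind wait_for_status follows. The PR text only shows the expected and timeout keywords, so the signature here is an assumption: the fetch_status callable and the interval parameter are hypothetical, chosen so the example runs without a live server.

```python
import time


def wait_for_status(fetch_status, expected, timeout=15.0, interval=0.25):
    """Call fetch_status() until it returns `expected` or `timeout` seconds pass.

    Returns the last status seen so the caller can assert on it.
    """
    deadline = time.monotonic() + timeout
    status = fetch_status()
    while status != expected and time.monotonic() < deadline:
        time.sleep(interval)
        status = fetch_status()
    return status


if __name__ == "__main__":
    # Simulated endpoint: answers 200 twice (cache still filling), then 304.
    responses = iter([200, 200, 304])
    status = wait_for_status(lambda: next(responses), expected=304, timeout=5.0, interval=0.01)
    assert status == 304
```

In the test suite, fetch_status would presumably wrap the existing HTTP request and return its status_code. Returning the last status instead of raising keeps the assertion, and its failure message, in the test itself.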
This pull request was created as a result of the following prompt from Copilot chat.

Problem

The py3.14 CI job is failing with ConnectionRefusedError in tests/test_server.py::test_server_version because the test helper that starts the server (start_server_process) is fragile: it polls only briefly, uses urllib against "localhost" (subject to IPv6/IPv4 resolution differences), and does not surface server-side exceptions when the child process exits. That hides the root cause when the server process crashes or binds to a different address family under Python 3.14.

Goal

Create a PR that makes tests/test_server.py robust so the test harness:

  • consistently binds/connects to IPv4 (127.0.0.1) by default,
  • uses a robust start_server_process that polls for a TCP listener across address families with a longer timeout,
  • captures and surfaces server-side Python tracebacks from the child process via a multiprocessing.Queue so failing starts show a clear error in test output,
  • uses multiprocessing.get_context() to create processes and attaches the error queue to the Process object so the waiter can inspect it.

Files to change

  • tests/test_server.py (ref ee68f046)

Detailed changes to apply

  1. Add imports near the top (or reuse existing ones) and define a HOST constant:
import traceback
import socket
from multiprocessing import get_context, Queue

# Prefer explicit IPv4 loopback for tests to avoid getaddrinfo resolution ordering differences
HOST = "127.0.0.1"
  2. Replace the current serve_server wrapper with one that forwards child start errors to a Queue:
def serve_server(*args, error_queue=None, **kw):
    # error_queue: optional multiprocessing queue that receives the formatted traceback
    try:
        return server.serve(*args, **kw)
    except Exception:
        tb = traceback.format_exc()
        if error_queue is not None:
            error_queue.put(tb)
        logging.warning("server.serve unexpectedly failed", exc_info=True)
        raise
  3. Replace start_server_process with a robust implementation that:
  • starts the Process (assumes it was created with kwargs {"error_queue": error_queue}),
  • polls socket.create_connection against HOST and '::1' for up to timeout,
  • if the child exits early, reads the queued traceback and raises RuntimeError with the traceback.
def start_server_process(process_obj, port, hosts=(HOST, "::1"), timeout=12.0):
    process_obj.start()
    # time.monotonic() is unaffected by system clock adjustments
    deadline = time.monotonic() + timeout
    last_exc = None

    def _child_traceback():
        # The queue is attached to the Process object where the process is created
        eq = getattr(process_obj, "_error_queue", None)
        if eq is None:
            return None
        try:
            # Wait briefly: a queued traceback may not be readable immediately
            return eq.get(timeout=0.5)
        except Exception:
            return None

    while time.monotonic() < deadline:
        if not process_obj.is_alive():
            tb = _child_traceback()
            if tb:
                raise RuntimeError(f"server process exited prematurely; traceback:\n{tb}")
            raise RuntimeError(f"server process exited prematurely with exitcode {process_obj.exitcode}")
        for h in hosts:
            try:
                with socket.create_connection((h, port), timeout=1):
                    return process_obj
            except OSError as e:
                # Refused, timed out, or address family unavailable on this machine
                last_exc = e
        time.sleep(0.1)

    tb = _child_traceback()
    if tb:
        raise RuntimeError(f"server not reachable on port {port} after {timeout}s; server traceback:\n{tb}")
    raise RuntimeError(f"server not reachable on port {port} after {timeout}s; last error: {last_exc}")
  4. Update the Process creation sites in tests/test_server.py to use get_context(), pass an error_queue, and attach the queue to the Process object. Also pass HOST instead of the literal 'localhost' in args. Examples (apply to all occurrences):

Before (existing pattern):

server_process = Process(
    target=server.serve,
    args=("localhost", _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER),
)

After:

ctx = get_context()
error_queue = ctx.Queue()  # create the queue from the same context as the Process
server_process = ctx.Process(
    target=serve_server,
    args=(HOST, _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER),
    kwargs={"error_queue": error_queue},
)
server_process._error_queue = error_queue

Do the same in set_up_deployment, test_server_export_local, test_server_export_remote, and other places where Process is used to start server.serve.

  5. Ensure all client request URLs use HOST rather than a hard-coded 'localhost'. For example:
res = requests.get(f"http://{HOST}:{_static_server_port}/health", params={"secret": "secret"})
  6. (Optional) Add a small try/except around waitress.serve in unfurl/server/serve.py so the server logs any startup exception clearly. The test-side error queue should be sufficient to surface tracebacks.
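
The error-forwarding pattern from the serve_server step can be exercised in isolation. The sketch below is a simplified stand-in, not the PR's code: run_and_report, failing_server, and port 8081 are hypothetical names and values, and a queue.Queue replaces the multiprocessing queue so the example runs in a single process.

```python
import queue
import traceback


def run_and_report(target, *args, error_queue=None, **kwargs):
    """Run target; on failure put the formatted traceback on error_queue, then re-raise."""
    try:
        return target(*args, **kwargs)
    except Exception:
        if error_queue is not None:
            error_queue.put(traceback.format_exc())
        raise


def failing_server(port):
    # Hypothetical stand-in for server.serve crashing at startup.
    raise OSError(f"cannot bind port {port}")


if __name__ == "__main__":
    # In-process demonstration with queue.Queue; a multiprocessing queue
    # offers the same put()/get() calls across the process boundary.
    errors = queue.Queue()
    try:
        run_and_report(failing_server, 8081, error_queue=errors)
    except OSError:
        pass
    print(errors.get_nowait())
```

Re-raising after queueing keeps the child's exit code non-zero, so the parent can still detect the failure when no queue is attached.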

Why this change

  • The error_queue surfaces the actual server traceback when the child dies on startup under Python 3.14 (so CI failure becomes actionable).
  • Using HOST = 127.0.0.1 avoids getaddrinfo ordering differences between IPv4/IPv6.
  • socket.create_connection is a more direct liveness check than urllib.request.urlopen and avoids name resolution differences.
  • Increasing the timeout and polling avoids races where the parent attempts to connect too quickly.
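
The liveness check described above can be tried on its own. This is a sketch under assumptions, not the PR's start_server_process: wait_for_listener and its interval parameter are hypothetical, and a plain listening socket stands in for the server.

```python
import socket
import time


def wait_for_listener(port, hosts=("127.0.0.1", "::1"), timeout=12.0, interval=0.1):
    """Return the first host that accepts a TCP connection on port, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        for host in hosts:
            try:
                with socket.create_connection((host, port), timeout=1):
                    return host
            except OSError:
                continue  # refused, timed out, or address family unavailable
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for the server: a listening socket on an OS-chosen IPv4 port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        port = listener.getsockname()[1]
        print(wait_for_listener(port, timeout=2.0))
```

Because the probe only opens and closes a TCP connection, it succeeds as soon as the listener exists, independent of HTTP routing or authentication.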

Testing

  • Run the py3.14 tox job locally or in CI after the PR is merged; failing tests should now include a clear server traceback if the server is crashing or should pass if it was only a race/address-family issue.

Please create a PR that updates tests/test_server.py with the changes described above and use ref ee68f046 as the base for modifications.


Source branch: copilot/fix-server-test-connection-error