Created by: Copilot
-
Add wait_for_statushelper function near top oftests/test_server.pyto poll until expected status code or timeout -
Replace direct assertion assert res.status_code == 304intest_server_export_remotefor the application-blueprint test with call towait_for_status -
Also replace the direct assertion for the dashboard cache hit test (the loop at "cache hit for" case) -
Add comment explaining the change is to make test robust against async cache population
Combined changes from both PR #352 (robust IPv4 binding and error surfacing) and PR #355 (wait_for_status helper for async cache timing).
Rebased cleanly against testing branch.
Changes from PR #352:
-
Explicit IPv4 binding: Added
HOST = "127.0.0.1"constant to avoidgetaddrinforesolution differences across platforms -
Error queue for child tracebacks: New
serve_serverwrapper forwards exceptions to amultiprocessing.Queue, enabling parent process to surface actual server crash reasons -
Robust connection polling: Replaced urllib-based health check with
socket.create_connectionagainst both IPv4/IPv6, increased timeout from 2s to 12s -
Updated all Process creation sites: Use
get_context()and attach error queue to fixture,set_up_deployment,test_server_export_local,test_server_export_remote -
Updated all request URLs: Use
HOSTinstead of hardcoded127.0.0.1
Changes from PR #355:
- wait_for_status helper: Polls HTTP endpoint until expected status code or timeout elapses
-
Async cache timing tolerance: Replaced direct
assert res.status_code == 304assertions withwait_for_status(expected=304, timeout=15.0)calls
Original prompt
Problem
The py3.14 CI job is failing with ConnectionRefusedError in tests/test_server.py::test_server_version because the test helper that starts the server (start_server_process) is fragile: it polls only briefly, uses urllib against "localhost" (subject to IPv6/IPv4 resolution differences), and does not surface server-side exceptions when the child process exits. That hides the root cause when the server process crashes or binds to a different address family under Python 3.14.
Goal
Create a PR that makes tests/test_server.py robust so the test harness:
- consistently binds/connects to IPv4 (127.0.0.1) by default,
- uses a robust start_server_process that polls for a TCP listener across families and longer timeout,
- captures and surfaces server-side Python tracebacks from the child process via a multiprocessing.Queue so failing starts show a clear error in test output,
- uses multiprocessing.get_context() to create processes and attaches the error queue to the Process object so the waiter can inspect it.
Files to change
- tests/test_server.py (ref ee68f046)
Detailed changes to apply
- Add imports near the top (or reuse existing ones) and define a HOST constant:
import traceback
import socket
from multiprocessing import get_context, Queue
# Prefer explicit IPv4 loopback for tests to avoid getaddrinfo resolution ordering differences
HOST = "127.0.0.1"
- Replace the current serve_server wrapper with one that forwards child start errors to a Queue:
def serve_server(*args, error_queue: Queue = None, **kw):
try:
return server.serve(*args, **kw)
except Exception:
tb = traceback.format_exc()
if error_queue is not None:
error_queue.put(tb)
logging.warning("server.serve unexpectedly failed", exc_info=True)
raise
- Replace start_server_process with a robust implementation that:
- starts the Process (assumes it was created with kwargs {"error_queue": error_queue}),
- polls socket.create_connection against HOST and '::1' for up to timeout,
- if the child exits early, reads the queued traceback and raises RuntimeError with the traceback.
def start_server_process(process_obj, port, hosts=(HOST, "::1"), timeout=12.0):
process_obj.start()
start = time.time()
last_exc = None
def _child_traceback():
eq = getattr(process_obj, "_error_queue", None)
if not eq:
return None
try:
return eq.get_nowait()
except Exception:
return None
while time.time() - start < timeout:
if not process_obj.is_alive():
tb = _child_traceback()
if tb:
raise RuntimeError(f"server process exited prematurely; traceback:\n{tb}")
else:
raise RuntimeError(f"server process exited prematurely with exitcode {process_obj.exitcode}")
for h in hosts:
try:
with socket.create_connection((h, port), timeout=1):
return process_obj
except Exception as e:
last_exc = e
time.sleep(0.1)
tb = _child_traceback()
if tb:
raise RuntimeError(f"server not reachable on port {port} after {timeout}s; server traceback:\n{tb}")
raise RuntimeError(f"server not reachable on port {port} after {timeout}s; last error: {last_exc}")
- Replace Process creation sites in tests/test_server.py to use get_context() and pass an error_queue, attach the queue to the Process object. Also pass HOST instead of literal 'localhost' in args. Examples (apply to all occurrences):
Before (existing pattern):
server_process = Process(
target=server.serve,
args=("localhost", _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER),
)
After:
ctx = get_context()
error_queue = Queue()
server_process = ctx.Process(
target=serve_server,
args=(HOST, _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER),
kwargs={"error_queue": error_queue},
)
server_process._error_queue = error_queue
Do the same in set_up_deployment, test_server_export_local, test_server_export_remote, and other places where Process is used to start server.serve.
- Ensure all client request URLs use HOST rather than hard-coded 'localhost'. For example:
res = requests.get(f"http://{HOST}:{_static_server_port}/health", params={"secret": "secret"})
- (Optional) Add a small try/except around waitress.serve in unfurl/server/serve.py to make the server log any startup exception clearly. This is optional; the test-side error queue should be sufficient to surface tracebacks.
Why this change
- The error_queue surfaces the actual server traceback when the child dies on startup under Python 3.14 (so CI failure becomes actionable).
- Using HOST = 127.0.0.1 avoids getaddrinfo ordering differences between IPv4/IPv6.
- socket.create_connection is a more d...
This pull request was created as a result of the following prompt from Copilot chat.
Problem
The py3.14 CI job is failing with ConnectionRefusedError in tests/test_server.py::test_server_version because the test helper that starts the server (start_server_process) is fragile: it polls only briefly, uses urllib against "localhost" (subject to IPv6/IPv4 resolution differences), and does not surface server-side exceptions when the child process exits. That hides the root cause when the server process crashes or binds to a different address family under Python 3.14.
Goal
Create a PR that makes tests/test_server.py robust so the test harness:
- consistently binds/connects to IPv4 (127.0.0.1) by default,
- uses a robust start_server_process that polls for a TCP listener across families and longer timeout,
- captures and surfaces server-side Python tracebacks from the child process via a multiprocessing.Queue so failing starts show a clear error in test output,
- uses multiprocessing.get_context() to create processes and attaches the error queue to the Process object so the waiter can inspect it.
Files to change
- tests/test_server.py (ref ee68f046)
Detailed changes to apply
- Add imports near the top (or reuse existing ones) and define a HOST constant:
import traceback import socket from multiprocessing import get_context, Queue # Prefer explicit IPv4 loopback for tests to avoid getaddrinfo resolution ordering differences HOST = "127.0.0.1"
- Replace the current serve_server wrapper with one that forwards child start errors to a Queue:
def serve_server(*args, error_queue: Queue = None, **kw): try: return server.serve(*args, **kw) except Exception: tb = traceback.format_exc() if error_queue is not None: error_queue.put(tb) logging.warning("server.serve unexpectedly failed", exc_info=True) raise
- Replace start_server_process with a robust implementation that:
- starts the Process (assumes it was created with kwargs {"error_queue": error_queue}),
- polls socket.create_connection against HOST and '::1' for up to timeout,
- if the child exits early, reads the queued traceback and raises RuntimeError with the traceback.
def start_server_process(process_obj, port, hosts=(HOST, "::1"), timeout=12.0): process_obj.start() start = time.time() last_exc = None def _child_traceback(): eq = getattr(process_obj, "_error_queue", None) if not eq: return None try: return eq.get_nowait() except Exception: return None while time.time() - start < timeout: if not process_obj.is_alive(): tb = _child_traceback() if tb: raise RuntimeError(f"server process exited prematurely; traceback:\n{tb}") else: raise RuntimeError(f"server process exited prematurely with exitcode {process_obj.exitcode}") for h in hosts: try: with socket.create_connection((h, port), timeout=1): return process_obj except Exception as e: last_exc = e time.sleep(0.1) tb = _child_traceback() if tb: raise RuntimeError(f"server not reachable on port {port} after {timeout}s; server traceback:\n{tb}") raise RuntimeError(f"server not reachable on port {port} after {timeout}s; last error: {last_exc}")
- Replace Process creation sites in tests/test_server.py to use get_context() and pass an error_queue, attach the queue to the Process object. Also pass HOST instead of literal 'localhost' in args. Examples (apply to all occurrences):
Before (existing pattern):
server_process = Process( target=server.serve, args=("localhost", _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER), )After:
ctx = get_context() error_queue = Queue() server_process = ctx.Process( target=serve_server, args=(HOST, _static_server_port, "secret", ".", "", {}, CLOUD_TEST_SERVER), kwargs={"error_queue": error_queue}, ) server_process._error_queue = error_queueDo the same in set_up_deployment, test_server_export_local, test_server_export_remote, and other places where Process is used to start server.serve.
- Ensure all client request URLs use HOST rather than hard-coded 'localhost'. For example:
res = requests.get(f"http://{HOST}:{_static_server_port}/health", params={"secret": "secret"})
- (Optional) Add a small try/except around waitress.serve in unfurl/server/serve.py to make the server log any startup exception clearly. This is optional; the test-side error queue should be sufficient to surface tracebacks.
Why this change
- The error_queue surfaces the actual server traceback when the child dies on startup under Python 3.14 (so CI failure becomes actionable).
- Using HOST = 127.0.0.1 avoids getaddrinfo ordering differences between IPv4/IPv6.
- socket.create_connection is a more direct liveness check than urllib.request.urlopen and avoids name resolution differences.
- Increasing the timeout and polling avoids races where the parent attempts to connect too quickly.
Testing
- Run the py3.14 tox job locally or in CI after the PR is merged; failing tests should now include a clear server traceback if the server is crashing or should pass if it was only a race/address-family issue.
Please create a PR that updates tests/test_server.py with the changes described above and use ref ee68f046 as the base for modifications.