Remote ML Detection Server
pyzm.serve is a built-in FastAPI server that loads ML models once
and serves detection requests over HTTP. This lets you offload GPU-heavy
inference to a dedicated machine.
URL mode (default)                          GPU box
+-----------------+    frame URLs     +------------------+
| zm_detect.py    | ----------------> | pyzm.serve       |
| Detector(       |                   | fetch from ZM    |
|   gateway=...)  | <---------------- | detect & return  |
+-----------------+  DetectionResult  +------------------+
                                             |
                                      +------v-----------+
                                      | ZoneMinder API   |
                                      +------------------+
Image mode (gateway_mode="image")           GPU box
+-----------------+     HTTP/JPEG     +------------------+
| zm_detect.py    | ----------------> | pyzm.serve       |
| Detector(       |                   | YOLO11 (GPU)     |
|   gateway=...,  | <---------------- | Coral TPU        |
|   gateway_mode= |  DetectionResult  +------------------+
|     "image")    |
+-----------------+
Two detection modes are available:
- URL mode (default) – the client sends frame URLs to the /detect_urls endpoint and the server fetches images directly from ZoneMinder. This avoids transferring every frame through the client.
- Image mode – the client fetches frames from ZM, JPEG-encodes them, and uploads each one to the /detect endpoint. Use this when the server cannot reach ZoneMinder directly.
| | URL mode (default) | Image mode |
|---|---|---|
| Network requirement | Server must reach ZoneMinder | Only client needs ZM access |
| Bandwidth | Low: client sends only URLs | Higher: client uploads JPEG per frame |
| Latency | Server fetches from ZM (one extra hop) | Single client → server transfer |
| Security | ZM credentials forwarded via | Images leave ZM network |
| Configuration | | |
| Best for | Same network / VPN between server and ZM | Server on a different network or cloud |
When to choose Image mode:
Use Image mode when the GPU server cannot reach the ZoneMinder API
directly (e.g., server is in the cloud, or firewall rules prevent it).
The client handles frame fetching and uploads JPEG images to /detect.
When to stay with URL mode (default): Use URL mode when the server and ZoneMinder are on the same network. This minimises bandwidth on the client side and lets the server fetch only the frames it needs.
Deployment scenarios
Scenario 1: ZM + EventServerNg + hooks + pyzm (same box)
Everything runs on the same machine. The ZoneMinder EventServerNg (zmesNg)
triggers hook scripts which call zm_detect.py, and detection runs
locally via the Detector class.
ZoneMinder --> zmeventnotification.pl (zmesNg)
                       |
                       v
               zm_event_start.sh
                       |
                       v
               zm_detect.py --> Detector (local GPU/CPU)
objectconfig.yml (no remote section needed):
ml:
  ml_sequence:
    general:
      model_sequence: "object"
    object:
      general:
        pattern: "(person|car)"
      sequence:
        - name: YOLO11s
          object_weights: "/var/lib/zmeventnotification/models/ultralytics/yolo11s.onnx"
          object_labels: "/var/lib/zmeventnotification/models/yolov4/coco.names"
          object_framework: opencv
          object_processor: gpu
Test locally:
sudo -u www-data /opt/zoneminder/venv/bin/python /path/to/zm_detect.py \
--config /etc/zm/objectconfig.yml \
--eventid 12345
Scenario 2: ZM + hooks + pyzm (same box, no zmesNg)
Same as Scenario 1 but without EventServerNg. ZoneMinder calls
zm_detect.py directly via its EventStartCommand / EventEndCommand
recording settings.
ZoneMinder EventStartCommand --> zm_detect.py --> Detector (local)
ZoneMinder Console -> Click on Monitor Source -> Recording:
EventStartCommand = /opt/zoneminder/venv/bin/python /path/to/zm_detect.py -c /etc/zm/objectconfig.yml -e %EID% -m %MID% -r "%EC%" -n
objectconfig.yml is the same as Scenario 1.
Scenario 3: ZM box + remote GPU box (split architecture)
Detection runs on a separate GPU machine. The ZM box runs zm_detect.py
which sends requests to the remote pyzm.serve server over HTTP.
       ZM box                                  GPU box
+-------------------+                 +------------------------+
| zm_detect.py      |      HTTP       | pyzm.serve             |
| Detector(         |  ------------>  |   --models all         |
|   gateway=...)    |                 |   --processor gpu      |
|                   |  <------------  |   --port 5000          |
+-------------------+   JSON result   +------------------------+
GPU box setup:
pip install "pyzm[serve]"
python -m pyzm.serve --models all --processor gpu --port 5000
Or with specific models and auth:
python -m pyzm.serve \
--models yolo11s yolo26s \
--processor gpu \
--port 5000 \
--auth --auth-user admin --auth-password secret \
--token-secret my-jwt-secret
ZM box objectconfig.yml:
ml:
  ml_sequence:
    general:
      model_sequence: "object"
      ml_gateway: "http://192.168.1.100:5000"
      # ml_gateway_mode: "image"  # uncomment if server can't reach ZM directly
      ml_fallback_local: "yes"
    object:
      general:
        pattern: "(person|car)"
      sequence:
        - object_framework: opencv
          object_weights: "/var/lib/zmeventnotification/models/ultralytics/yolo11s.onnx"
By default, URL mode is used – the server fetches frames directly from ZM.
Set ml_gateway_mode: "image" if the server cannot reach ZoneMinder
(the client will JPEG-encode and upload frames instead).
Available models
Model names and discovery
Model names passed via --models (or Detector(models=[...])) are
resolved against --base-path on disk. There are no hardcoded presets –
any name you pass is looked up as follows:
- Directory match – base_path/<name>/ containing a weight file
- File stem match – any <name>.onnx, <name>.weights, or <name>.tflite in any subdirectory of base_path
The framework is inferred from the file extension:
- .onnx – OpenCV DNN (ONNX runtime)
- .weights – OpenCV DNN (Darknet format, also needs a .cfg file)
- .tflite – Coral Edge TPU runtime (processor forced to tpu)
Label files are auto-detected from the same directory (.names,
.txt, .labels). For Darknet models, .cfg files are also
discovered automatically.
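The lookup and auto-detection rules above can be sketched with pathlib. This is an illustration of the behaviour as described here, not pyzm's actual resolver: the names `resolve_model` and `find_sidecar` are hypothetical, and trying the directory rule before the stem rule, as well as breaking ties alphabetically, are assumptions.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extensions named in the text above.
WEIGHT_SUFFIXES = (".onnx", ".weights", ".tflite")
LABEL_SUFFIXES = (".names", ".txt", ".labels")


def resolve_model(base_path: Path, name: str) -> Optional[Path]:
    """Return the weight file for `name`, or None if nothing matches."""
    # Directory match: base_path/<name>/ containing a weight file.
    model_dir = base_path / name
    if model_dir.is_dir():
        for candidate in sorted(model_dir.iterdir()):
            if candidate.is_file() and candidate.suffix in WEIGHT_SUFFIXES:
                return candidate

    # File stem match: <name>.<ext> anywhere below base_path.
    for suffix in WEIGHT_SUFFIXES:
        matches = sorted(base_path.rglob(name + suffix))
        if matches:
            return matches[0]
    return None


def find_sidecar(weights: Path, suffixes: Tuple[str, ...]) -> Optional[Path]:
    """Return the first file next to `weights` with one of `suffixes`."""
    for candidate in sorted(weights.parent.iterdir()):
        if candidate.is_file() and candidate.suffix in suffixes:
            return candidate
    return None
```

For a Darknet model, `find_sidecar(weights, (".cfg",))` would locate the accompanying config file in the same way.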
The --processor flag (cpu, gpu, tpu) applies to all
discovered models (except .tflite which always uses tpu).
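As a rough sketch of the extension and processor rules, the mapping could look like the following. The function name `infer_backend` is hypothetical, and the framework strings are illustrative labels: `opencv` appears elsewhere in this document, while `coral_edgetpu` is an assumption.

```python
from pathlib import Path
from typing import Tuple

# Extension -> framework label. The label strings are illustrative.
FRAMEWORK_BY_SUFFIX = {
    ".onnx": "opencv",           # OpenCV DNN, ONNX model
    ".weights": "opencv",        # OpenCV DNN, Darknet format (needs a .cfg)
    ".tflite": "coral_edgetpu",  # Coral Edge TPU runtime (assumed label)
}
VALID_PROCESSORS = ("cpu", "gpu", "tpu")


def infer_backend(weights: Path, processor: str) -> Tuple[str, str]:
    """Return (framework, effective processor) for a weight file."""
    suffix = weights.suffix.lower()
    if suffix not in FRAMEWORK_BY_SUFFIX:
        raise ValueError(f"unsupported weight file: {weights.name}")
    if processor not in VALID_PROCESSORS:
        raise ValueError(f"unknown processor: {processor}")
    # .tflite models always run on the TPU, whatever --processor says.
    if suffix == ".tflite":
        processor = "tpu"
    return FRAMEWORK_BY_SUFFIX[suffix], processor
```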
--models all (lazy loading)
When you pass --models all to the server, every model in
--base-path is discovered and registered, but weights are not
loaded into memory at startup. Instead, each backend loads its weights
on the first request that uses it.
This is useful when you have many models but don’t want to consume GPU memory for all of them upfront.
python -m pyzm.serve --models all --base-path /data/models --processor gpu
Use the GET /models endpoint to check which models are available
and whether their weights have been loaded.
Model directory layout
/var/lib/zmeventnotification/models/
+-- yolov4/
|   +-- yolov4.weights
|   +-- yolov4.cfg
|   +-- coco.names
+-- ultralytics/
|   +-- yolo11s.onnx
|   +-- yolo11n.onnx
|   +-- yolo26s.onnx
+-- coral_edgetpu/
|   +-- ssd_mobilenet_v2.tflite
|   +-- coco_labels.txt
Server setup
Installation
The [serve] extra automatically includes all ML dependencies.
pip install "pyzm[serve]"
CLI options
| Flag | Default | Description |
|---|---|---|
| | | Bind address |
| | | Bind port |
| | | Model names (space-separated). Use |
| | | Directory containing model subdirectories |
| | | |
| | off | Enable JWT authentication |
| | | Username (when auth enabled) |
| | (empty) | Password (when auth enabled) |
| | | Secret key used to sign JWT tokens. Change this in production. |
| | off | Enable debug logging for both pyzm and uvicorn |
| | (none) | Path to a YAML config file ( |
YAML config file
Instead of CLI flags, you can provide a YAML config file via --config:
python -m pyzm.serve --config /etc/pyzm/serve.yml
Example serve.yml:
host: "0.0.0.0"
port: 5000
models:
- yolo11s
- yolo26s
base_path: "/var/lib/zmeventnotification/models"
processor: gpu
auth_enabled: true
auth_username: admin
auth_password: "my-secret-password"
token_secret: "a-strong-random-secret"
token_expiry_seconds: 3600
All fields correspond to ServerConfig attributes. When using
--config, CLI flags are ignored.
Client usage
Using the Detector API:
from pyzm import Detector
# URL mode (default) -- server fetches frames from ZM
detector = Detector(models=["yolo11s"], gateway="http://gpu-box:5000")
# Image mode -- client uploads JPEG-encoded frames
# detector = Detector(models=["yolo11s"], gateway="http://gpu-box:5000",
# gateway_mode="image")
# With authentication:
# detector = Detector(models=["yolo11s"], gateway="http://gpu-box:5000",
# gateway_username="admin", gateway_password="secret")
# detect() always uploads the image (single-image mode)
result = detector.detect("/path/to/image.jpg")
print(result.summary)
# detect_event() uses URL mode by default -- sends frame URLs,
# server fetches them directly from ZM
result = detector.detect_event(zm_client, event_id=12345,
stream_config=stream_cfg)
URL mode only applies to detect_event() calls. Single-image
detect() calls always upload the image regardless of this setting.
Using from_dict()
The ml_gateway key in the general section of ml_options
automatically enables remote mode:
from pyzm import Detector

ml_options = {
    "general": {
        "model_sequence": "object",
        "ml_gateway": "http://gpu-box:5000",
        # "ml_gateway_mode": "image",  # uncomment if server can't reach ZM
        # "ml_user": "admin",
        # "ml_password": "secret",
    },
    "object": {
        "general": {"pattern": ".*"},
        "sequence": [...],
    },
}
detector = Detector.from_dict(ml_options)
result = detector.detect(image)
Authentication
When the server is started with --auth, clients must first obtain a
JWT token via /login, then pass it as a Bearer token on subsequent
requests. The Detector gateway mode handles this automatically.
The --token-secret flag controls the secret key used to sign JWT
tokens. Always set this to a strong random value in production.
The default (change-me) is insecure.
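One way to produce such a value is Python's standard `secrets` module. The helper name `new_token_secret` is hypothetical; only `secrets.token_urlsafe` is a real standard-library call.

```python
import secrets


def new_token_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for --token-secret."""
    # token_urlsafe draws from the OS cryptographic random source.
    return secrets.token_urlsafe(num_bytes)
```

The result can be passed to `--token-secret` or stored as `token_secret` in the YAML config file.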
Manual flow:
# Login
TOKEN=$(curl -s -X POST http://gpu-box:5000/login \
-H 'Content-Type: application/json' \
-d '{"username":"admin","password":"secret"}' \
| jq -r .access_token)
# Detect
curl -X POST http://gpu-box:5000/detect \
-H "Authorization: Bearer $TOKEN" \
-F file=@/path/to/image.jpg
Tokens expire after token_expiry_seconds (default 3600), configurable
via the YAML config file.
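The same login step can be written with the Python standard library. This is a minimal sketch that mirrors the curl commands above; the helper names are hypothetical, and error handling for HTTP failures is left out.

```python
import json
import urllib.request
from typing import Dict


def build_login_request(base_url: str, username: str,
                        password: str) -> urllib.request.Request:
    """Build the POST /login request without sending it."""
    body = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        base_url.rstrip("/") + "/login",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(token: str) -> Dict[str, str]:
    """Header to attach to /detect and /detect_urls requests."""
    return {"Authorization": f"Bearer {token}"}


def login(base_url: str, username: str, password: str,
          timeout: float = 10.0) -> str:
    """Send the login request and return the access_token field."""
    request = build_login_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

Calling `login("http://gpu-box:5000", "admin", "secret")` against a running server would return the token to pass to `bearer_headers`.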
API reference
GET /health
Health check. Returns:
{"status": "ok", "models_loaded": true}
GET /models
Returns the list of available models and their load status. Useful with
--models all to check which backends have been lazily loaded.
{
  "models": [
    {"name": "yolo11s", "type": "object", "framework": "opencv", "loaded": true},
    {"name": "yolo26s", "type": "object", "framework": "opencv", "loaded": false}
  ]
}
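A client could use this response to see which backends are still waiting for their first request. The sketch below only parses the JSON shape shown above; the function name `pending_models` is hypothetical.

```python
import json
from typing import List


def pending_models(body: str) -> List[str]:
    """Return the names of models whose weights are not loaded yet."""
    payload = json.loads(body)
    return [m["name"] for m in payload["models"] if not m["loaded"]]
```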
POST /detect
Run detection on an uploaded image (image mode).
Content-Type: multipart/form-data
Parameters:
- file (required) – JPEG/PNG image
- zones (optional) – JSON string of zone list
Auth: Bearer token (when auth enabled)
Returns: DetectionResult as JSON (image field excluded)
POST /detect_urls
Run detection on images fetched from URLs (URL mode).
Content-Type: application/json
Body:
{
  "urls": [
    {"frame_id": "snapshot", "url": "https://zm.example.com/zm/index.php?view=image&eid=123&fid=snapshot"},
    {"frame_id": "1", "url": "https://zm.example.com/zm/index.php?view=image&eid=123&fid=1"}
  ],
  "zm_auth": "token=abc123...",
  "zones": [{"name": "driveway", "value": [[0,0],[100,0],[100,100],[0,100]]}],
  "verify_ssl": false
}
Auth: Bearer token (when auth enabled)
Returns: DetectionResult as JSON (best frame selected by frame_strategy)
The server appends zm_auth to each URL and fetches the image via HTTP
GET (10-second timeout per URL; failures are logged and skipped).
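The fetch behaviour described above might look roughly like this. It is a sketch, not the server's code: the helper names are hypothetical, joining `zm_auth` with `&` or `?` is an assumption about how the value is appended, and the `verify_ssl` option is not handled.

```python
import logging
import urllib.request
from typing import Callable, Dict, List

logger = logging.getLogger("detect_urls_sketch")


def with_auth(url: str, zm_auth: str) -> str:
    """Append the zm_auth string to a frame URL (assumed join rule)."""
    if not zm_auth:
        return url
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{zm_auth}"


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Plain HTTP GET with the 10-second timeout mentioned above."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_frames(urls: List[Dict[str, str]], zm_auth: str,
                 get: Callable[[str], bytes] = http_get) -> Dict[str, bytes]:
    """Fetch every frame, logging and skipping the ones that fail."""
    frames: Dict[str, bytes] = {}
    for entry in urls:
        try:
            frames[entry["frame_id"]] = get(with_auth(entry["url"], zm_auth))
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            logger.warning("skipping frame %s: %s", entry["frame_id"], exc)
    return frames
```

Passing the fetch function as a parameter keeps the loop testable without a live ZoneMinder instance.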
Frame strategy (first, first_new, most, most_unique,
most_models) is applied server-side to pick the best result.
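For clients that call the endpoint directly instead of going through `Detector`, the request body can be assembled as below. The helper names are hypothetical, the frame URL pattern is copied from the example body above, and leaving out `zones` when none are given is an assumption.

```python
from typing import Any, Dict, Iterable, List, Optional


def frame_url(portal: str, event_id: int, frame_id: str) -> str:
    """Build a ZoneMinder frame URL in the form used in the example body."""
    return f"{portal.rstrip('/')}/index.php?view=image&eid={event_id}&fid={frame_id}"


def build_detect_urls_body(portal: str, event_id: int,
                           frame_ids: Iterable[Any], zm_auth: str,
                           zones: Optional[List[Dict[str, Any]]] = None,
                           verify_ssl: bool = True) -> Dict[str, Any]:
    """Return a dict that serialises to the /detect_urls JSON body."""
    body: Dict[str, Any] = {
        "urls": [
            {"frame_id": str(fid), "url": frame_url(portal, event_id, str(fid))}
            for fid in frame_ids
        ],
        "zm_auth": zm_auth,
        "verify_ssl": verify_ssl,
    }
    if zones:
        body["zones"] = zones
    return body
```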
Note
The zones field accepts both "value" and "points" as the key
for polygon coordinates (they are interchangeable in both /detect
and /detect_urls).
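Code that consumes zone definitions can treat the two keys the same way. A minimal sketch follows; the function name `zone_polygon` is hypothetical, and checking `"value"` before `"points"` is an arbitrary choice.

```python
from typing import Any, Dict, List


def zone_polygon(zone: Dict[str, Any]) -> List[List[int]]:
    """Return the polygon of a zone given under "value" or "points"."""
    for key in ("value", "points"):
        if key in zone:
            return zone[key]
    raise KeyError(f"zone {zone.get('name')!r} has no 'value' or 'points' key")
```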
POST /login
Obtain a JWT token. This endpoint is always registered, even when
--auth is not enabled (so clients with pre-configured credentials
don’t get a 404). When auth is disabled, any credentials are accepted.
Content-Type: application/json
Body: {"username": "...", "password": "..."}
Returns: {"access_token": "...", "expires": 3600}
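A client that manages tokens itself can cache the token and log in again shortly before it expires. The class below is a sketch under the assumption that `expires` is a lifetime in seconds; `TokenCache` and its parameters are hypothetical names, and the `Detector` gateway mode already handles authentication for you.

```python
import time
from typing import Any, Callable, Dict, Optional


class TokenCache:
    """Reuse a JWT until shortly before it expires, then log in again."""

    def __init__(self, login: Callable[[], Dict[str, Any]],
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 30.0) -> None:
        self._login = login    # callable returning the /login JSON reply
        self._clock = clock    # injectable clock, useful for tests
        self._margin = margin  # refresh this many seconds early
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            reply = self._login()
            self._token = reply["access_token"]
            self._expires_at = now + reply["expires"] - self._margin
        return self._token
```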
objectconfig.yml remote section
In a ZoneMinder event notification setup, configure the remote gateway
in objectconfig.yml:
ml_sequence:
  general:
    model_sequence: "object"
    ml_gateway: "http://gpu-box:5000"
    # ml_gateway_mode: "image"  # uncomment if server can't reach ZM
    ml_user: "admin"
    ml_password: "secret"
    ml_fallback_local: yes
  object:
    general:
      pattern: "(person|car)"
    sequence:
      - object_framework: opencv
        object_weights: /path/to/yolo11s.onnx
        object_labels: /path/to/coco.names
When ml_gateway is set, detection requests are sent to the remote
server. URL mode is the default – the server fetches frames directly
from ZM. Set ml_gateway_mode: "image" if the server cannot reach
ZoneMinder. If ml_fallback_local is yes and the remote server
is unreachable, detection falls back to local inference.
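The fallback rule amounts to a try/except around the remote call. The sketch below shows the control flow only: `detect_with_fallback` is a hypothetical name, and treating any `OSError` (connection refused, DNS failure, timeout) as "unreachable" is an assumption about which errors trigger the fallback.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def detect_with_fallback(remote: Callable[[], T], local: Callable[[], T],
                         fallback_local: bool) -> T:
    """Try remote detection first; optionally fall back to local inference."""
    try:
        return remote()
    except OSError:
        # Without ml_fallback_local the failure is passed on to the caller.
        if not fallback_local:
            raise
        return local()
```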