Testing¶
This page covers both the automated performance benchmarking that runs in CI and the helper scripts for manual testing.
Performance Testing¶
Automated CI Benchmarks¶
Every push to main and every pull request triggers the Benchmark workflow (.github/workflows/benchmark.yml).
What the workflow does
- Builds the application image from the current commit using Jib
- Creates a k3d cluster with Numaflow, JetStream ISB, and a standalone Pulsar instance
- Pre-fills a Pulsar topic with ~4,000,000+ messages via a generator-based producer pipeline
- Deploys the consumer MonoVertex and captures metrics over a 90-second window
- Reports results:
  - Push to `main` → stores on the `gh-pages` branch
  - Pull request → posts a sticky comment comparing against `main` baseline
Performance Charts¶
Benchmark results from every push to main are published using github-action-benchmark, producing an interactive chart that tracks performance over time.
The chart tracks:
| Metric | Description |
|---|---|
| Throughput | Messages per second |
| Latency | End-to-end processing time |
| Resource utilization | CPU and memory |
An alert threshold of 200% is configured — if a metric degrades by more than 2x, it will be flagged.
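To make the threshold concrete, here is a minimal sketch of a ratio-based check. It illustrates the idea only and is not the action's actual code: the function name `exceeds_alert_threshold`, the `bigger_is_better` flag, and the sample numbers are all hypothetical.

```python
def exceeds_alert_threshold(baseline, current, bigger_is_better, threshold_pct=200.0):
    """Return True when `current` is worse than `baseline` by more than the threshold.

    The ratio is arranged so that a value above 1.0 always means "worse".
    """
    if baseline <= 0 or current <= 0:
        raise ValueError("metrics must be positive to compute a ratio")
    ratio = baseline / current if bigger_is_better else current / baseline
    return ratio * 100.0 > threshold_pct


# Throughput (higher is better) dropping from 20,000 to 9,000 msg/s is a ~2.2x degradation.
print(exceeds_alert_threshold(20_000, 9_000, bigger_is_better=True))   # True
# Latency (lower is better) rising from 40 ms to 70 ms is only 1.75x, so no alert.
print(exceeds_alert_threshold(40, 70, bigger_is_better=False))         # False
```

Note that a metric that is exactly 2x worse does not trip the check in this sketch, because the comparison is strictly "more than".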
PR Benchmark Comments¶
On pull requests, the workflow posts a sticky comment comparing the PR's metrics against the latest main baseline. Reviewers can see the performance impact before merging.
Manual Trigger¶
The benchmark workflow supports workflow_dispatch with configurable parameters:
| Parameter | Default | Description |
|---|---|---|
| `measurement_duration` | `90` | Duration of each measurement round in seconds |
| `prefill_duration` | `500` | How long to run the producer before starting the consumer |
To trigger manually, go to the Actions tab and click "Run workflow".
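The same dispatch can also be scripted. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated, and that the workflow file is `benchmark.yml` as named earlier on this page. The helper name `dispatch_command`, the `ref` default, and the value `120` are illustrative. The code only builds the argument list, so nothing is triggered until you hand it to `subprocess.run`.

```python
import shlex


def dispatch_command(measurement_duration=90, prefill_duration=500, ref="main"):
    """Build a `gh workflow run` invocation for the Benchmark workflow."""
    return [
        "gh", "workflow", "run", "benchmark.yml",
        "--ref", ref,
        "-f", f"measurement_duration={measurement_duration}",
        "-f", f"prefill_duration={prefill_duration}",
    ]


cmd = dispatch_command(measurement_duration=120)
print(shlex.join(cmd))
# To actually trigger the run: subprocess.run(cmd, check=True)
```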
Local Performance Testing¶
For more in-depth testing on your own cluster, see the performance testing runbook. The runbook is designed so that two builds of the consumer can be compared apples-to-apples on the same cluster against the same Pulsar backlog.
Baseline parameters (keep identical across runs)¶
When comparing consumer images, keep every variable fixed except the image tag. The sample manifests (monovertex_sample.yaml, producer_sample.yaml) are pre-configured with these values:
| Setting | Baseline | Where to set it |
|---|---|---|
| MonoVertex replicas | 1 | `spec.replicas` and `spec.scale.min`/`spec.scale.max` in `monovertex_sample.yaml` |
| Read batch size | 500 | `spec.limits.readBatchSize` in `monovertex_sample.yaml` |
| Container resources | 1500m CPU / 640Mi memory (requests == limits) | `spec.source.udsource.container.resources` |
| Pulsar receiver queue size | 500 (must equal `readBatchSize`) | `pulsar.consumer.consumerConfig.receiverQueueSize` in `application.yml` |
| Subscription initial position | Earliest | `pulsar.consumer.consumerConfig.subscriptionInitialPosition` |
| Generator load | `rpu: 10000`, `duration: 1s` | `spec.vertices[0].source.generator` in `producer_sample.yaml` |
| Pre-fill target | ~1,000,000+ messages in topic before starting consumer | Run the producer pipeline long enough |
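Several of these settings are invariants that are easy to break by accident. As a rough sketch, the check below takes the MonoVertex manifest and the consumer config already parsed into dictionaries (for example by a YAML parser) and reports which baseline rules are violated. The function name `baseline_violations` and the flattened `consumer_config` dictionary are hypothetical; the key paths follow the table above.

```python
def baseline_violations(monovertex, consumer_config):
    """Return a list of human-readable problems; an empty list means the baseline holds."""
    problems = []
    spec = monovertex["spec"]

    # Replicas must be pinned: spec.replicas == scale.min == scale.max == 1.
    scale = spec.get("scale", {})
    replica_counts = {spec.get("replicas"), scale.get("min"), scale.get("max")}
    if replica_counts != {1}:
        problems.append("replicas, scale.min and scale.max must all be 1")

    # The Pulsar receiver queue must match the Numaflow read batch size.
    read_batch = spec["limits"]["readBatchSize"]
    queue_size = consumer_config["receiverQueueSize"]
    if queue_size != read_batch:
        problems.append(f"receiverQueueSize ({queue_size}) != readBatchSize ({read_batch})")

    # Requests must equal limits.
    resources = spec["source"]["udsource"]["container"]["resources"]
    if resources["requests"] != resources["limits"]:
        problems.append("container requests must equal limits")

    return problems


monovertex = {
    "spec": {
        "replicas": 1,
        "scale": {"min": 1, "max": 1},
        "limits": {"readBatchSize": 500},
        "source": {"udsource": {"container": {"resources": {
            "requests": {"cpu": "1500m", "memory": "640Mi"},
            "limits": {"cpu": "1500m", "memory": "640Mi"},
        }}}},
    }
}
consumer_config = {"receiverQueueSize": 500, "subscriptionInitialPosition": "Earliest"}

print(baseline_violations(monovertex, consumer_config))  # []
```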
Cluster setup checklist¶
- Image: build `apache-pulsar-java` and tag it to match `spec.source.udsource.container.image` in the MonoVertex manifest.
- ConfigMap (e.g. `consumer-config`): provides `application.yml` mounted at `/conf/application.yml`. Must include `pulsar.client`, `pulsar.consumer`, and `pulsar.admin` sections.
- Secret (e.g. `pulsar-secret-cloud`): holds Pulsar credentials referenced by `envFrom` in the MonoVertex.
- Pulsar topic: same name in the consumer's `topicNames` and the producer pipeline.
- Producer pipeline: deploy `producer_sample.yaml` first and let it run until the topic backlog is ~1M+ messages. This keeps the consumer metrics window stable (ingest is slower than drain, so without pre-filling, the topic empties fast).
- Consumer MonoVertex: deploy `monovertex_sample.yaml` once the backlog is in place.
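For a sense of how long the producer needs to run, the arithmetic can be sketched as follows. It assumes the generator sustains its configured rate (`rpu` messages per `duration`) and that every generated message reaches the topic, so treat the result as a lower bound. The function name `prefill_seconds` is hypothetical.

```python
import math


def prefill_seconds(target_messages, rpu, duration_seconds=1.0):
    """Seconds the generator must run to emit at least `target_messages`."""
    rate = rpu / duration_seconds  # messages per second
    return math.ceil(target_messages / rate)


# Baseline generator load (rpu: 10000, duration: 1s) and a 1M-message target.
print(prefill_seconds(1_000_000, rpu=10_000))  # 100
```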
Metrics stack with numaflow-perfman¶
Perfman installs the Prometheus Operator, wires ServiceMonitors for pipeline and ISB metrics, and installs Grafana.
- Clone numaflow-perfman and run `make build`; the binary lands at `dist/perfman`.
- One-time install: `./dist/perfman setup -g`.
- Port-forward (each blocks, so use separate terminals):
  - Prometheus UI: `./dist/perfman portforward -p` → `http://localhost:9090`
  - Grafana: `./dist/perfman portforward -g` → `http://localhost:3000` (default `admin`/`admin`)
- Import the MonoVertex dashboard from this repo:
  `./dist/perfman dashboard --template-path development/performance-testing/dashboard-monovertex-template.json`
  The command prints a link to open the dashboard in Grafana once the MonoVertex is running.
The dashboard surfaces read batch size, end-to-end latency, and forwarder metrics scoped to the MonoVertex.
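With the Prometheus port-forward running, numbers can also be pulled programmatically through the Prometheus HTTP API. The sketch below builds an instant-query URL and extracts the first sample from a response. The metric name `monovtx_read_total` is an assumption, so check the Prometheus UI for the series your cluster actually exposes. The helper names are hypothetical, and the response is a canned example so the code runs without a cluster.

```python
import json
from urllib.parse import urlencode


def instant_query_url(promql, base_url="http://localhost:9090"):
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def first_sample(payload):
    """Return the first sample value from an instant-query response, or None."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    result = payload["data"]["result"]
    return float(result[0]["value"][1]) if result else None


# Assumed metric name; replace it with a series that exists in your Prometheus.
url = instant_query_url("sum(rate(monovtx_read_total[1m]))")
print(url)

# Canned response in the API's documented shape (placeholder value, not a measurement).
sample = json.loads(
    '{"status":"success","data":{"resultType":"vector",'
    '"result":[{"metric":{},"value":[1713780000.0,"9876.5"]}]}}'
)
print(first_sample(sample))
```

Against a live cluster, fetch the URL with `urllib.request.urlopen(url)` and pass the decoded JSON to `first_sample`.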
Running a comparison¶
To benchmark a new consumer image against a baseline:
- Build the new image with a distinct tag (e.g. `:experiment-foo`).
- Update `spec.source.udsource.container.image` in the MonoVertex manifest.
- `kubectl apply -f` the MonoVertex. Leave every other baseline parameter untouched.
- Let metrics stabilize in Grafana over the pre-filled backlog, then record the numbers.
- Swap the image tag back to the baseline and repeat for comparison.
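Once both runs are recorded, the comparison itself is simple arithmetic. A minimal sketch, with hypothetical metric names and placeholder numbers rather than measured results:

```python
def percent_change(baseline, experiment):
    """Relative change of each metric in `experiment` versus `baseline`, in percent."""
    return {
        name: round((experiment[name] - value) / value * 100.0, 1)
        for name, value in baseline.items()
        if name in experiment and value != 0
    }


# Placeholder numbers, not measured results.
baseline = {"throughput_msgs_per_s": 10_000.0, "p95_latency_ms": 50.0}
experiment = {"throughput_msgs_per_s": 11_500.0, "p95_latency_ms": 46.0}

print(percent_change(baseline, experiment))
```

A positive change is an improvement for throughput but a regression for latency, so read each metric with its direction in mind.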
AI-Assisted Performance Testing¶
This repo includes a Claude AI skill that can walk you through the entire local performance testing workflow interactively. It handles image builds, kubectl commands, perfman setup, and topic pre-filling — just provide your Pulsar cluster details and it runs everything for you.
To use it, open this repo in Cursor and ask the AI agent something like:
"Help me run a performance test for the consumer"
The agent will read the skill file, then guide you step by step — asking for your Pulsar service URL, topic name, API key, and image tag one at a time, and running all the commands on your behalf.
Script Testing¶
Helper scripts for manually testing the producer and consumer pipelines. These live in development/scripts/ in the repo.
Produce Messages¶
Location: development/scripts/produce-messages/
A bash script that sends distinguishable messages to one or more Pulsar topics via the REST API. Useful for testing the consumer pipeline.
Setup¶
Create a .env file in the script directory:
export PULSAR_REST_URL="https://pc-xxxx.streamnative.aws.snio.cloud"
export PULSAR_AUTH_TOKEN="<your-token>"
export PULSAR_TENANT="demo"
export PULSAR_NAMESPACE="dev"
export TOPICS="test-topic,test-topic-2"
Usage¶
From the repo root:
cd development/scripts/produce-messages
chmod +x produce-messages.sh
./produce-messages.sh
Override the message count or the target topics with environment variables:
COUNT=10 ./produce-messages.sh
TOPICS="topic-a,topic-b" ./produce-messages.sh
Each message body includes the topic name and a timestamp (e.g. [test-topic] Message 1 at 2026-04-22T10:00:00Z), so you can tell which topic each message came from in the consumer logs.
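Because the body format is regular, consumer output can be tallied per topic with a small parser. This is a sketch under assumptions: the log prefix (`INFO payload=`) is invented for illustration, and only the bracketed body format comes from the script description above.

```python
import re
from collections import Counter

# Matches bodies shaped like "[test-topic] Message 1 at 2026-04-22T10:00:00Z".
MESSAGE_RE = re.compile(r"\[(?P<topic>[^\]]+)\] Message (?P<seq>\d+) at (?P<ts>\S+)")


def count_by_topic(log_lines):
    """Count received messages per topic from consumer log lines."""
    counts = Counter()
    for line in log_lines:
        match = MESSAGE_RE.search(line)
        if match:
            counts[match.group("topic")] += 1
    return counts


# Hypothetical log lines; lines without a message body are ignored.
logs = [
    "INFO payload=[test-topic] Message 1 at 2026-04-22T10:00:00Z",
    "INFO payload=[test-topic-2] Message 1 at 2026-04-22T10:00:01Z",
    "INFO payload=[test-topic] Message 2 at 2026-04-22T10:00:02Z",
    "DEBUG heartbeat",
]
print(count_by_topic(logs))
```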
Schema Validation Testing¶
Location: development/scripts/schema-validation/
Scripts for registering Avro schemas on Pulsar topics and publishing schema-encoded test messages. Useful for testing the useAutoConsumeSchema and useAutoProduceSchema features.
Prerequisites¶
- pulsarctl (`brew install pulsarctl`)
- Python 3 with `pip install avro`
- curl
Setup¶
cd development/scripts/schema-validation
cp .env.example .env
# Edit .env with your Pulsar details
Register a schema¶
./register-schema.sh schema-test-topic-avro.json
Publish valid messages¶
./publish-messages.sh schema-test-topic-avro.json
Test schema mismatch¶
Register one schema, then publish messages encoded with a different schema to test validation errors:
./register-schema.sh schema-test-topic-avro.json
./publish-messages.sh schema-test-topic-avro-name-only.json
This should cause a Java exception when the consumer tries to deserialize the mismatched messages.
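The mismatch comes from the published records lacking a field that the registered schema declares. The sketch below shows that difference using two inline Avro record definitions. These are hypothetical stand-ins for the schema files: the field types are assumed to be `string`, and the real files may wrap the Avro definition differently.

```python
import json

# Hypothetical Avro record definitions standing in for the two schema files.
REGISTERED = json.loads("""
{"type": "record", "name": "TestMessage",
 "fields": [{"name": "name", "type": "string"}, {"name": "topic", "type": "string"}]}
""")
PUBLISHED = json.loads("""
{"type": "record", "name": "TestMessage",
 "fields": [{"name": "name", "type": "string"}]}
""")


def field_diff(reader, writer):
    """Fields the reader expects but the writer omits, and vice versa."""
    reader_fields = {f["name"] for f in reader["fields"]}
    writer_fields = {f["name"] for f in writer["fields"]}
    return sorted(reader_fields - writer_fields), sorted(writer_fields - reader_fields)


missing, extra = field_diff(REGISTERED, PUBLISHED)
print(missing, extra)
```

Here `missing` lists the fields the consumer's schema expects but the published records do not carry, which is the gap the deserializer trips over.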
Included schema files¶
| File | Description |
|---|---|
| `schema-test-topic.json` | JSON schema (TestMessage: name, topic) |
| `schema-test-topic-avro.json` | Avro schema (TestMessage: name, topic) |
| `schema-test-topic-avro-name-only.json` | Avro schema (single field: name) |