Zyntem Fiscalization - Deployment Guide

Company: Zyntem
Product: Fiscalization by Zyntem
Version: 1.0
Last Updated: 2025-10-30
Status: Active


Overview

This document describes the deployment strategy for Fiscalization by Zyntem, including zero-downtime deployments, rollback procedures, and monitoring requirements.

Deployment Strategy: Blue-Green deployment with gradual traffic shifting

SLA Target: 99.5% uptime (NFR2)


Deployment Strategy: Blue-Green with Canary

How It Works

graph TD
    A[New Code Deployed] --> B[Create New Revision - Green]
    B --> C{Health Check Passes?}
    C -->|No| D[Rollback - Delete Green]
    C -->|Yes| E[Canary: 10% Traffic to Green]
    E --> F[Monitor 5 Minutes]
    F --> G{Metrics Healthy?}
    G -->|No| D
    G -->|Yes| H[Ramp: 50% Traffic to Green]
    H --> I[Monitor 5 Minutes]
    I --> J{Metrics Healthy?}
    J -->|No| D
    J -->|Yes| K[Full: 100% Traffic to Green]
    K --> L[Monitor 10 Minutes]
    L --> M{Metrics Healthy?}
    M -->|No| D
    M -->|Yes| N[Success - Retire Blue]

Deployment Phases

| Phase | Traffic Split | Duration | Rollback Trigger |
|---|---|---|---|
| 1. Canary | Blue: 90%, Green: 10% | 5 min | Error rate > 5% OR P99 latency > 2s |
| 2. Ramp | Blue: 50%, Green: 50% | 5 min | Error rate > 5% OR P99 latency > 2s |
| 3. Full | Blue: 0%, Green: 100% | 10 min | Error rate > 5% OR P99 latency > 2s |
| 4. Retire | Green: 100% | - | Keep previous revision for 24h |

Total Deployment Time: 20 minutes (automated)
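The phase table above can be read as a small state machine. The following Go sketch is illustrative only: the `Phase`, `Metrics`, `Plan`, and `Run` names are hypothetical and not taken from the deploy workflow, and the `observe` callback stands in for whatever metrics source the pipeline actually queries.

```go
package main

import (
	"fmt"
	"time"
)

// Phase mirrors one monitored row of the Deployment Phases table.
type Phase struct {
	Name         string
	GreenPercent int           // share of traffic sent to the new (green) revision
	Monitor      time.Duration // how long metrics are watched before advancing
}

// Metrics is a hypothetical snapshot of the green revision's behaviour.
type Metrics struct {
	ErrorRate  float64 // fraction of failed requests; 0.05 means 5%
	P99Latency time.Duration
}

// Plan holds the three monitored phases from the table.
var Plan = []Phase{
	{"Canary", 10, 5 * time.Minute},
	{"Ramp", 50, 5 * time.Minute},
	{"Full", 100, 10 * time.Minute},
}

// ShouldRollback applies the documented trigger:
// error rate > 5% OR P99 latency > 2s.
func ShouldRollback(m Metrics) bool {
	return m.ErrorRate > 0.05 || m.P99Latency > 2*time.Second
}

// TotalMinutes sums the monitoring windows of all phases.
func TotalMinutes(plan []Phase) int {
	var total time.Duration
	for _, p := range plan {
		total += p.Monitor
	}
	return int(total.Minutes())
}

// Run walks the plan and returns the name of the phase that triggered
// a rollback, or "" if every phase stayed healthy.
func Run(plan []Phase, observe func(Phase) Metrics) string {
	for _, p := range plan {
		if ShouldRollback(observe(p)) {
			return p.Name
		}
	}
	return ""
}

func main() {
	fmt.Println("total minutes:", TotalMinutes(Plan))
	failed := Run(Plan, func(p Phase) Metrics {
		if p.GreenPercent == 50 {
			// Simulate an error spike during the Ramp phase.
			return Metrics{ErrorRate: 0.08, P99Latency: time.Second}
		}
		return Metrics{ErrorRate: 0.01, P99Latency: 300 * time.Millisecond}
	})
	fmt.Println("rolled back at:", failed)
}
```

Summing the three monitoring windows gives the 20 minutes quoted above; the Retire phase adds no monitoring time.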


Health Check Endpoints

All services must implement:

GET /health

Response:

{
  "status": "ok",
  "version": "1.2.0",
  "revision": "sha-abc123",
  "database": "connected",
  "dependencies": {
    "redis": "connected",
    "secretmanager": "accessible"
  },
  "timestamp": "2025-10-29T10:00:00Z"
}

Health Check Requirements:

  • Response time: < 500ms
  • Success rate: > 99.9%
  • Runs every 10 seconds during deployment
  • Failure triggers automatic rollback

Implementation (Go):

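// internal/api/health.go
package api

import (
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
)

// HealthHandler serves GET /health.
// db, redis, Version, and Revision are package-level values defined elsewhere.
// The sample response also reports secretmanager; that check is not shown here.
func HealthHandler(c *gin.Context) {
	// Check database
	if err := db.Ping(); err != nil {
		c.JSON(http.StatusServiceUnavailable, gin.H{
			"status":   "unhealthy",
			"database": "disconnected",
			"error":    err.Error(),
		})
		return
	}

	// Check Redis
	if err := redis.Ping().Err(); err != nil {
		c.JSON(http.StatusServiceUnavailable, gin.H{
			"status": "unhealthy",
			"redis":  "disconnected",
			"error":  err.Error(),
		})
		return
	}

	c.JSON(http.StatusOK, gin.H{
		"status":   "ok",
		"version":  Version,
		"revision": Revision,
		"database": "connected",
		"dependencies": gin.H{
			"redis": "connected",
		},
		// UTC so the timestamp ends in "Z", as in the sample response
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}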
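On the caller's side, the health check requirements listed earlier (HTTP success, `"status": "ok"`, response under 500ms) could be verified along these lines. This is a minimal sketch: `ValidateHealth` and `HealthResponse` are hypothetical names, and the caller is assumed to have already measured the elapsed time of the request.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// HealthResponse covers the fields of the sample /health payload
// that this check inspects.
type HealthResponse struct {
	Status       string            `json:"status"`
	Database     string            `json:"database"`
	Dependencies map[string]string `json:"dependencies"`
}

// MaxResponseMillis is the documented response-time budget (< 500ms).
const MaxResponseMillis = 500

// ValidateHealth returns nil when a /health response is HTTP 200,
// arrived within the time budget, and reports status "ok".
func ValidateHealth(statusCode int, body []byte, elapsedMillis int64) error {
	if statusCode != 200 {
		return fmt.Errorf("unexpected HTTP status %d", statusCode)
	}
	if elapsedMillis >= MaxResponseMillis {
		return fmt.Errorf("response took %dms, budget is under %dms", elapsedMillis, MaxResponseMillis)
	}
	var h HealthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if h.Status != "ok" {
		return errors.New("status is " + h.Status)
	}
	return nil
}

func main() {
	body := []byte(`{"status":"ok","database":"connected","dependencies":{"redis":"connected"}}`)
	fmt.Println("fast response:", ValidateHealth(200, body, 120))
	fmt.Println("slow response:", ValidateHealth(200, body, 800))
}
```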

Automated Deployment (CI/CD)

Workflow Files

  • CI Pipeline: .github/workflows/test.yml — Lint, test (with 80% coverage enforcement), build
  • Deploy Pipeline: .github/workflows/deploy.yml — Build image, migrate, deploy, traffic shift

Trigger

Deployments trigger automatically on push to main via the deploy workflow.

Docker Image

Images are built from apps/core-api/Dockerfile and pushed to GitHub Container Registry:

ghcr.io/javipelopi-dev/fiscalization/core-api:sha-<commit-sha>

Process

The deploy workflow (deploy.yml) automates the full deployment pipeline:

  1. Authenticate to GCP via Workload Identity Federation
  2. Build & push Docker image to GitHub Container Registry
  3. Run database migrations (BEFORE app deployment, via Cloud Run job)
  4. Deploy new revision to Cloud Run with 0% traffic
  5. Health check: GET /health with retry (6 attempts, 10s interval)
  6. Gradual traffic shift:
    • 10% traffic → wait 5 minutes → health check
    • 50% traffic → wait 5 minutes → health check
    • 100% traffic → final health check
  7. Automatic rollback if any step fails (traffic reverts to previous revision)
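The retry behaviour in step 5 (6 attempts, 10s interval) can be sketched as a small helper. The workflow itself may implement this in shell or YAML; the Go below is an assumption-laden illustration, with `Retry` and `ErrNotReady` as hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotReady is a placeholder error for a failed health probe.
var ErrNotReady = errors.New("service not ready")

// Retry calls check up to attempts times, sleeping interval between
// failed tries. It returns nil on the first success, otherwise an
// error wrapping the last failure.
func Retry(attempts int, interval time.Duration, check func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("health check failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// The pipeline would use 10*time.Second; a short interval keeps the demo fast.
	err := Retry(6, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return ErrNotReady
		}
		return nil
	})
	fmt.Println("attempts used:", calls, "error:", err)
}
```

With the documented settings, a revision that never becomes healthy is given roughly 50 seconds of waiting (five 10s pauses between six attempts) before the step fails.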

Required Secrets/Variables

| Secret | Description |
|---|---|
| GCP_PROJECT_ID | GCP project identifier |
| GCP_WORKLOAD_IDENTITY_PROVIDER | Workload Identity Federation provider |
| GCP_SERVICE_ACCOUNT | GCP service account for deployment |
| CLOUD_SQL_CONNECTION_NAME | Cloud SQL instance connection name |
| GITHUB_TOKEN | Auto-provided for GHCR authentication |

Coverage Enforcement

The CI pipeline (test.yml) enforces 80% minimum test coverage:

  • Coverage reports generated for apps/core-api and packages/shared
  • Build fails if any package drops below 80%
  • Coverage reports uploaded as artifacts (retained 30 days)
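As an illustration of the 80% gate, the sketch below scans the text output of `go test -cover ./...` and lists packages under the threshold. It assumes the coverage step produces standard Go tool output; the actual test.yml may compute coverage differently, and `BelowThreshold` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// MinCoverage is the documented minimum, in percent.
const MinCoverage = 80.0

// coverLine matches lines such as:
// ok  	example/pkg	0.41s	coverage: 91.3% of statements
var coverLine = regexp.MustCompile(`^ok\s+(\S+)\s.*coverage: ([0-9.]+)% of statements`)

// BelowThreshold returns one entry per package whose reported
// coverage is below min. Lines without coverage data are skipped.
func BelowThreshold(output string, min float64) []string {
	var failing []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := coverLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		pct, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		if pct < min {
			failing = append(failing, fmt.Sprintf("%s (%.1f%%)", m[1], pct))
		}
	}
	return failing
}

func main() {
	// Package paths here are made up for the demo.
	out := "ok  \texample/core-api/api\t0.41s\tcoverage: 91.3% of statements\n" +
		"ok  \texample/core-api/store\t0.20s\tcoverage: 72.0% of statements\n"
	if failing := BelowThreshold(out, MinCoverage); len(failing) > 0 {
		fmt.Println("coverage below threshold:", failing)
	}
}
```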

Manual Deployment

Use manual deployment only if CI/CD is unavailable.

Prerequisites

# Authenticate to GCP
gcloud auth login
gcloud config set project zyntem-dev

# Verify current deployment
gcloud run services list --region europe-west1

Deploy Service

# Deploy with traffic management
./scripts/deploy-dev.sh core-api

# Or manually
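gcloud run deploy core-api \
  --source apps/core-api \
  --region europe-west1 \
  --platform managed \
  --no-traffic

# Shift traffic gradually (10%, then 50%, then 100%), checking /health between steps.
# REVISION_NAME is a placeholder for the new revision's name.
gcloud run services update-traffic core-api \
  --to-revisions=REVISION_NAME=10 \
  --region europe-west1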

Rollback Procedures

Automatic Rollback

Rollback triggers automatically if:

  • Health checks fail
  • Error rate > 5%
  • P99 latency > 2s

Process:

# Revert to previous revision
# (PREVIOUS is a placeholder for the name of the last known good revision)
gcloud run services update-traffic core-api \
  --to-revisions=PREVIOUS=100 \
  --region europe-west1

Manual Rollback

If automatic rollback fails or manual intervention is needed:

# List recent revisions
gcloud run revisions list \
--service core-api \
--region europe-west1 \
--limit 5

# Shift 100% traffic to previous revision
gcloud run services update-traffic core-api \
--to-revisions=core-api-00005-abc=100 \
--region europe-west1

# Verify
curl https://core-api-dev-zyntem.run.app/health

Time to Rollback: < 30 seconds

Option 2: Emergency Rollback Script

# Rollback to last known good revision
./scripts/emergency-rollback.sh core-api

# Rollback all services
./scripts/emergency-rollback.sh all

Script: scripts/emergency-rollback.sh

#!/bin/bash
set -euo pipefail

SERVICE="${1:-}"
REGION="europe-west1"

if [ -z "$SERVICE" ]; then
  echo "Usage: ./emergency-rollback.sh <service|all>" >&2
  exit 1
fi

rollback_service() {
  local svc="$1"
  local previous
  echo "Rolling back $svc..."

  # Get previous revision (second in list, first is current).
  # Assumes the list is ordered newest first and the newest revision is
  # the one serving traffic. sed prints nothing if only one revision exists.
  previous=$(gcloud run revisions list \
    --service "$svc" \
    --region "$REGION" \
    --format="value(name)" \
    --limit 2 | sed -n '2p')

  if [ -z "$previous" ]; then
    echo "ERROR: No previous revision found for $svc" >&2
    exit 1
  fi

  # Shift 100% traffic to previous
  gcloud run services update-traffic "$svc" \
    --to-revisions="$previous=100" \
    --region "$REGION"

  echo "✓ Rolled back $svc to $previous. Verify:"
  echo "  curl https://$svc-dev-zyntem.run.app/health"
}

if [ "$SERVICE" = "all" ]; then
  for svc in core-api dashboard; do
    rollback_service "$svc"
  done
else
  rollback_service "$SERVICE"
fi

echo "✓ Rollback complete."

Database Migrations

Migration Strategy

Rule: Migrations must be backward-compatible (allow rollback without data loss)

Process:

  1. Run migrations BEFORE application deployment
  2. If migration fails, BLOCK deployment
  3. Application must work with both old and new schema during transition
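Rule 3 implies that application code reads data defensively while both schema versions are live. A minimal sketch of that idea, assuming a transition from an `old_field` column to a `new_field` column on `transactions`; the `TransactionRow` type and `Value` method are hypothetical and not taken from the codebase.

```go
package main

import "fmt"

// TransactionRow is a hypothetical projection of the transactions table
// during the transition. A nil pointer represents SQL NULL (or a column
// that has not been backfilled yet).
type TransactionRow struct {
	OldField *string
	NewField *string
}

// Value prefers new_field and falls back to old_field, so the same
// code works before, during, and after the backfill.
func (r TransactionRow) Value() (string, bool) {
	if r.NewField != nil {
		return *r.NewField, true
	}
	if r.OldField != nil {
		return *r.OldField, true
	}
	return "", false
}

// strPtr is a small helper for building rows in examples.
func strPtr(s string) *string { return &s }

func main() {
	rows := []TransactionRow{
		{OldField: strPtr("legacy")},                               // not yet backfilled
		{OldField: strPtr("legacy"), NewField: strPtr("migrated")}, // backfilled
		{}, // neither column set
	}
	for _, r := range rows {
		v, ok := r.Value()
		fmt.Printf("value=%q present=%v\n", v, ok)
	}
}
```

Once every running revision writes `new_field` and the backfill has completed, the fallback branch can be removed in the same deploy that drops the old column.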

Running Migrations

# Automated (in CI/CD)
gcloud run jobs execute migrate-db \
--region europe-west1 \
--wait

# Manual
export DATABASE_URL="postgresql://user:pass@host:5432/fiscalization"
migrate -path ./migrations -database "$DATABASE_URL" up

# Verify migration
migrate -path ./migrations -database "$DATABASE_URL" version

Migration Example (Backward-Compatible)

BAD - Breaking Change:

-- This breaks old app versions
ALTER TABLE transactions
DROP COLUMN old_field;

GOOD - Backward-Compatible:

-- Step 1 (Deploy): Add new column (old app ignores it)
ALTER TABLE transactions
ADD COLUMN new_field TEXT;

-- Step 2 (Wait 24h): Migrate data
UPDATE transactions
SET new_field = old_field
WHERE new_field IS NULL;

-- Step 3 (Next deploy): Make new_field NOT NULL
ALTER TABLE transactions
ALTER COLUMN new_field SET NOT NULL;

-- Step 4 (Next deploy): Drop old column (after all apps use new_field)
ALTER TABLE transactions
DROP COLUMN old_field;

Migration Rollback

Down migrations included:

-- up migration
CREATE TABLE new_table (...);

-- down migration (if the migrate CLI above is golang-migrate, this lives in the matching .down.sql file)
DROP TABLE IF EXISTS new_table;

Rollback command:

migrate -path ./migrations -database $DATABASE_URL down 1

Monitoring During Deployment

Required Dashboards

Open these dashboards before deployment:

  1. Cloud Run Service Metrics
  2. Cloud Monitoring
  3. Cloud Logs

Key Metrics to Watch

| Metric | Threshold | Action if Exceeded |
|---|---|---|
| Error Rate | < 5% | Automatic rollback |
| P99 Latency | < 2s | Automatic rollback |
| P95 Latency | < 1s | Investigate (no rollback) |
| Request Rate | No drop > 50% | Investigate routing |
| Health Check Success | > 99.9% | Automatic rollback |

Alerts

Alerts are configured in Cloud Monitoring (Story 4.8):

  • Error rate > 5% for 2 minutes
  • P99 latency > 2s for 2 minutes
  • Health check failures > 3 in 1 minute
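The "for 2 minutes" and "in 1 minute" qualifiers mean that a single bad sample should not fire an alert. The sketch below illustrates the two kinds of condition. It is a simplified model and not Cloud Monitoring's evaluation logic; `Sample`, `SustainedAbove`, and `CountInWindow` are hypothetical names, and timestamps are plain Unix seconds.

```go
package main

import "fmt"

// Sample is one metric reading. At is a Unix timestamp in seconds.
type Sample struct {
	At    int64
	Value float64
}

// SustainedAbove reports whether the most recent unbroken run of samples
// above threshold spans at least windowSeconds. Samples must be ordered
// oldest first.
func SustainedAbove(samples []Sample, threshold float64, windowSeconds int64) bool {
	if len(samples) == 0 {
		return false
	}
	newest := samples[len(samples)-1]
	if newest.Value <= threshold {
		return false
	}
	start := newest.At
	for i := len(samples) - 1; i >= 0; i-- {
		if samples[i].Value <= threshold {
			break
		}
		start = samples[i].At
	}
	return newest.At-start >= windowSeconds
}

// CountInWindow counts events with timestamps in (now-windowSeconds, now].
func CountInWindow(events []int64, now, windowSeconds int64) int {
	n := 0
	for _, t := range events {
		if t > now-windowSeconds && t <= now {
			n++
		}
	}
	return n
}

func main() {
	// Error rate sampled every 30s, above 5% for the whole 2 minutes.
	errorRate := []Sample{{0, 0.06}, {30, 0.07}, {60, 0.06}, {90, 0.08}, {120, 0.09}}
	fmt.Println("error-rate alert:", SustainedAbove(errorRate, 0.05, 120))

	// Health check failures: more than 3 within the last minute fires.
	failures := []int64{1, 70, 100, 110, 115}
	fmt.Println("health-check alert:", CountInWindow(failures, 120, 60) > 3)
}
```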

Notification Channels:

  • Email: alerts@zyntem.com
  • Slack: #fiscalization-alerts (optional)
  • PagerDuty: (production only, Phase 2)

Deployment Checklist

See DEPLOYMENT-CHECKLIST.md for the comprehensive pre-, during-, and post-deployment checklist.

Quick Reference:

  • Database migrations tested
  • Breaking changes identified
  • Monitoring dashboards open
  • Rollback procedure reviewed
  • Team notified (if major deployment)

Terraform Configuration

Cloud Run Traffic Management:

# infrastructure/modules/cloud-run/main.tf
resource "google_cloud_run_service" "core_api" {
  name     = "core-api"
  location = var.region

  template {
    spec {
      containers {
        image = var.image_url
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }

  lifecycle {
    ignore_changes = [
      traffic, # Allow manual traffic management during deployment
    ]
  }
}

# Intent: keep the last 3 revisions for rollback.
# Note: the IAM policy resource below does not itself control revision retention.
resource "google_cloud_run_service_iam_policy" "core_api" {
  # ... IAM configuration
}

Common Issues

Issue: Deployment Stuck at 0% Traffic

Symptom: New revision deployed but receives no traffic

Cause: Health checks failing

Solution:

# Inspect the new revision's status and conditions
gcloud run revisions describe REVISION_NAME --region europe-west1

# View logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.revision_name=REVISION_NAME" --limit 50

# Common causes:
# - Database connection issues
# - Missing environment variables
# - Startup crash

Issue: High Latency After Deployment

Symptom: P99 latency spikes after new revision

Cause: Cold start, inefficient code, or database issues

Solution:

# Inspect the container's resource limits (CPU/memory)
gcloud run services describe core-api --region europe-west1 --format="value(spec.template.spec.containers[0].resources)"

# Increase min instances to reduce cold starts
gcloud run services update core-api \
  --min-instances=1 \
  --region europe-west1

Issue: Automatic Rollback Triggered

Symptom: Deployment reverts automatically

Cause: Metrics exceeded thresholds

Solution:

# Check which metric triggered rollback
gcloud logging read "resource.type=cloud_run_revision AND severity>=WARNING" --limit 20

# Common causes:
# - New bug causing errors
# - Inefficient database queries
# - External service timeout
# - Configuration error

# Fix issue locally, re-deploy

Next Steps

  1. ✅ Complete Story 1.10 (Deployment Strategy Implementation)
  2. ✅ Test deployment in dev environment
  3. ✅ Document any environment-specific issues
  4. ✅ Review RUNBOOK.md for operational procedures

Additional Resources