Zyntem Fiscalization - Deployment Guide

Company: Zyntem
Product: Fiscalization by Zyntem
Version: 1.0
Last Updated: 2025-10-30
Status: Active

Overview

This document describes the deployment strategy for Fiscalization by Zyntem, including zero-downtime deployments, rollback procedures, and monitoring requirements.

Deployment Strategy: Blue-Green deployment with gradual traffic shifting

SLA Target: 99.5% uptime (NFR2)

Deployment Strategy: Blue-Green with Canary

How It Works

graph TD
    A[New Code Deployed] --> B[Create New Revision - Green]
    B --> C{Health Check Passes?}
    C -->|No| D[Rollback - Delete Green]
    C -->|Yes| E[Canary: 10% Traffic to Green]
    E --> F[Monitor 5 Minutes]
    F --> G{Metrics Healthy?}
    G -->|No| D
    G -->|Yes| H[Ramp: 50% Traffic to Green]
    H --> I[Monitor 5 Minutes]
    I --> J{Metrics Healthy?}
    J -->|No| D
    J -->|Yes| K[Full: 100% Traffic to Green]
    K --> L[Monitor 10 Minutes]
    L --> M{Metrics Healthy?}
    M -->|No| D
    M -->|Yes| N[Success - Retire Blue]

Deployment Phases

Phase	Traffic Split	Duration	Rollback Trigger
1. Canary	Blue: 90%, Green: 10%	5 min	Error rate > 5% OR P99 latency > 2s
2. Ramp	Blue: 50%, Green: 50%	5 min	Error rate > 5% OR P99 latency > 2s
3. Full	Blue: 0%, Green: 100%	10 min	Error rate > 5% OR P99 latency > 2s
4. Retire	Green: 100%	-	Keep previous revision for 24h

Total Deployment Time: 20 minutes (automated)
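
The phase table and rollback trigger can be expressed as data plus one predicate. The following is a minimal Go sketch, not the pipeline's actual implementation: the `Phase` and `Metrics` types and the `ShouldRollback` and `TotalMinutes` functions are hypothetical names introduced here for illustration. Only the percentages, durations, and thresholds come from the table above.

```go
package main

import (
	"fmt"
	"time"
)

// Phase mirrors one monitored row of the phase table (hypothetical type).
type Phase struct {
	Name         string
	GreenPercent int           // share of traffic sent to the new (green) revision
	Observe      time.Duration // how long metrics are watched before moving on
}

// Plan lists the three monitored phases; retiring blue happens afterwards.
var Plan = []Phase{
	{Name: "Canary", GreenPercent: 10, Observe: 5 * time.Minute},
	{Name: "Ramp", GreenPercent: 50, Observe: 5 * time.Minute},
	{Name: "Full", GreenPercent: 100, Observe: 10 * time.Minute},
}

// Metrics is an assumed snapshot of the values compared against the triggers.
type Metrics struct {
	ErrorRate  float64       // fraction of failed requests; 0.05 means 5%
	P99Latency time.Duration // 99th percentile request latency
}

// ShouldRollback applies the documented trigger:
// error rate > 5% OR P99 latency > 2s.
func ShouldRollback(m Metrics) bool {
	return m.ErrorRate > 0.05 || m.P99Latency > 2*time.Second
}

// TotalMinutes sums the observation windows of all phases.
func TotalMinutes(plan []Phase) int {
	var total time.Duration
	for _, p := range plan {
		total += p.Observe
	}
	return int(total / time.Minute)
}

func main() {
	fmt.Println("total minutes:", TotalMinutes(Plan))
	fmt.Println("rollback:", ShouldRollback(Metrics{ErrorRate: 0.06, P99Latency: time.Second}))
}
```

Both comparisons are strict, so a deployment sitting exactly at 5% errors or exactly at 2s P99 is not rolled back under this reading of the table.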

Health Check Endpoints

All services must implement:

GET /health

Response:

{
  "status": "ok",
  "version": "1.2.0",
  "revision": "sha-abc123",
  "database": "connected",
  "dependencies": {
    "redis": "connected",
    "secretmanager": "accessible"
  },
  "timestamp": "2025-10-29T10:00:00Z"
}

Health Check Requirements:

Response time: < 500ms
Success rate: > 99.9%
Runs every 10 seconds during deployment
Failure triggers automatic rollback
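
To illustrate how a deployment probe might apply these requirements, here is a hedged Go sketch of the client side. The `Evaluate` and `Probe` functions are hypothetical and not part of the Zyntem codebase. The sketch assumes a probe counts as successful only when the endpoint returns HTTP 200, answers in under 500ms, and reports `"status": "ok"`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// maxResponseTime is the documented response-time limit for /health.
const maxResponseTime = 500 * time.Millisecond

// healthBody holds the only field this sketch inspects.
type healthBody struct {
	Status string `json:"status"`
}

// Evaluate decides whether one /health probe counts as a success.
func Evaluate(statusCode int, body []byte, elapsed time.Duration) error {
	if statusCode != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status %d", statusCode)
	}
	if elapsed >= maxResponseTime {
		return fmt.Errorf("response took %s, limit is %s", elapsed, maxResponseTime)
	}
	var hb healthBody
	if err := json.Unmarshal(body, &hb); err != nil {
		return fmt.Errorf("invalid JSON body: %w", err)
	}
	if hb.Status != "ok" {
		return fmt.Errorf("service reported status %q", hb.Status)
	}
	return nil
}

// Probe performs one GET request and evaluates the result.
func Probe(client *http.Client, url string) error {
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	return Evaluate(resp.StatusCode, body, time.Since(start))
}

func main() {
	// A local stand-in for the real service, used only for this demo.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok"}`)
	}))
	defer srv.Close()

	fmt.Println("probe error:", Probe(srv.Client(), srv.URL+"/health"))
}
```

Keeping `Evaluate` free of network calls makes the pass/fail rules easy to unit test separately from the HTTP plumbing.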

Implementation (Go):

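// internal/api/health.go
package api // package name assumed from the file path

import (
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
)

// HealthHandler reports service health. It relies on the package-level db and
// redis clients and on the Version and Revision build variables, all defined
// elsewhere in this package. The Secret Manager check shown in the sample
// response above is not implemented here.
func HealthHandler(c *gin.Context) {
	// Check database
	if err := db.Ping(); err != nil {
		c.JSON(http.StatusServiceUnavailable, gin.H{
			"status":   "unhealthy",
			"database": "disconnected",
			"error":    err.Error(),
		})
		return
	}

	// Check Redis
	if err := redis.Ping().Err(); err != nil {
		c.JSON(http.StatusServiceUnavailable, gin.H{
			"status": "unhealthy",
			"redis":  "disconnected",
			"error":  err.Error(),
		})
		return
	}

	c.JSON(http.StatusOK, gin.H{
		"status":   "ok",
		"version":  Version,
		"revision": Revision,
		"database": "connected",
		"dependencies": gin.H{
			"redis": "connected",
		},
		// UTC keeps the timestamp in the "Z" form shown in the sample response
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}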

Automated Deployment (CI/CD)

Workflow Files

CI Pipeline: .github/workflows/test.yml — Lint, test (with 80% coverage enforcement), build
Deploy Pipeline: .github/workflows/deploy.yml — Build image, migrate, deploy, traffic shift

Trigger

Deployments trigger automatically on push to main via the deploy workflow.

Docker Image

Images are built from apps/core-api/Dockerfile and pushed to GitHub Container Registry:

ghcr.io/javipelopi-dev/fiscalization/core-api:sha-<commit-sha>

Process

The deploy workflow (deploy.yml) automates the full deployment pipeline:

1. Authenticate to GCP via Workload Identity Federation
2. Build & push Docker image to GitHub Container Registry
3. Run database migrations (BEFORE app deployment, via Cloud Run job)
4. Deploy new revision to Cloud Run with 0% traffic
5. Health check: GET /health with retry (6 attempts, 10s interval)
6. Gradual traffic shift:
   - 10% traffic → wait 5 minutes → health check
   - 50% traffic → wait 5 minutes → health check
   - 100% traffic → final health check
7. Automatic rollback if any step fails (traffic reverts to previous revision)
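
The health-check retry, gradual traffic shift, and rollback steps above can be sketched as a small control loop. This Go sketch is illustrative only: the real workflow lives in deploy.yml, and the `Deployer` type with its `SetGreenTraffic`, `Healthy`, and `Sleep` hooks is a hypothetical abstraction introduced here so the logic can run without GCP access. Modelling rollback as setting green traffic back to 0% is also an assumption.

```go
package main

import "fmt"

// Deployer bundles the operations the rollout needs (all hypothetical hooks).
type Deployer struct {
	SetGreenTraffic func(percent int) error // shift traffic to the new revision
	Healthy         func() bool             // one GET /health probe
	Sleep           func(seconds int)       // injected so tests can skip waiting
}

// waitHealthy retries the health probe: up to `attempts` tries, pausing
// `intervalSeconds` between them.
func (d Deployer) waitHealthy(attempts, intervalSeconds int) bool {
	for i := 0; i < attempts; i++ {
		if d.Healthy() {
			return true
		}
		if i < attempts-1 {
			d.Sleep(intervalSeconds)
		}
	}
	return false
}

// rollback reverts all traffic to the previous revision and reports why.
func (d Deployer) rollback(reason string) error {
	if err := d.SetGreenTraffic(0); err != nil {
		return fmt.Errorf("%s; rollback also failed: %w", reason, err)
	}
	return fmt.Errorf("%s; traffic reverted to previous revision", reason)
}

// Rollout runs the initial health check and the 10% / 50% / 100% shift.
func (d Deployer) Rollout() error {
	if !d.waitHealthy(6, 10) {
		return d.rollback("new revision never became healthy")
	}
	steps := []struct{ percent, waitSeconds int }{
		{10, 300}, {50, 300}, {100, 0},
	}
	for _, s := range steps {
		if err := d.SetGreenTraffic(s.percent); err != nil {
			return d.rollback(fmt.Sprintf("shift to %d%% failed: %v", s.percent, err))
		}
		d.Sleep(s.waitSeconds)
		if !d.Healthy() {
			return d.rollback(fmt.Sprintf("unhealthy at %d%%", s.percent))
		}
	}
	return nil
}

func main() {
	d := Deployer{
		SetGreenTraffic: func(p int) error { fmt.Printf("green traffic -> %d%%\n", p); return nil },
		Healthy:         func() bool { return true },
		Sleep:           func(int) {}, // no-op so the demo finishes instantly
	}
	fmt.Println("rollout error:", d.Rollout())
}
```

Injecting `Sleep` and `Healthy` keeps the five-minute waits out of tests while leaving the step order identical to the list above.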

Required Secrets/Variables

Secret	Description
`GCP_PROJECT_ID`	GCP project identifier
`GCP_WORKLOAD_IDENTITY_PROVIDER`	Workload Identity Federation provider
`GCP_SERVICE_ACCOUNT`	GCP service account for deployment
`CLOUD_SQL_CONNECTION_NAME`	Cloud SQL instance connection name
`GITHUB_TOKEN`	Auto-provided for GHCR authentication

Coverage Enforcement

The CI pipeline (test.yml) enforces 80% minimum test coverage:

Coverage reports generated for apps/core-api and packages/shared
Build fails if any package drops below 80%
Coverage reports uploaded as artifacts (retained 30 days)
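
As a rough illustration of the gate described above, the Go sketch below fails when any package falls under the minimum. It is not the logic in test.yml: `BelowThreshold` is a hypothetical helper, and the coverage percentages in `main` are made-up sample values, not measured results.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// minCoverage is the documented minimum, in percent.
const minCoverage = 80.0

// BelowThreshold returns the packages whose coverage is under the minimum,
// sorted by name so the output is stable.
func BelowThreshold(coverage map[string]float64, minimum float64) []string {
	var failing []string
	for pkg, pct := range coverage {
		if pct < minimum {
			failing = append(failing, pkg)
		}
	}
	sort.Strings(failing)
	return failing
}

func main() {
	// Sample values for illustration only.
	coverage := map[string]float64{
		"apps/core-api":   84.2,
		"packages/shared": 91.0,
	}
	if failing := BelowThreshold(coverage, minCoverage); len(failing) > 0 {
		fmt.Println("coverage below 80% in:", failing)
		os.Exit(1) // a non-zero exit code fails the CI job
	}
	fmt.Println("coverage OK")
}
```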

Manual Deployment

Use manual deployment only if CI/CD is unavailable.

Prerequisites

# Authenticate to GCP
gcloud auth login
gcloud config set project zyntem-dev

# Verify current deployment
gcloud run services list --region europe-west1

Deploy Service

# Deploy with traffic management
./scripts/deploy-dev.sh core-api

# Or manually
gcloud run deploy core-api \
  --source apps/core-api \
  --region europe-west1 \
  --platform managed \
  --no-traffic

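# Shift traffic gradually, checking /health between steps.
# Replace REVISION_NAME with the new revision (see `gcloud run revisions list`).
gcloud run services update-traffic core-api \
  --to-revisions=REVISION_NAME=10 \
  --region europe-west1

# Repeat with 50, then 100, waiting 5 minutes between steps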

Rollback Procedures

Automatic Rollback

Rollback triggers automatically if:

Health checks fail
Error rate > 5%
P99 latency > 2s

Process:

# Revert to previous revision
# (PREVIOUS is a placeholder for the previous revision's name)
gcloud run services update-traffic core-api \
  --to-revisions=PREVIOUS=100 \
  --region europe-west1

Manual Rollback

If automatic rollback fails or manual intervention is needed:

Option 1: Instant Rollback (Recommended)

# List recent revisions
gcloud run revisions list \
  --service core-api \
  --region europe-west1 \
  --limit 5

# Shift 100% traffic to previous revision
gcloud run services update-traffic core-api \
  --to-revisions=core-api-00005-abc=100 \
  --region europe-west1

# Verify
curl https://core-api-dev-zyntem.run.app/health

Time to Rollback: < 30 seconds

Option 2: Emergency Rollback Script

# Rollback to last known good revision
./scripts/emergency-rollback.sh core-api

# Rollback all services
./scripts/emergency-rollback.sh all

Script: scripts/emergency-rollback.sh

#!/bin/bash
set -euo pipefail

SERVICE="${1:-}"
REGION="europe-west1"

if [ -z "$SERVICE" ]; then
  echo "Usage: ./emergency-rollback.sh <service|all>"
  exit 1
fi

rollback_service() {
  local svc="$1"
  local previous
  echo "Rolling back $svc..."

  # Get previous revision: second entry when sorted newest first.
  # sed prints nothing if only one revision exists, so the check below fires.
  previous=$(gcloud run revisions list \
    --service "$svc" \
    --region "$REGION" \
    --sort-by="~metadata.creationTimestamp" \
    --format="value(name)" \
    --limit 2 | sed -n '2p')

  if [ -z "$previous" ]; then
    echo "ERROR: No previous revision found for $svc"
    exit 1
  fi

  # Shift 100% traffic to previous
  gcloud run services update-traffic "$svc" \
    --to-revisions="$previous=100" \
    --region "$REGION"

  echo "✓ Rolled back $svc to $previous"
  echo "  Verify: curl https://$svc-dev-zyntem.run.app/health"
}

if [ "$SERVICE" = "all" ]; then
  for svc in core-api dashboard; do
    rollback_service "$svc"
  done
else
  rollback_service "$SERVICE"
fi

echo "✓ Rollback complete."
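
The revision-selection step in the script above takes the second-newest revision as the rollback target. The Go sketch below isolates that rule and makes the single-revision edge case explicit. `PreviousRevision` is a hypothetical helper, and the revision names in `main` are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// PreviousRevision returns the rollback target from a list of revision
// names ordered newest first. The first entry is the current revision,
// so the second entry is the previous one.
func PreviousRevision(newestFirst []string) (string, error) {
	if len(newestFirst) < 2 {
		return "", errors.New("no previous revision to roll back to")
	}
	return newestFirst[1], nil
}

func main() {
	target, err := PreviousRevision([]string{"core-api-00006-def", "core-api-00005-abc"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rollback target:", target)
}
```

Note that "previous" is not necessarily "last known good": if two bad revisions were deployed in a row, the target has to be chosen manually as in Option 1.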

Database Migrations

Migration Strategy

Rule: Migrations must be backward-compatible (allow rollback without data loss)

Process:

Run migrations BEFORE application deployment
If migration fails, BLOCK deployment
Application must work with both old and new schema during transition

Running Migrations

# Automated (in CI/CD)
gcloud run jobs execute migrate-db \
  --region europe-west1 \
  --wait

# Manual
export DATABASE_URL="postgresql://user:pass@host:5432/fiscalization"
migrate -path ./migrations -database "$DATABASE_URL" up

# Verify migration
migrate -path ./migrations -database "$DATABASE_URL" version

Migration Example (Backward-Compatible)

❌ BAD - Breaking Change:

-- This breaks old app versions
ALTER TABLE transactions
  DROP COLUMN old_field;

✅ GOOD - Backward-Compatible:

-- Step 1 (Deploy): Add new column (old app ignores it)
ALTER TABLE transactions
  ADD COLUMN new_field TEXT;

-- Step 2 (Wait 24h): Migrate data
UPDATE transactions
  SET new_field = old_field
  WHERE new_field IS NULL;

-- Step 3 (Next deploy): Make new_field NOT NULL
-- (only after every running app version writes new_field)
ALTER TABLE transactions
  ALTER COLUMN new_field SET NOT NULL;

-- Step 4 (Next deploy): Drop old column (after all apps use new_field)
ALTER TABLE transactions
  DROP COLUMN old_field;

Migration Rollback

Down migrations included:

-- up migration
CREATE TABLE new_table (...);

-- down migration (with golang-migrate, this lives in the paired .down.sql file)
DROP TABLE IF EXISTS new_table;

Rollback command:

migrate -path ./migrations -database "$DATABASE_URL" down 1

Monitoring During Deployment

Required Dashboards

Open these dashboards before deployment:

Cloud Run Service Metrics
- URL: https://console.cloud.google.com/run?project=zyntem-dev
- Metrics: Request count, Error rate, Latency (P50, P95, P99)
Cloud Monitoring
- URL: https://console.cloud.google.com/monitoring/dashboards
- Custom dashboard: "Deployment Monitoring"
Cloud Logs
- URL: https://console.cloud.google.com/logs
- Filter: resource.type="cloud_run_revision" AND resource.labels.service_name="core-api"

Key Metrics to Watch

Metric	Healthy Range	Action if Outside Range
Error Rate	< 5%	Automatic rollback
P99 Latency	< 2s	Automatic rollback
P95 Latency	< 1s	Investigate (no rollback)
Request Rate	No drop > 50%	Investigate routing
Health Check Success	> 99.9%	Automatic rollback

Alerts

Alerts configured in Cloud Monitoring (Story 4.8):

Error rate > 5% for 2 minutes
P99 latency > 2s for 2 minutes
Health check failures > 3 in 1 minute
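
The "for 2 minutes" conditions mean a single spike should not fire an alert; the threshold must be exceeded for the whole window. The Go sketch below shows one way to evaluate that. It does not describe how Cloud Monitoring works internally: `SustainedAbove` is a hypothetical helper, and the 30-second sampling interval (four samples per 2-minute window) is an assumption.

```go
package main

import "fmt"

// SustainedAbove reports whether the most recent `window` samples all
// exceed the threshold. Samples are ordered oldest first, one per
// evaluation interval.
func SustainedAbove(samples []float64, threshold float64, window int) bool {
	if window <= 0 || len(samples) < window {
		return false // not enough data to cover the full window
	}
	for _, v := range samples[len(samples)-window:] {
		if v <= threshold {
			return false
		}
	}
	return true
}

func main() {
	// Error-rate samples taken every 30 seconds (assumed interval),
	// so 4 samples cover the 2-minute alert window.
	errorRates := []float64{0.01, 0.06, 0.07, 0.08, 0.09}
	fmt.Println("fire alert:", SustainedAbove(errorRates, 0.05, 4))
}
```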

Notification Channels:

Email: alerts@zyntem.com
Slack: #fiscalization-alerts (optional)
PagerDuty: (production only, Phase 2)

Deployment Checklist

See DEPLOYMENT-CHECKLIST.md for the comprehensive pre-, during-, and post-deployment checklist.

Quick Reference:

Database migrations tested
Breaking changes identified
Monitoring dashboards open
Rollback procedure reviewed
Team notified (if major deployment)

Terraform Configuration

Cloud Run Traffic Management:

# infrastructure/modules/cloud-run/main.tf
resource "google_cloud_run_service" "core_api" {
  name     = "core-api"
  location = var.region

  template {
    spec {
      containers {
        image = var.image_url
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }

  lifecycle {
    ignore_changes = [
      traffic, # Allow manual traffic management during deployment
    ]
  }
}

# Intent: keep the last 3 revisions for rollback.
# Note: the IAM policy resource below does not configure revision retention.
resource "google_cloud_run_service_iam_policy" "core_api" {
  # ... IAM configuration
}

Common Issues

Issue: Deployment Stuck at 0% Traffic

Symptom: New revision deployed but receives no traffic

Cause: Health checks failing

Solution:

# Inspect the new revision's status and conditions
gcloud run revisions describe REVISION_NAME --region europe-west1

# View logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.revision_name=REVISION_NAME" --limit 50

# Common causes:
# - Database connection issues
# - Missing environment variables
# - Startup crash

Issue: High Latency After Deployment

Symptom: P99 latency spikes after new revision

Cause: Cold start, inefficient code, or database issues

Solution:

# Inspect the container resource limits (CPU/memory) of the service
gcloud run services describe core-api --region europe-west1 --format="value(spec.template.spec.containers[0].resources)"

# Increase min instances to prevent cold starts
gcloud run services update core-api \
  --min-instances=1 \
  --region europe-west1

Issue: Automatic Rollback Triggered

Symptom: Deployment reverts automatically

Cause: Metrics exceeded thresholds

Solution:

# Check which metric triggered rollback
gcloud logging read "resource.type=cloud_run_revision AND severity>=WARNING" --limit 20

# Common causes:
# - New bug causing errors
# - Inefficient database queries
# - External service timeout
# - Configuration error

# Fix issue locally, re-deploy

Next Steps

✅ Complete Story 1.10 (Deployment Strategy Implementation)
✅ Test deployment in dev environment
✅ Document any environment-specific issues
✅ Review RUNBOOK.md for operational procedures
