Zyntem Fiscalization - Deployment Guide
Company: Zyntem
Product: Fiscalization by Zyntem
Version: 1.0
Last Updated: 2025-10-30
Status: Active
Overview
This document describes the deployment strategy for Fiscalization by Zyntem, including zero-downtime deployments, rollback procedures, and monitoring requirements.
Deployment Strategy: Blue-Green deployment with gradual traffic shifting
SLA Target: 99.5% uptime (NFR2)
Deployment Strategy: Blue-Green with Canary
How It Works
graph TD
A[New Code Deployed] --> B[Create New Revision - Green]
B --> C{Health Check Passes?}
C -->|No| D[Rollback - Delete Green]
C -->|Yes| E[Canary: 10% Traffic to Green]
E --> F[Monitor 5 Minutes]
F --> G{Metrics Healthy?}
G -->|No| D
G -->|Yes| H[Ramp: 50% Traffic to Green]
H --> I[Monitor 5 Minutes]
I --> J{Metrics Healthy?}
J -->|No| D
J -->|Yes| K[Full: 100% Traffic to Green]
K --> L[Monitor 10 Minutes]
L --> M{Metrics Healthy?}
M -->|No| D
M -->|Yes| N[Success - Retire Blue]
Deployment Phases
| Phase | Traffic Split | Duration | Rollback Trigger |
|---|---|---|---|
| 1. Canary | Blue: 90%, Green: 10% | 5 min | Error rate > 5% OR P99 latency > 2s |
| 2. Ramp | Blue: 50%, Green: 50% | 5 min | Error rate > 5% OR P99 latency > 2s |
| 3. Full | Blue: 0%, Green: 100% | 10 min | Error rate > 5% OR P99 latency > 2s |
| 4. Retire | Green: 100% | - | Keep previous revision for 24h |
Total Deployment Time: 20 minutes (automated)
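The phase table can also be read as a schedule. The following is a minimal sketch, not the pipeline's actual code: the `phase` type and the `schedule` helper are hypothetical names introduced here to show how the 20-minute total follows from the three monitoring windows.

```go
package main

import (
	"fmt"
	"time"
)

// phase mirrors one monitored row of the Deployment Phases table.
type phase struct {
	name         string
	greenPercent int           // share of traffic on the new (green) revision
	monitor      time.Duration // how long metrics are watched before moving on
}

// phases covers Canary, Ramp, and Full; Retire has no monitoring window.
var phases = []phase{
	{name: "canary", greenPercent: 10, monitor: 5 * time.Minute},
	{name: "ramp", greenPercent: 50, monitor: 5 * time.Minute},
	{name: "full", greenPercent: 100, monitor: 10 * time.Minute},
}

// schedule returns the offset from deployment start at which each phase
// begins, plus the total monitored time across all phases.
func schedule(ps []phase) ([]time.Duration, time.Duration) {
	starts := make([]time.Duration, len(ps))
	var elapsed time.Duration
	for i, p := range ps {
		starts[i] = elapsed
		elapsed += p.monitor
	}
	return starts, elapsed
}

func main() {
	starts, total := schedule(phases)
	for i, p := range phases {
		fmt.Printf("%-6s blue=%d%% green=%d%% starts at +%v\n",
			p.name, 100-p.greenPercent, p.greenPercent, starts[i])
	}
	fmt.Println("total monitored time:", total) // 20m0s
}
```

The total counts monitoring windows only; image build and migration time come on top of it.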
Health Check Endpoints
All services must implement:
GET /health
Response:
{
"status": "ok",
"version": "1.2.0",
"revision": "sha-abc123",
"database": "connected",
"dependencies": {
"redis": "connected",
"secretmanager": "accessible"
},
"timestamp": "2025-10-29T10:00:00Z"
}
Health Check Requirements:
- Response time: < 500ms
- Success rate: > 99.9%
- Runs every 10 seconds during deployment
- Failure triggers automatic rollback
Implementation (Go):
// internal/api/health.go
// db, redis, Version, and Revision are package-level values defined elsewhere.
func HealthHandler(c *gin.Context) {
	// Check database; a 503 marks the revision as unhealthy
	if err := db.Ping(); err != nil {
		c.JSON(503, gin.H{
			"status":   "unhealthy",
			"database": "disconnected",
			"error":    err.Error(),
		})
		return
	}
	// Check Redis
	if err := redis.Ping().Err(); err != nil {
		c.JSON(503, gin.H{
			"status": "unhealthy",
			"redis":  "disconnected",
			"error":  err.Error(),
		})
		return
	}
	c.JSON(200, gin.H{
		"status":   "ok",
		"version":  Version,
		"revision": Revision,
		"database": "connected",
		"dependencies": gin.H{
			"redis": "connected",
		},
		// UTC so the timestamp ends in "Z", as in the sample response
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}
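From the caller's side, the requirements above come down to one GET with a 500ms budget and a check of the `status` field. The sketch below assumes a hypothetical `checkHealth` helper and uses a local stub server in place of a deployed revision; it is an illustration, not the check the pipeline actually runs.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthBudget is the response-time requirement listed above.
const healthBudget = 500 * time.Millisecond

// healthResponse holds the fields of the /health payload that the check inspects.
type healthResponse struct {
	Status   string `json:"status"`
	Version  string `json:"version"`
	Revision string `json:"revision"`
	Database string `json:"database"`
}

// checkHealth (hypothetical helper) performs one GET and fails on a slow,
// non-200, or non-"ok" response.
func checkHealth(url string, budget time.Duration) (healthResponse, error) {
	var hr healthResponse
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return hr, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return hr, err // includes a budget exceeded before headers arrive
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return hr, fmt.Errorf("unexpected HTTP status %d", resp.StatusCode)
	}
	if err := json.NewDecoder(resp.Body).Decode(&hr); err != nil {
		return hr, err
	}
	if hr.Status != "ok" {
		return hr, fmt.Errorf("service reports status %q", hr.Status)
	}
	return hr, nil
}

// newStubServer starts a local server that stands in for a deployed revision.
func newStubServer(code int, status string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		json.NewEncoder(w).Encode(map[string]string{"status": status, "database": "connected"})
	}))
}

func main() {
	srv := newStubServer(http.StatusOK, "ok")
	defer srv.Close()

	hr, err := checkHealth(srv.URL+"/health", healthBudget)
	fmt.Println(hr.Status, hr.Database, err)
}
```

Treating any non-200 status as a failure matches the handler above, which answers 503 when the database or Redis is unreachable.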
Automated Deployment (CI/CD)
Workflow Files
- CI Pipeline: .github/workflows/test.yml (lint, test with 80% coverage enforcement, build)
- Deploy Pipeline: .github/workflows/deploy.yml (build image, migrate, deploy, traffic shift)
Trigger
Deployments trigger automatically on push to main via the deploy workflow.
Docker Image
Images are built from apps/core-api/Dockerfile and pushed to GitHub Container Registry:
ghcr.io/javipelopi-dev/fiscalization/core-api:sha-<commit-sha>
Process
The deploy workflow (deploy.yml) automates the full deployment pipeline:
- Authenticate to GCP via Workload Identity Federation
- Build & push Docker image to GitHub Container Registry
- Run database migrations (BEFORE app deployment, via Cloud Run job)
- Deploy new revision to Cloud Run with 0% traffic
- Health check: GET /health with retry (6 attempts, 10s interval)
- Gradual traffic shift:
- 10% traffic → wait 5 minutes → health check
- 50% traffic → wait 5 minutes → health check
- 100% traffic → final health check
- Automatic rollback if any step fails (traffic reverts to previous revision)
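The control flow of the health check retry and the gradual shift can be sketched as follows. This is an assumption-laden illustration rather than the contents of deploy.yml: `retry`, `rollout`, and the injected `setTraffic`, `wait`, and `check` hooks are hypothetical names, and the demo replaces the real waits with no-ops.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errUnhealthy = errors.New("unhealthy")

// retry runs check up to attempts times, sleeping interval between failures.
func retry(attempts int, interval time.Duration, check func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("health check failed after %d attempts: %w", attempts, err)
}

// rollout walks through the traffic steps and sends traffic back to the
// previous revision (0% on the new one) as soon as a step fails.
// setTraffic, wait, and check are hypothetical hooks supplied by the caller.
func rollout(steps []int, setTraffic func(percent int) error, wait func(), check func() error) error {
	for i, pct := range steps {
		if err := setTraffic(pct); err != nil {
			_ = setTraffic(0)
			return fmt.Errorf("rolled back: shifting to %d%% failed: %w", pct, err)
		}
		if i < len(steps)-1 {
			wait() // the final 100% step goes straight to its health check
		}
		if err := check(); err != nil {
			_ = setTraffic(0)
			return fmt.Errorf("rolled back at %d%%: %w", pct, err)
		}
	}
	return nil
}

func main() {
	// Initial check on the 0%-traffic revision: 6 attempts, 10s apart in the
	// pipeline; the interval is 0 here so the demo finishes immediately.
	calls := 0
	err := retry(6, 0, func() error {
		calls++
		if calls < 3 {
			return errUnhealthy
		}
		return nil
	})
	fmt.Println("initial check:", err, "after", calls, "attempts")

	var applied []int
	checks := 0
	err = rollout([]int{10, 50, 100},
		func(p int) error { applied = append(applied, p); return nil },
		func() {}, // stands in for the 5-minute wait
		func() error {
			checks++
			if checks == 2 {
				return errUnhealthy // simulate a failure at the 50% step
			}
			return nil
		})
	fmt.Println(applied, err)
}
```

Injecting the hooks keeps the ordering logic testable without touching Cloud Run; in a real pipeline they would wrap the corresponding gcloud calls.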
Required Secrets/Variables
| Secret | Description |
|---|---|
| GCP_PROJECT_ID | GCP project identifier |
| GCP_WORKLOAD_IDENTITY_PROVIDER | Workload Identity Federation provider |
| GCP_SERVICE_ACCOUNT | GCP service account for deployment |
| CLOUD_SQL_CONNECTION_NAME | Cloud SQL instance connection name |
| GITHUB_TOKEN | Auto-provided for GHCR authentication |
Coverage Enforcement
The CI pipeline (test.yml) enforces 80% minimum test coverage:
- Coverage reports generated for apps/core-api and packages/shared
- Build fails if any package drops below 80%
- Coverage reports uploaded as artifacts (retained 30 days)
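One way to picture the per-package gate is shown below, assuming the coverage figures come from `go test -cover` summary lines. How test.yml actually computes coverage is not stated here, and the module path in the sample output and the `belowThreshold` helper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// coverageLine matches the per-package summary printed by `go test -cover`.
var coverageLine = regexp.MustCompile(`^ok\s+(\S+)\s.*coverage: ([0-9.]+)% of statements`)

// sampleOutput is made-up output; the module path is hypothetical.
const sampleOutput = "ok  \texample.com/fiscalization/apps/core-api\t0.412s\tcoverage: 84.2% of statements\n" +
	"ok  \texample.com/fiscalization/packages/shared\t0.105s\tcoverage: 71.5% of statements\n"

// belowThreshold returns the packages whose coverage is under minPercent.
func belowThreshold(output string, minPercent float64) []string {
	var failing []string
	for _, line := range strings.Split(output, "\n") {
		m := coverageLine.FindStringSubmatch(line)
		if m == nil {
			continue // not a coverage summary line
		}
		pct, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		if pct < minPercent {
			failing = append(failing, m[1])
		}
	}
	return failing
}

func main() {
	failing := belowThreshold(sampleOutput, 80)
	if len(failing) > 0 {
		fmt.Println("coverage below 80%:", strings.Join(failing, ", "))
		// A real gate would exit non-zero here to fail the build.
		return
	}
	fmt.Println("coverage OK")
}
```

Checking each package separately, rather than an overall average, is what makes the build fail when a single package drops below the threshold.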
Manual Deployment
Use manual deployment only if CI/CD is unavailable.
Prerequisites
# Authenticate to GCP
gcloud auth login
gcloud config set project zyntem-dev
# Verify current deployment
gcloud run services list --region europe-west1
Deploy Service
# Deploy with traffic management
./scripts/deploy-dev.sh core-api
# Or manually
gcloud run deploy core-api \
--source apps/core-api \
--region europe-west1 \
--platform managed \
--no-traffic
# Shift traffic gradually (use gcloud run services update-traffic, as shown under Rollback Procedures)
Rollback Procedures
Automatic Rollback
Rollback triggers automatically if:
- Health checks fail
- Error rate > 5%
- P99 latency > 2s
Process:
# Revert to previous revision (PREVIOUS is a placeholder for that revision's name)
gcloud run services update-traffic core-api \
--to-revisions=PREVIOUS=100 \
--region europe-west1
Manual Rollback
If automatic rollback fails or manual intervention is needed:
Option 1: Instant Rollback (Recommended)
# List recent revisions
gcloud run revisions list \
--service core-api \
--region europe-west1 \
--limit 5
# Shift 100% traffic to previous revision
gcloud run services update-traffic core-api \
--to-revisions=core-api-00005-abc=100 \
--region europe-west1
# Verify
curl https://core-api-dev-zyntem.run.app/health
Time to Rollback: < 30 seconds
Option 2: Emergency Rollback Script
# Rollback to last known good revision
./scripts/emergency-rollback.sh core-api
# Rollback all services
./scripts/emergency-rollback.sh all
Script: scripts/emergency-rollback.sh
#!/bin/bash
set -euo pipefail
SERVICE="${1:-}"
REGION="europe-west1"
if [ -z "$SERVICE" ]; then
  echo "Usage: ./emergency-rollback.sh <service|all>"
  exit 1
fi
rollback_service() {
  local svc="$1"
  local previous
  echo "Rolling back $svc..."
  # Get previous revision (second in list, first is current).
  # sed prints only line 2, so a service with a single revision yields an
  # empty result instead of silently "rolling back" to the current revision.
  previous=$(gcloud run revisions list \
    --service "$svc" \
    --region "$REGION" \
    --format="value(name)" \
    --limit 2 | sed -n '2p')
  if [ -z "$previous" ]; then
    echo "ERROR: No previous revision found for $svc"
    exit 1
  fi
  # Shift 100% traffic to previous
  gcloud run services update-traffic "$svc" \
    --to-revisions="$previous=100" \
    --region "$REGION"
  echo "✓ Rolled back $svc to $previous"
  echo "  Verify: curl https://${svc}-dev-zyntem.run.app/health"
}
if [ "$SERVICE" = "all" ]; then
  for svc in core-api dashboard; do
    rollback_service "$svc"
  done
else
  rollback_service "$SERVICE"
fi
echo "✓ Rollback complete."
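The script treats the second revision in the list as the rollback target. A slightly stricter selection rule, sketched here with a hypothetical `previousRevision` helper, looks up the revision that is currently serving and returns the one deployed just before it. The sketch assumes the list is ordered newest first; the revision names other than the one used in the example above are invented.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoPrevious = errors.New("no previous revision")

// previousRevision returns the revision deployed immediately before the one
// currently serving traffic. revisions must be ordered newest first.
func previousRevision(revisions []string, serving string) (string, error) {
	for i, name := range revisions {
		if name != serving {
			continue
		}
		if i+1 < len(revisions) {
			return revisions[i+1], nil
		}
		return "", errNoPrevious
	}
	return "", fmt.Errorf("revision %q not found in list", serving)
}

func main() {
	revisions := []string{"core-api-00006-xyz", "core-api-00005-abc", "core-api-00004-def"}
	prev, err := previousRevision(revisions, "core-api-00006-xyz")
	fmt.Println(prev, err)
}
```

Keying on the serving revision avoids picking the wrong target when the newest revision was deployed with 0% traffic and never took over.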
Database Migrations
Migration Strategy
Rule: Migrations must be backward-compatible (allow rollback without data loss)
Process:
- Run migrations BEFORE application deployment
- If migration fails, BLOCK deployment
- Application must work with both old and new schema during transition
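The ordering rule (migrate first, block on failure) is small enough to state as code. In this sketch `deploy`, `migrate`, and `release` are hypothetical names; in the pipeline they would correspond to the Cloud Run migration job and the revision deployment.

```go
package main

import (
	"errors"
	"fmt"
)

var errMigration = errors.New("migration failed")

// deploy runs migrate first and only calls release when it succeeds.
// Both arguments are hypothetical hooks supplied by the caller.
func deploy(migrate, release func() error) error {
	if err := migrate(); err != nil {
		return fmt.Errorf("deployment blocked: %w", err)
	}
	return release()
}

func main() {
	released := false
	err := deploy(
		func() error { return errMigration },
		func() error { released = true; return nil },
	)
	fmt.Println(err, "released:", released)
}
```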
Running Migrations
# Automated (in CI/CD)
gcloud run jobs execute migrate-db \
--region europe-west1 \
--wait
# Manual
export DATABASE_URL="postgresql://user:pass@host:5432/fiscalization"
migrate -path ./migrations -database "$DATABASE_URL" up
# Verify migration
migrate -path ./migrations -database "$DATABASE_URL" version
Migration Example (Backward-Compatible)
❌ BAD - Breaking Change:
-- This breaks old app versions
ALTER TABLE transactions
DROP COLUMN old_field;
✅ GOOD - Backward-Compatible:
-- Step 1 (Deploy): Add new column (old app ignores it)
ALTER TABLE transactions
ADD COLUMN new_field TEXT;
-- Step 2 (Wait 24h): Migrate data
UPDATE transactions
SET new_field = old_field
WHERE new_field IS NULL;
-- Step 3 (Next deploy): Make new_field NOT NULL
ALTER TABLE transactions
ALTER COLUMN new_field SET NOT NULL;
-- Step 4 (Next deploy): Drop old column (after all apps use new_field)
ALTER TABLE transactions
DROP COLUMN old_field;
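Between Step 1 and Step 4 the application sees rows in both states. A minimal sketch of a read path that tolerates this, assuming both columns are scanned into `sql.NullString` values; the `transactionRow` type and its `value` method are hypothetical, not the service's actual model.

```go
package main

import (
	"database/sql"
	"fmt"
)

// transactionRow holds both columns as scanned during the transition window.
type transactionRow struct {
	OldField sql.NullString
	NewField sql.NullString
}

// text is a small helper that builds a non-NULL column value.
func text(s string) sql.NullString {
	return sql.NullString{String: s, Valid: true}
}

// value prefers new_field and falls back to old_field for rows that the
// Step 2 backfill has not reached yet.
func (r transactionRow) value() (string, bool) {
	if r.NewField.Valid {
		return r.NewField.String, true
	}
	if r.OldField.Valid {
		return r.OldField.String, true
	}
	return "", false
}

func main() {
	backfilled := transactionRow{OldField: text("A-1"), NewField: text("A-1")}
	pending := transactionRow{OldField: text("B-2")}

	fmt.Println(backfilled.value())
	fmt.Println(pending.value())
}
```

Once Step 3 has made new_field NOT NULL and every reader uses it, the fallback branch can be removed ahead of Step 4.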
Migration Rollback
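Each up migration has a matching down migration:
-- up migration
CREATE TABLE new_table (...);
-- down migration (if the migrate CLI above is golang-migrate, this lives in the matching .down.sql file rather than the same file)
DROP TABLE IF EXISTS new_table;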
Rollback command:
migrate -path ./migrations -database "$DATABASE_URL" down 1
Monitoring During Deployment
Required Dashboards
Open these dashboards before deployment:
- Cloud Run Service Metrics
  - URL: https://console.cloud.google.com/run?project=zyntem-dev
  - Metrics: Request count, Error rate, Latency (P50, P95, P99)
- Cloud Monitoring
  - URL: https://console.cloud.google.com/monitoring/dashboards
  - Custom dashboard: "Deployment Monitoring"
- Cloud Logs
  - URL: https://console.cloud.google.com/logs
  - Filter: resource.type="cloud_run_revision" AND resource.labels.service_name="core-api"
Key Metrics to Watch
| Metric | Healthy Range | Action if Outside Range |
|---|---|---|
| Error Rate | < 5% | Automatic rollback |
| P99 Latency | < 2s | Automatic rollback |
| P95 Latency | < 1s | Investigate (no rollback) |
| Request Rate | No drop > 50% | Investigate routing |
| Health Check Success | > 99.9% | Automatic rollback |
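The table maps directly to a small decision function. The sketch below is illustrative only: the `snapshot` type, the `evaluate` function, and the sample values in `healthy` are assumptions, and rates are expressed as fractions between 0 and 1.

```go
package main

import (
	"fmt"
	"time"
)

type action string

const (
	actionNone        action = "none"
	actionInvestigate action = "investigate"
	actionRollback    action = "rollback"
)

// snapshot is a hypothetical container for the metrics in the table above.
type snapshot struct {
	ErrorRate     float64
	P99           time.Duration
	P95           time.Duration
	RequestDrop   float64 // drop in request rate relative to the pre-deploy baseline
	HealthSuccess float64
}

// healthy is a sample snapshot that sits inside every healthy range.
var healthy = snapshot{
	ErrorRate:     0.01,
	P99:           900 * time.Millisecond,
	P95:           600 * time.Millisecond,
	RequestDrop:   0.1,
	HealthSuccess: 1.0,
}

// evaluate returns the strongest action any metric calls for.
func evaluate(s snapshot) action {
	switch {
	case s.ErrorRate > 0.05, s.P99 > 2*time.Second, s.HealthSuccess < 0.999:
		return actionRollback
	case s.P95 > time.Second, s.RequestDrop > 0.5:
		return actionInvestigate
	default:
		return actionNone
	}
}

func main() {
	degraded := healthy
	degraded.P95 = 1200 * time.Millisecond

	failing := healthy
	failing.ErrorRate = 0.06

	fmt.Println(evaluate(healthy), evaluate(degraded), evaluate(failing))
}
```

Rollback conditions are checked first, so a deployment that is both slow at P95 and failing health checks is rolled back rather than merely flagged.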
Alerts
Alerts configured in Cloud Monitoring (Story 4.8):
- Error rate > 5% for 2 minutes
- P99 latency > 2s for 2 minutes
- Health check failures > 3 in 1 minute
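The "for 2 minutes" qualifier means a single spike does not fire an alert; the condition has to hold across the whole window. Cloud Monitoring evaluates this itself, so the following sketch only illustrates the semantics. It assumes evenly spaced samples (for example, one every 10 seconds gives 12 samples per 2 minutes), and `breachedFor` is a hypothetical helper.

```go
package main

import "fmt"

// breachedFor reports whether the last n samples all exceed threshold.
// With evenly spaced samples, n covers the alert's duration window.
func breachedFor(samples []float64, threshold float64, n int) bool {
	if n <= 0 || len(samples) < n {
		return false // not enough data to cover the window
	}
	for _, v := range samples[len(samples)-n:] {
		if v <= threshold {
			return false
		}
	}
	return true
}

func main() {
	spike := []float64{0.01, 0.09, 0.01, 0.01}
	sustained := []float64{0.01, 0.06, 0.07, 0.08}

	fmt.Println(breachedFor(spike, 0.05, 3), breachedFor(sustained, 0.05, 3))
}
```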
Notification Channels:
- Email: alerts@zyntem.com
- Slack: #fiscalization-alerts (optional)
- PagerDuty: (production only, Phase 2)
Deployment Checklist
See DEPLOYMENT-CHECKLIST.md for the comprehensive pre-, during-, and post-deployment checklist.
Quick Reference:
- Database migrations tested
- Breaking changes identified
- Monitoring dashboards open
- Rollback procedure reviewed
- Team notified (if major deployment)
Terraform Configuration
Cloud Run Traffic Management:
# infrastructure/modules/cloud-run/main.tf
resource "google_cloud_run_service" "core_api" {
name = "core-api"
location = var.region
template {
spec {
containers {
image = var.image_url
}
}
}
traffic {
percent = 100
latest_revision = true
}
lifecycle {
ignore_changes = [
traffic, # Allow manual traffic management during deployment
]
}
}
# Keep last 3 revisions for rollback
resource "google_cloud_run_service_iam_policy" "core_api" {
# ... IAM configuration
}
Common Issues
Issue: Deployment Stuck at 0% Traffic
Symptom: New revision deployed but receives no traffic
Cause: Health checks failing
Solution:
# Check the status and conditions of the new revision
gcloud run revisions describe REVISION_NAME --region europe-west1
# View logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.revision_name=REVISION_NAME" --limit 50
# Common causes:
# - Database connection issues
# - Missing environment variables
# - Startup crash
Issue: High Latency After Deployment
Symptom: P99 latency spikes after new revision
Cause: Cold start, inefficient code, or database issues
Solution:
# Inspect the container's resource limits (undersized CPU or memory makes cold starts slower)
gcloud run services describe core-api --region europe-west1 --format="value(spec.template.spec.containers[0].resources)"
# Increase min instances to prevent cold starts
gcloud run services update core-api \
--min-instances=1 \
--region europe-west1
Issue: Automatic Rollback Triggered
Symptom: Deployment reverts automatically
Cause: Metrics exceeded thresholds
Solution:
# Check which metric triggered rollback
gcloud logging read "resource.type=cloud_run_revision AND severity>=WARNING" --limit 20
# Common causes:
# - New bug causing errors
# - Inefficient database queries
# - External service timeout
# - Configuration error
# Fix issue locally, re-deploy
Next Steps
- ✅ Complete Story 1.10 (Deployment Strategy Implementation)
- ✅ Test deployment in dev environment
- ✅ Document any environment-specific issues
- ✅ Review RUNBOOK.md for operational procedures