# Operations Runbook

## Daily Checks

- Check `GET /health`.
- Check the owner endpoint `GET /v1/ops/metrics`.
- Confirm provider configuration, active provider, Stripe configuration, and cache index counts.
- Review provider spend against `PROVIDER_FLOAT_ALERT_MICRO_CREDITS`.
- Review failed reads, refunds, and rate-limit events.

## Incident: Provider Failures

1. Check whether errors are retryable or non-retryable.
2. Inspect response limitations without exposing raw provider errors.
3. Run `npm run smoke:hikerapi` if `HIKERAPI_KEY` is available.
4. Run `npm run smoke:apify` and `npm run smoke:apify:comments` only for Apify fallback or transcript experiments.
5. If HikerAPI is degraded and Bright Data or Apify is verified, reorder `IGSKILL_PROVIDER_ORDER` so the verified provider comes first.
6. If fallback increases limitations or changes schema quality, roll back and capture fixtures.
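
The provider-order change in step 5 can be sketched as a small helper. This is a minimal sketch, not the service's own code: it assumes `IGSKILL_PROVIDER_ORDER` is a comma-separated list of provider names, and `promoteProvider` and the example names are hypothetical.

```typescript
// Assumption: IGSKILL_PROVIDER_ORDER is a comma-separated list such as
// "hikerapi,brightdata,apify". The runbook does not confirm this format.
function promoteProvider(order: string, provider: string): string {
  const names = order
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);

  // Refuse to promote a provider that is not already configured.
  if (names.indexOf(provider) === -1) {
    throw new Error(`provider "${provider}" is not in the current order`);
  }

  // Move the verified provider to the front; keep the rest in place.
  return [provider, ...names.filter((name) => name !== provider)].join(",");
}

// Keep the previous value so step 6 (roll back) is a one-line change.
const previousOrder = "hikerapi,brightdata,apify";
const fallbackOrder = promoteProvider(previousOrder, "brightdata");
```

Recording `previousOrder` before the change makes the rollback in step 6 a matter of restoring one value.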

## Incident: Parser Failures

1. Omit `parse` to confirm raw reads still work.
2. Check `ACHRONON_AI_ENDPOINT` and `ACHRONON_AI_SERVICE_TOKEN` deployment config.
3. Keep raw model-provider keys out of igskill.
4. If parsing stays unavailable, return raw normalized data and a clear limitation.
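
Step 4 describes a degrade-gracefully pattern: the raw read succeeds even when parsing does not. A minimal sketch follows, assuming a synchronous parser for brevity. The `ReadResult` shape, `withParseFallback`, and the `parse_unavailable` limitation code are all hypothetical and not taken from igskill.

```typescript
// Hypothetical response shape: raw normalized data plus optional parsed
// output and a list of limitation codes.
interface ReadResult<T> {
  data: T;
  parsed?: unknown;
  limitations: string[];
}

function withParseFallback<T>(
  raw: T,
  parse: (raw: T) => unknown,
): ReadResult<T> {
  try {
    return { data: raw, parsed: parse(raw), limitations: [] };
  } catch {
    // Parsing failed: still return the raw data, flagged with a clear
    // limitation. "parse_unavailable" is an illustrative code only.
    return { data: raw, limitations: ["parse_unavailable"] };
  }
}
```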

## Incident: Transcription Failures

1. Retry the reel with `transcript=provider` to distinguish provider transcript absence from hosted fallback failure.
2. Check `ACHRONON_AI_ENDPOINT`, `ACHRONON_AI_SERVICE_TOKEN`, `ACHRONON_AI_TRANSCRIBE_PATH`, and `ACHRONON_AI_TRANSCRIBE_MAX_DURATION_SEC`.
3. Confirm HikerAPI returned a usable video or audio media URL before expiry.
4. Confirm the ClaudeVPS sandbox exposes `claudex/transcribe` and relay Codex auth is connected.
5. If fallback stays unavailable, `GET /v1/reel?transcript=true` should still return reel media with `transcription_pipeline_not_configured`, `transcription_media_url_missing`, or `transcription_pipeline_failed` limitations.
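
The three limitation codes in step 5 can be used to decide which of the earlier steps to revisit first. The mapping below is an interpretation of the code names, not documented behavior, and `triageTranscription` is a hypothetical helper.

```typescript
type TranscriptionLimitation =
  | "transcription_pipeline_not_configured"
  | "transcription_media_url_missing"
  | "transcription_pipeline_failed";

// Assumed mapping from limitation code to the runbook step to check first.
const NEXT_STEP: Record<TranscriptionLimitation, string> = {
  transcription_pipeline_not_configured:
    "Check the ACHRONON_AI_* deployment config (step 2).",
  transcription_media_url_missing:
    "Confirm HikerAPI returned a usable video or audio media URL before expiry (step 3).",
  transcription_pipeline_failed:
    "Check the claudex/transcribe sandbox and relay Codex auth (step 4).",
};

// Returns next steps for known transcription limitations; other codes
// in the response are ignored.
function triageTranscription(limitations: string[]): string[] {
  return limitations
    .filter((code): code is TranscriptionLimitation =>
      Object.prototype.hasOwnProperty.call(NEXT_STEP, code),
    )
    .map((code) => NEXT_STEP[code]);
}
```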

## Incident: Billing Or Stripe

1. Verify `STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, `STRIPE_TOPUP_PRICE_ID`, `STRIPE_SUCCESS_URL`, and `STRIPE_CANCEL_URL`.
2. Run a checkout smoke with a disposable customer key.
3. Confirm the webhook grants credits only when metadata includes a known `accountId` and positive `microCredits`.
4. Do not manually grant credits from a checkout URL alone.
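
The rule in step 3 can be expressed as a guard that runs after the webhook signature has been verified. This is a sketch under assumptions: `creditsToGrant` and the `knownAccountIds` lookup are hypothetical, and requiring a whole number of micro-credits is an added assumption. Stripe metadata values arrive as strings, so the amount is converted before checking.

```typescript
interface CheckoutMetadata {
  accountId?: string;
  microCredits?: string; // Stripe metadata values are strings
}

// Returns the grant to apply, or null when the metadata does not qualify.
function creditsToGrant(
  metadata: CheckoutMetadata,
  knownAccountIds: ReadonlyArray<string>,
): { accountId: string; microCredits: number } | null {
  const { accountId, microCredits } = metadata;

  // Reject missing or unknown accounts.
  if (!accountId || knownAccountIds.indexOf(accountId) === -1) return null;

  // Reject missing, non-numeric, zero, negative, or fractional amounts.
  const amount = Number(microCredits);
  if (!(amount > 0) || amount % 1 !== 0) return null;

  return { accountId, microCredits: amount };
}
```

A `null` result means no credits are granted, which matches step 4: a checkout URL alone carries none of this verified metadata.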

## Incident: State File

1. Keep replicas at `1` while using JSON state.
2. Back up `/data/igskill-state.json` before deployment or migration.
3. On restart, stale reservations are refunded according to `STALE_RESERVATION_MAX_AGE_MS`.
4. If the state file is corrupt, restore from backup and capture the corrupt file for analysis.
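
The stale-reservation rule in step 3 amounts to an age check against `STALE_RESERVATION_MAX_AGE_MS`. The sketch below assumes each reservation records its creation time in milliseconds. The `Reservation` shape and `staleReservations` are hypothetical, and whether the real comparison is strict is not stated in this runbook.

```typescript
// Hypothetical reservation record; the real state file may differ.
interface Reservation {
  id: string;
  createdAtMs: number;
}

// Returns reservations older than maxAgeMs, i.e. the ones a restart
// would refund under this sketch's assumptions.
function staleReservations(
  reservations: Reservation[],
  nowMs: number,
  maxAgeMs: number,
): Reservation[] {
  return reservations.filter((r) => nowMs - r.createdAtMs > maxAgeMs);
}
```

Running a check like this against a backup copy of the state file before a restart gives an estimate of how many refunds to expect.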

## Abuse Controls

- Keep the public free tier disabled.
- Use invite-gated signup before public launch.
- Enforce documented comment limits: `20`, `100`, and explicit `500`.
- Keep inline post/reel comments capped at `100`.
- Use per-minute and daily micro-credit reservation rate limits.
- Do not support private content, personal cookies, or customer Instagram credentials.
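
The comment-limit rules above could be enforced with a single resolver. This sketch makes several assumptions the list does not confirm: `20` is treated as the default when no limit is given, `500` is only returned when explicitly requested, and inline requests above the cap are clamped rather than rejected. `resolveCommentLimit` is a hypothetical name.

```typescript
const COMMENT_LIMITS = [20, 100, 500];
const INLINE_COMMENT_CAP = 100;

function resolveCommentLimit(
  requested: number | undefined,
  inline: boolean,
): number {
  // Assumption: fall back to the smallest documented limit.
  const limit = requested === undefined ? 20 : requested;

  // Only the documented limits are accepted.
  if (COMMENT_LIMITS.indexOf(limit) === -1) {
    throw new Error(`unsupported comment limit: ${limit}`);
  }

  // Inline post/reel comments never exceed the inline cap.
  return inline ? Math.min(limit, INLINE_COMMENT_CAP) : limit;
}
```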
