Some workflows have been undeployed.

Incident Report for Paragon

Postmortem

Undeployed Workflows: Public RCA

Summary

On Wednesday, February 5th, 2025, at approximately 19:10 UTC, we identified an issue where a subset of customer workflows was unexpectedly undeployed. Our engineering team immediately began investigating and restoring affected workflows, and the incident was marked resolved at 20:45 UTC (12:45 PST) the same day.

Impact

A number of customer workflows were temporarily set to an undeployed state, requiring manual reactivation. Some customers may have noticed disruptions in workflow execution during this period.

Root Cause

The issue was triggered by an administrative deployment management action from an internal tool that did not correctly validate its parameters under specific conditions. As a result, certain workflows were marked as undeployed.

Resolution & Response

  • Our automated monitoring systems, including BetterStack and Grafana, immediately alerted our team to the issue.
  • Within 1 minute of the alert, members of our executive team and key engineers began actively investigating the root cause and identifying the full scope of impact.
  • An internal tool was identified as the source and was immediately taken offline.
  • A status page was posted to notify customers of the disruption.
  • A dedicated response team quickly developed and deployed a fix to restore all affected workflows using Paragon’s built-in version and deployment history.
  • The issue was fully resolved in a timely manner, ensuring that customers could resume normal operations.
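
As an illustration of how a restore from deployment history can work in general, the sketch below replays a list of deployment events and works out which workflows were undeployed during an incident window and which version each should be redeployed at. It is a minimal sketch under stated assumptions, not Paragon's actual tooling: the `DeploymentEvent` shape, the `"deploy"`/`"undeploy"` action names, and the `plan_restore` function are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeploymentEvent:
    # Hypothetical history record; field names are assumptions for illustration.
    workflow_id: str
    at: datetime
    action: str                # "deploy" or "undeploy"
    version: Optional[str]     # version that went live; None for an undeploy


def plan_restore(
    events: Iterable[DeploymentEvent], incident_start: datetime
) -> dict[str, str]:
    """Map each workflow undeployed at or after incident_start to the version
    that was live immediately before it was undeployed."""
    live: dict[str, str] = {}        # workflow_id -> currently deployed version
    to_restore: dict[str, str] = {}  # workflow_id -> version to redeploy

    for ev in sorted(events, key=lambda e: e.at):
        if ev.action == "deploy" and ev.version is not None:
            live[ev.workflow_id] = ev.version
            # A later deploy means the workflow is already back; nothing to restore.
            to_restore.pop(ev.workflow_id, None)
        elif ev.action == "undeploy":
            version = live.pop(ev.workflow_id, None)
            # Undeploys before the incident are treated as intentional and skipped.
            if version is not None and ev.at >= incident_start:
                to_restore[ev.workflow_id] = version

    return to_restore
```

A plan like this would still need review before it is applied, since an undeploy inside the window may have been made deliberately by a customer; filtering by the actor that performed the action is one way to narrow it.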

Preventative Measures

To prevent this from happening again, we have implemented the following safeguards:

  • Enhanced Validation: We have updated our deployment management process to enforce stricter input validation, preventing similar scenarios.
  • Process Audits: A comprehensive review of related administrative workflows is underway to identify and address potential gaps.
  • Ongoing Monitoring Improvements: We are strengthening our real-time monitoring and alerting systems to further reduce response times.
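
To make the idea of stricter input validation concrete, here is a minimal sketch of the kind of guard a bulk administrative action can run before it touches any workflows. It is illustrative only: the `UndeployRequest` shape, the `max_affected` cap, and `validate_undeploy_request` are hypothetical names, and the specific checks are common defensive patterns rather than a description of the conditions involved in this incident.

```python
from __future__ import annotations

from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when a bulk deployment action has unsafe parameters."""


@dataclass(frozen=True)
class UndeployRequest:
    # Hypothetical request shape; not Paragon's actual schema.
    project_id: str
    workflow_ids: tuple[str, ...]
    max_affected: int = 50  # assumed cap on workflows a single action may touch


def validate_undeploy_request(req: UndeployRequest) -> UndeployRequest:
    """Reject requests whose scope is missing, empty, or too broad."""
    if not req.project_id.strip():
        raise ValidationError("project_id is required")
    if not req.workflow_ids:
        # An empty selection must never be interpreted as "all workflows".
        raise ValidationError("workflow_ids must name at least one workflow")
    if any(not wid.strip() for wid in req.workflow_ids):
        raise ValidationError("workflow_ids must not contain blank IDs")
    if len(set(req.workflow_ids)) != len(req.workflow_ids):
        raise ValidationError("workflow_ids must not contain duplicates")
    if len(req.workflow_ids) > req.max_affected:
        raise ValidationError(
            f"{len(req.workflow_ids)} workflows exceeds the cap of {req.max_affected}"
        )
    return req
```

Failing closed in this way means a malformed request is rejected outright instead of being applied with a broader scope than the operator intended.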

Next Steps

We are committed to continuously improving our platform’s reliability and ensuring minimal disruptions to customer operations. Our engineering team is actively reviewing additional safeguards and process improvements to enhance system resilience.

We appreciate our customers’ patience and trust as we work to deliver a seamless, dependable experience. If you have any questions or concerns, please reach out to our support team.

Posted Feb 11, 2025 - 13:23 PST

Resolved

This incident has been resolved.
Posted Feb 05, 2025 - 12:45 PST

Update

We've implemented a solution and expect the issue to be fully resolved in the next few minutes. We apologize for any disruption this may have caused and appreciate your patience. We will continue to monitor the issue.
Posted Feb 05, 2025 - 12:24 PST

Identified

We are currently experiencing an incident where a number of workflows were unexpectedly undeployed. Our team has identified the affected workflow IDs and is actively working on redeploying them now.

We apologize for any disruption this may have caused and appreciate your patience as we work to resolve the issue. We will provide further updates as soon as we have more information.

Thank you for your understanding.
Posted Feb 05, 2025 - 12:00 PST

Investigating

We are currently investigating this issue.
Posted Feb 05, 2025 - 11:35 PST
This incident affected: Workflows.