2025-Q1
Jan 1, 2025 - Mar 31, 2025
Service Features
Time-based uptime calculation for the 129,600 minutes in this quarter
Downtime Definition: Minutes with >5% error rate (approximated from incident data)
| Component | Uptime % | Downtime | Incidents | Status | Service Credit |
|---|---|---|---|---|---|
| Git Operations | 99.9244% | 1h 38m | 4 | Pass | None |
| API Requests | 99.9323% | 1h 28m | 4 | Pass | None |
| Issues | 99.8017% | 4h 17m | 9 | Violation | 10% |
| Pull Requests | 99.8027% | 4h 16m | 9 | Violation | 10% |
| Webhooks | 99.9190% | 1h 45m | 5 | Pass | None |
| Pages | 99.9304% | 1h 30m | 4 | Pass | None |
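The figures in the table can be reproduced from the downtime column; a minimal sketch of the time-based calculation (the 99.9% Pass/Violation threshold is inferred from the table, not stated in it):

```python
QUARTER_MINUTES = 90 * 24 * 60  # Jan 1 - Mar 31, 2025 = 129,600 minutes

def time_based_uptime(downtime_minutes: int) -> float:
    """Percent of minutes in the quarter without a >5% error rate."""
    return (QUARTER_MINUTES - downtime_minutes) / QUARTER_MINUTES * 100

# Issues: 4h 17m = 257 minutes of downtime
print(f"{time_based_uptime(4 * 60 + 17):.4f}%")  # 99.8017% -> Violation, 10% credit
# Git Operations: 1h 38m = 98 minutes of downtime
print(f"{time_based_uptime(1 * 60 + 38):.4f}%")  # 99.9244% -> Pass
```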
Actions
Execution-based calculation (workflow success rate)
| Component | Uptime % | Downtime | Incidents |
|---|---|---|---|
| Actions | 99.6896% | 6h 42m | 14 |
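Execution-based uptime counts workflow runs rather than wall-clock minutes. A sketch under that definition; the run totals below are hypothetical, chosen only to illustrate the ratio:

```python
def execution_uptime(total_runs: int, failed_runs: int) -> float:
    """Workflow success rate: share of runs that did not fail."""
    return (total_runs - failed_runs) / total_runs * 100

# Hypothetical quarter: 2,000,000 runs, 6,208 infrastructure failures
print(f"{execution_uptime(2_000_000, 6_208):.4f}%")  # 99.6896%
```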
Packages
Hybrid calculation with two separate metrics
1. Package Transfers: (Total transfers - Failed transfers) / Total transfers × 100
2. Package Storage: (Total minutes - Minutes with >5% error rate) / Total minutes × 100
| Component | Uptime % | Downtime | Incidents |
|---|---|---|---|
| Packages | 99.9732% | 35m | 2 |
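A sketch of the two Packages metrics listed above. The report does not state how the two numbers are combined into the single reported figure, so the combination rule below (reporting the worse of the two) is an assumption, and the transfer counts are hypothetical:

```python
QUARTER_MINUTES = 90 * 24 * 60  # 129,600 minutes in 2025-Q1

def transfer_uptime(total_transfers: int, failed_transfers: int) -> float:
    """Metric 1: share of package transfers that succeeded."""
    return (total_transfers - failed_transfers) / total_transfers * 100

def storage_uptime(error_minutes: int) -> float:
    """Metric 2: share of minutes without a >5% error rate."""
    return (QUARTER_MINUTES - error_minutes) / QUARTER_MINUTES * 100

def hybrid_uptime(total: int, failed: int, error_minutes: int) -> float:
    # Assumed combination rule: report the worse of the two metrics.
    return min(transfer_uptime(total, failed), storage_uptime(error_minutes))

# Hypothetical inputs: 500,000 transfers with 100 failures, 35 degraded minutes
print(f"{hybrid_uptime(500_000, 100, 35):.4f}%")  # 99.9730% with these inputs
```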
Incidents in 2025-Q1
46 incidents occurred during this quarter
Disruption with some GitHub services
3 updates
Between March 29 7:00 UTC and March 31 17:00 UTC users were unable to unsubscribe from GitHub marketing email subscriptions due to a service outage. Additionally, on March 31, 2025 from 7:00 UTC to 16:40 UTC users were unable to submit eBook and event registration forms on resources.github.com, also due to a service outage. The incident occurred due to expired credentials used for an internal service. We mitigated it by renewing the credentials and redeploying the affected services. To improve future response times and prevent similar issues, we are enhancing our credential expiry detection, rotation processes, and on-call observability and alerting.
We are currently applying a mitigation to resolve an issue with managing marketing email subscriptions.
We are currently investigating this issue.
[Retroactive] Disruption with Pull Request Ref Updates
1 update
Beginning at 21:24 UTC on March 28 and lasting until 21:50 UTC, some customers of github.com had issues with PR tracking refs not being updated due to processing delays and increased failure rates. We did not post a public status update before the rollback completed, and the incident is now resolved. We are sorry for the delayed post on githubstatus.com.
Disruption with some GitHub services
2 updates
This incident was opened by mistake. Public services are currently functional.
We are currently investigating this issue.
Disruption with Pull Request Ref Updates
6 updates
This issue has been mitigated and we are operating normally.
Between March 27, 2025, 23:45 UTC and March 28, 2025, 01:40 UTC the Pull Requests service was degraded and failed to update refs for repositories with higher traffic activity. This was caused by a large repository migration that enqueued a larger than usual number of jobs while simultaneously impacting the git fileservers where the problematic repository was hosted. Retries of the failing jobs increased queue depth, delaying jobs that were unrelated to the migration. We declared an incident once we confirmed that the issue was not isolated to the problematic migration and that other repositories were also failing to process ref updates. We mitigated the issue by stopping the migration and short-circuiting the remaining jobs. Additionally, we increased the worker pool for this job to reduce the time required to recover. As a result of this incident, we are revisiting our repository migration process and working to isolate potentially problematic migration workloads from non-migration workloads.
We are continuing to monitor for recovery.
We believe we have identified the source of the issue and are monitoring for recovery.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
1 update
Between 2025-03-23 18:10 UTC and 2025-03-24 16:10 UTC, migration jobs submitted through the GitHub UI experienced processing delays and increased failure rates. This issue only affected migrations initiated via the web interface; migrations started through the API or the command line tool continued to function normally. We are sorry for the delayed post on githubstatus.com.
Disruption with some GitHub services
7 updates
Copilot is operating normally.
On March 21st, 2025, between 11:45 UTC and 13:20 UTC, users were unable to interact with GitHub Copilot Chat in GitHub. The issue was caused by a recently deployed Ruby change that unintentionally overwrote a global value. This led to GitHub Copilot Chat in GitHub being misconfigured with an invalid URL, preventing it from connecting to our chat server. Other Copilot clients were not affected. We mitigated the incident by identifying the source of the problematic query and rolling back the deployment. We are reviewing our deployment tooling to reduce the time to mitigate similar incidents in the future. In parallel, we are also improving our test coverage for this category of error to prevent it from being deployed to production.
Mitigation is complete and we are seeing full recovery for GitHub Copilot Chat in GitHub.
We have identified the problem and have a mitigation in progress.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are investigating issues with GitHub Copilot Chat in GitHub. We will continue to keep users updated on progress toward mitigation.
We are currently investigating this issue.
Intermittent GitHub Actions workflow failures
7 updates
Actions is operating normally.
On March 21st, 2025, between 05:43 UTC and 08:49 UTC, the Actions service experienced degradation, leading to workflow run failures. During the incident, approximately 2.45% of workflow runs failed due to an infrastructure failure. This incident was caused by intermittent failures in communicating with an underlying service provider. We are working to improve our resilience to downtime in this service provider and to reduce the time to mitigate in any future recurrences.
We have made progress understanding the source of these errors and are working on a mitigation.
We're continuing to investigate elevated errors during GitHub Actions workflow runs. At this stage our monitoring indicates that these errors are impacting no more than 3% of all runs.
We're continuing to investigate intermittent failures with GitHub Actions workflow runs.
We're seeing errors reported with a subset of GitHub Actions workflow runs, and are continuing to investigate.
We are investigating reports of degraded performance for Actions
Incident with Codespaces
6 updates
We have seen full recovery in the last 15 minutes for Codespaces connections. GitHub Codespaces are healthy. For users who are still seeing connection problems, restarting the Codespace may help resolve the issue.
Codespaces is operating normally.
On March 21, 2025 between 01:00 UTC and 02:45 UTC, the Codespaces service was degraded and users in various regions experienced intermittent connection failures. The peak error rate was 30% of connection attempts across 38% of Codespaces. This was due to a service deployment. The incident was mitigated by completing the deployment to the impacted regions. We are working with the service team to identify the cause of the connection losses and perform necessary repairs to avoid future occurrences.
We are continuing to investigate issues with failed connections to Codespaces. We are seeing recovery over the last 10 minutes.
Customers may be experiencing issues connecting to Codespaces on GitHub.com. We are currently investigating the underlying issue.
We are investigating reports of degraded performance for Codespaces
Incident with Pages
5 updates
On March 20, 2025, between 19:24 UTC and 20:42 UTC the GitHub Pages experience was degraded and returned 503s for some customers. We saw an error rate of roughly 2% for Pages views, and new page builds were unable to complete successfully before timing out. This was due to a replication failure at the database layer between a write destination and a read destination. We mitigated the incident by redirecting reads to the same destination as writes. The replication error occurred during a transitory phase, as we are in the process of migrating the underlying data for Pages to new database infrastructure. Additionally, our monitors failed to detect the error. We are addressing the underlying cause of the failed replication and telemetry.
We have resolved the issue for Pages. If you're still experiencing issues with your GitHub Pages site, please rebuild.
Customers may not be able to create or make changes to their GitHub Pages sites. Customers who rely on webhook events from Pages builds might also experience a downgraded experience.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Pages
Incident with Actions: Queue Run Failures
6 updates
The provider has reported full mitigation of the underlying issue, and Actions has been healthy since approximately 00:15 UTC.
Actions is operating normally.
On March 18th, 2025, between 23:20 UTC and March 19th, 2025 00:15 UTC, the Actions service experienced degradation, leading to run start delays. During the incident, about 0.3% of all workflow runs queued during the time failed to start, about 0.67% of all workflow runs were delayed by an average of 10 minutes, and about 0.16% of all workflow runs ultimately ended with an infrastructure failure. This was due to a networking issue with an underlying service provider. At 00:15 UTC the service provider mitigated their issue, and service was restored immediately for Actions. We are working to improve our resilience to downtime in this service provider to reduce the time to mitigate in any future recurrences.
We are continuing to investigate issues with delayed or failed workflow runs with Actions. We are engaged with a third-party provider who is also investigating issues and has confirmed we are impacted.
We are investigating reports of degraded performance for Actions
Some customers may be experiencing delays or failures when queueing workflow runs
Disruption with some GitHub services
6 updates
On March 18th, 2025, between 13:35 UTC and 17:45 UTC, some users of GitHub Copilot Chat in GitHub experienced intermittent failures when reading or writing messages in a thread, resulting in a degraded experience. The error rate peaked at 3% of requests to the service. This was due to an availability incident with a database provider. Around 16:15 UTC the upstream service provider mitigated their availability incident, and service was restored in the following hour. We are working to improve our failover strategy for this database to reduce the time to mitigate similar incidents in the future.
We are seeing recovery and no new errors for the last 15 minutes.
We are still investigating infrastructure issues and our provider has acknowledged the issues and is working on a mitigation. Customers might still see errors when creating messages, or new threads in Copilot Chat. Retries might be successful.
We are still investigating infrastructure issues and collaborating with providers. Customers might see some errors when creating messages, or new threads in Copilot Chat. Retries might be successful.
We are experiencing issues with our underlying data store which is causing a degraded experience for a small percentage of users using Copilot Chat in github.com
We are currently investigating this issue.
macos-15-arm64 hosted runner queue delays
5 updates
On March 18, between 13:04 and 16:55 UTC, Actions workflows relying on hosted runners using the beta macOS 15 image experienced increased queue times waiting for available runners. An image update pushed the previous day included a performance regression. The slower performance caused longer average runtimes, exhausting our available Mac capacity for this image. This was mitigated by rolling back the image update. We have updated our capacity allocation for the beta and other Mac images and are improving monitoring in our canary environments to catch issues like this before they impact customers.
We are seeing improvements in telemetry and are monitoring for full recovery.
We've applied a mitigation to fix the issues with queuing Actions jobs on macos-15-arm64 Hosted runner. We are monitoring.
The team continues to investigate issues with some Actions macos-15-arm64 Hosted jobs being queued for up to 15 minutes. We will continue providing updates on the progress towards mitigation.
We are currently investigating this issue.
Incident with Issues
6 updates
Between March 17, 2025, 18:05 UTC and March 18, 2025, 09:50 UTC, GitHub.com experienced intermittent failures in web and API requests. These issues affected a small percentage of users (mostly related to pull requests and issues), with a peak error rate of 0.165% across all requests. We identified a framework upgrade that caused kernel panics in our Kubernetes infrastructure as the root cause. We mitigated the incident by downgrading until we were able to disable a problematic feature. In response, we have investigated why the upgrade caused the unexpected issue, have taken steps to temporarily prevent it, and are working on longer term patch plans while improving our observability to ensure we can quickly react to similar classes of problems in the future.
We saw a spike in the error rate for Issues-related pages and API requests due to problems with restarts in our Kubernetes infrastructure that, at peak, caused 0.165% of requests to see timeouts or errors related to these API surfaces over a 15 minute period. At this time we see minimal impact and are continuing to investigate the cause of the issue.
We are investigating reports of issues with service(s): Issues. We're continuing to investigate. Users may see intermittent HTTP 500 responses when using Issues. Retrying the request may succeed.
We are investigating reports of issues with service(s): Issues. We're continuing to investigate. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of issues with service(s): Issues. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded performance for Issues
3 updates
On March 12, 2025, between 13:28 UTC and 14:07 UTC, the Actions service experienced degradation leading to run start delays. During the incident, about 0.6% of workflow runs failed to start, 0.8% of workflow runs were delayed by an average of one hour, and 0.1% of runs ultimately ended with an infrastructure failure. The issue stemmed from connectivity problems between the Actions services and certain nodes within one of our Redis clusters. The service began recovering once connectivity to the Redis cluster was restored at 13:41 UTC. These connectivity issues are typically not a concern because we can fail over to healthier replicas. However, due to an unrelated issue, there was a replication delay at the time of the incident, and failing over would have caused a greater impact on our customers. We are working on improving our resiliency and automation processes for this infrastructure to improve the speed of diagnosing and resolving similar issues in the future.
We have applied a mitigation for the affected Redis node, and are starting to see recovery with Action workflow executions.
We are investigating reports of degraded performance for Actions
Incident with Actions and Pages
6 updates
Actions is operating normally.
On March 8, 2025, between 17:16 UTC and 18:02 UTC, GitHub Actions and Pages services experienced degraded performance leading to delays in workflow runs and Pages deployments. During this time, 34% of Actions workflow runs experienced delays, and a small percentage of runs using GitHub-hosted runners failed to start. Additionally, Pages deployments for sites without a custom Actions workflow (93% of them) did not run, preventing new changes from being deployed. An unexpected data shape led to crashes in some of our pods. We mitigated the incident by excluding the affected pods and correcting the data that led to the crashes. We’ve fixed the source of the unexpected data shape and have improved the overall resilience of our service against such occurrences.
Actions run start delays are mitigated. Actions runs that failed will need to be re-run. Impacted Pages updates will need to re-run their deployments.
Pages is operating normally.
We are investigating Actions run start delays; about 40% of runs are not starting within five minutes, and Pages deployments on GitHub-hosted runners are impacted.
We are investigating reports of degraded performance for Actions and Pages
Disruption with some GitHub services
7 updates
On March 7, 2025, from 09:30 UTC to 11:07 UTC, we experienced a networking event that disrupted connectivity to our search infrastructure, impacting about 25% of search queries and indexing attempts. Searches for PRs, Issues, Actions workflow runs, Packages, Releases, and other products were impacted, resulting in failed requests or stale data. The connectivity issue self-resolved after 90 minutes. The backlog of indexing jobs was fully processed and saw recovery soon after, and queries to all indexes also saw an immediate return to normal throughput. We are working with our cloud provider to identify the root cause and are researching additional layers of redundancy to reduce customer impact in the future for issues like this one. We are also exploring mitigation strategies for faster resolution.
We are continuing to investigate a degraded experience with searching for issues, pull requests, and Actions workflow runs.
Actions is experiencing degraded performance. We are continuing to investigate.
Searches for issues and pull-requests may be slower than normal and timeout for some users
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with Issues, Git Operations and API Requests
8 updates
On March 3rd 2025 between 04:07 UTC and 09:36 UTC various GitHub services were degraded with an average error rate of 0.03% and peak error rate of 9%. This issue impacted web requests, API requests, and git operations. This incident was triggered because a network node in one of GitHub's datacenter sites partially failed, resulting in silent packet drops for traffic served by that site. At 09:22 UTC, we identified the failing network node, and at 09:36 UTC we addressed the issue by removing the faulty network node from production. In response to this incident, we are improving our monitoring capabilities to identify and respond to similar silent errors more effectively in the future.
We have seen recovery across our services and impact is mitigated.
Webhooks is operating normally.
Git Operations is operating normally.
We are investigating intermittent connectivity issues between our backend and databases and will provide further updates as we have them. The current impact is that users may see elevated latency while using our services.
We are seeing intermittent timeouts across our various services. We are currently investigating and will provide updates.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for API Requests, Git Operations and Issues
3 updates
On February 28th, 2025, between 05:49 UTC and 06:55 UTC, a newly deployed background job caused increased load on GitHub’s primary database hosts, resulting in connection pool exhaustion. This led to degraded performance, manifesting as increased latency for write operations and elevated request timeout rates across multiple services. The incident was mitigated by halting execution of the problematic background job and disabling the feature flag controlling the job execution. To prevent similar incidents in the future, we are collaborating on a plan to improve our production signals to better detect and respond to query performance issues.
Issues and Pull Requests are experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Disruption with some GitHub services
7 updates
The team is confident that recovery is complete. Thank you for your patience as this issue was investigated.
On February 27, 2025, between 11:30 UTC and 12:22 UTC, Actions experienced degraded performance, leading to delays in workflow runs. On average, 5% of Actions workflow runs were delayed by 31 minutes. The delays were caused by updates in a dependent service that led to failures in Redis connectivity in one region. We mitigated the incident by failing over the impacted service and re-routing the service’s traffic out of that region. We are working to improve monitoring and processes of failover to reduce our time to detection and mitigation of issues like this one in the future.
Our mitigations have rolled out successfully and we have seen recovery for all Actions run starts back within expected range. Users should see Actions runs working normally. We will keep this incident open for a short time while we continue to validate these results.
We have identified the cause of the delays to starting Actions runs. Our team is working to roll out mitigations and we hope to see recovery as these take effect in our systems over the next 10-20 minutes. Further updates as we have more information.
We are seeing an increase in run start delays since 11:04 UTC. This is impacting ~3% of Actions runs at this time. The team is working to understand the causes of this and to mitigate impact. We will continue to update as we have more information.
Actions is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with Actions and Packages
6 updates
Actions and Packages are operating normally.
On February 26, 2025, between 14:51 UTC and 17:19 UTC, GitHub Packages experienced a service degradation, leading to billing-related failures when uploading and downloading Packages. During this period, the billing usage and budget pages were also inaccessible. Initially, we reported that GitHub Actions was affected, but we later determined that the impact was limited to jobs interacting with Packages services, while jobs that did not upload or download Packages remained unaffected. The incident occurred due to an error in newly introduced code, which caused containers to get into a bad state, ultimately leading to billing API calls failing with 503 errors. We mitigated the issue by rolling back the contributing change. In response to this incident, we are enhancing error handling, improving the resiliency of our billing API calls to minimize customer impact, and improving change rollout practices to catch these potential issues prior to deployment.
We're continuing our investigation into Billing interfaces and retrieval of packages causing Actions workflow run failures.
We’re investigating issues related to billing and the retrieval of packages that are causing Actions workflow run failures.
We're investigating issues related to the Billing interfaces and Packages downloads failing for enterprise customers.
We are investigating reports of degraded performance for Actions and Packages
Disruption with some GitHub services
6 updates
On February 25th, 2025, between 14:25 UTC and 16:44 UTC email and web notifications experienced delivery delays. At the peak of the incident the delay resulted in ~10% of all notifications taking over 10 minutes to be delivered, with the remaining ~90% being delivered within 5-10 minutes. This was due to insufficient capacity in worker pools as a result of increased load during peak hours. We also encountered delivery delays for a small number of webhooks, with delays of up to 2.5 minutes. We mitigated the incident by scaling out the service to meet the demand. The increase in capacity gives us extra headroom, and we are working to improve our capacity planning to prevent issues like this occurring in the future.
Web and email notifications are caught up, resolving the incident.
We're continuing to investigate delayed web and email notifications.
We're continuing to investigate delayed web and email notifications.
We're investigating delays in web and email notifications impacting all customers.
We are currently investigating this issue.
Claude 3.7 Sonnet Partially Unavailable
5 updates
On February 25, 2025 between 13:40 UTC and 15:45 UTC the Claude 3.7 Sonnet model for GitHub Copilot Chat experienced degraded performance. During the impact, occasional requests to Claude would result in an immediate error to the user. This was due to upstream errors with one of our infrastructure providers, which have since been mitigated. We are working with our infrastructure providers to reduce time to detection and implement additional failover options, to mitigate issues like this one in the future.
We have disabled Claude 3.7 Sonnet models in Copilot Chat and across IDE integrations (VSCode, Visual Studio, JetBrains) due to an issue with our provider. Users may still see these models as available for a brief period but we recommend switching to a different model. Other models were not impacted and are available. Once our provider has resolved the issues impacting Claude 3.7 Sonnet models, we will re-enable them.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are currently experiencing partial availability for the Claude 3.7 Sonnet and Claude 3.7 Thinking models in Copilot Chat, VSCode and other Copilot products. This is due to problems with an upstream provider. We are working to resolve these issues and will update with more information as it is made available.Other Copilot models are available and working as expected.
We are currently investigating this issue.
Incident with Packages
4 updates
We have confirmed recovery for the majority of our systems. Some systems may still experience higher than normal latency as they catch up.
On February 25, 2025, between 00:17 UTC and 01:08 UTC, GitHub Packages experienced a service degradation, leading to failures uploading and downloading packages, along with increased latency for all requests to GitHub Packages registry. At peak impact, about 14% of uploads and downloads failed, and all Packages requests were delayed by an average of 7 seconds. The incident was caused by the rollout of a database configuration change that resulted in a degradation in database performance. We mitigated the incident by rolling back the contributing change and failing over the database. In response to this incident, we are tuning database configurations and resolving a source of deadlocks. We are also redistributing certain workloads to read replicas to reduce latency and enhance overall database performance.
We have identified the issue impacting packages and have rolled out a fix. We are seeing signs of recovery and continue to monitor the situation.
We are investigating reports of degraded performance for Packages
Claude 3.5 Sonnet model is unavailable in Copilot
4 updates
We were able to quickly identify the problem and resolve this issue. Claude 3.5 Sonnet is available again.
On February 24, 2025 between 21:42 UTC and 22:14 UTC the Claude 3.5 Sonnet model for GitHub Copilot Chat experienced degraded performance. During the impact, all requests to Claude 3.5 Sonnet would result in an immediate error to the user. This was due to a misconfiguration within one of our infrastructure providers that has since been mitigated. We are working to prevent this error from occurring in the future by implementing additional failover options. Additionally we are updating our playbooks and alerting to reduce time to detection.
At this time, we are unable to serve requests to the Claude 3.5 Sonnet model on Copilot. No other models are affected. We are investigating the issue and will provide updates as we discover more information.
We are investigating reports of degraded performance for Copilot
Incident with Issues
7 updates
Issues is operating normally.
Pull Requests is operating normally.
On February 24, 2025, between 15:17 UTC and 17:08 UTC the GitHub Issues and Pull Requests services were degraded, showing stale results on search-powered pages such as /issues and /pulls, meaning the displayed results may not have included the most recent updates. Additional features that depend on search functionality may have served stale results during this incident. There was no increase in latency for any of the services depending on search. We mitigated the incident by increasing the replica count for the workers that process background jobs related to search indexing. We are working on identifying the root cause to avoid similar incidents in the future.
We continue to see recovery and expect Pull Requests and Issues search queries to recover within 30 minutes.
We are seeing recovery and expect for Pull Requests and Issues search queries to recover within 15 minutes.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Issues
Disruption with some GitHub services
7 updates
Between February 21, 2025 12:00 UTC and February 24, 2025 18:31 UTC, the Copilot Metrics API failed to ingest daily metrics aggregations for all customers, resulting in a failure to populate new metrics from 2025-02-21 to 2025-02-24. This failure was triggered by the metrics ingestion process timing out when querying across the event dataset. The API remained functional for retrieving historical metrics prior to 2025-02-21. On the morning of Monday, February 24, 2025 at 15:00 UTC, customer support was notified of the issue, and the team deployed a fix to resolve the query timeouts and ran backfills for the data from 2025-02-21 to 2025-02-23. We are working to prevent further outages by adding more alerting on timeouts and have further optimized our queries that aggregate data.
We have restored all of the data for 2025-02-21 to 2025-02-23. The data is queryable through the Copilot Metrics API. We are continuing to monitor the metrics data and expect to resolve the incident in the next hour.
We expect the missing data from the weekend to be available within two hours.
Copilot-metrics is in the process of restoring the usage statistics for 2025-02-23, we will continue to restore the previous 2 days over the next few hours.
Customers may not be able to review their usage statistics for Copilot from Saturday through Monday morning UTC. The API is functioning normally, but no data is available for those time periods. We are working on backfilling the data, and all metrics will eventually be available later today. We estimate recovery within the next few hours and will provide updates as the recovery process proceeds.
We are investigating reports of issues with service: Copilot metrics API. We will continue to keep users updated on progress towards mitigation.
We are currently investigating this issue.
Disruption with some GitHub services
11 updates
On February 16th, 2025 from 11:30 UTC to 12:44 UTC, API requests to GitHub.com experienced increased latency and failures. Around 1% of API requests failed at the peak of this incident. This outage was caused by an experimental feature that malfunctioned and generated excessive database latency. In response to this incident, the feature has been redesigned to avoid database load, which should prevent similar issues going forward.
API Requests is operating normally.
Webhooks is operating normally.
Pull Requests is operating normally.
Actions is operating normally.
Git Operations is operating normally.
Codespaces is operating normally.
Issues is operating normally.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
API Requests is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Actions, Codespaces, Git Operations, Issues and Webhooks
Disruption with some GitHub services
8 updates
We completed the rollout. GitHub Codespaces are healthy.
On February 15, 2025, between 6:35 PM UTC and 4:15 AM UTC the following day, the Codespaces service was degraded and users in various regions experienced intermittent connection failures. On average, the error rate was 50%, peaking at 65% of requests to the service. This was due to a service deployment. We mitigated the incident by completing the deployment to the impacted regions. The completion of this deployment should prevent future deployments of the service from negatively impacting Codespaces connectivity.
We continue the rollout in Central India, SE Asia, and Australia Codespaces regions. We are seeing a minimal number of connection failures across all regions at the moment.
We rolled out a fix to most of our Codespaces regions. Central India, SE Asia, and Australia are the remaining regions to be fixed. Customers in these remaining regions may still experience issues with Codespaces connectivity.
Some customers are continuing to see intermittent connection failures to their codespaces. We are monitoring closely to better estimate when the impact will be mitigated. At this time, we expect the number of impacted users to remain low, and we will update again when there is a development in our repair efforts.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Some GitHub Codespaces users are experiencing intermittent connection failures. A deployment is underway to mitigate the problem, and US-based customers should see recovery soon. Full recovery is expected to take several hours. In the meantime, we advise customers experiencing issues to retry their connection attempts.
We are currently investigating this issue.
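The retry advice above can be sketched as a bounded exponential-backoff loop. The `connect_with_retry` helper, its timing values, and the use of `ConnectionError` are illustrative assumptions, not Codespaces client internals:

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect()` until it succeeds, backing off exponentially.

    Intermittent connection failures often clear on their own, so a
    bounded retry is usually enough. `connect` is any callable that
    raises ConnectionError on failure; `sleep` is injectable for tests.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Bounding the attempts matters: during an incident like this one, an unbounded retry loop would simply hammer a degraded service.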
[Retroactive] Incident with Migrations service
1 update
Between Thursday, February 13th, 2025 19:30 UTC and Friday, February 14th, 2025 08:02 UTC, the Migrations service experienced intermittent migration failures for some customers. This was caused by a code change containing an edge case that erroneously failed some migrations. We mitigated the incident by rolling back the code change. We are working on improving our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future.
Claude Sonnet unavailable in GitHub Copilot
8 updates
Claude Sonnet is fully available in GitHub Copilot again. If you used an alternate model during the outage, you can switch back to Claude Sonnet.
On February 12th, 2025, between 21:30 UTC and 23:10 UTC, the Copilot service was degraded and all requests to Claude 3.5 Sonnet were failing. No other models were impacted. This was due to an issue with our upstream provider, which we detected within 12 minutes, at which point we raised the issue to the provider for remediation. GitHub is working with the provider to improve the resiliency of the service.
We are seeing a recovery with our Claude Sonnet model provider. We'll confirm once the problem is fully resolved.
Our Claude Sonnet model provider has acknowledged the issue and will provide us with the next update by 11:30 PM UTC / 3:30 PM PT. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.
We escalated the issue to our Claude Sonnet model provider. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.
Claude Sonnet is currently not working in GitHub Copilot. Please switch to an alternate model while we work on resolving the issue.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with GIT LFS and Other Requests
6 updates
This issue has been mitigated. We will continue to investigate root causes to ensure this does not reoccur.
On February 6, 2025, between 8:40 AM UTC and 11:13 AM UTC, the GitHub REST API was degraded following the rollout of a new feature. The feature resulted in an increase in requests that saturated a cache and led to cascading failures in unrelated services. The error rate peaked at 100% of requests to the service. The incident was mitigated by increasing the memory allocated to the cache and rolling back the feature that led to the saturation. To prevent future incidents, we are working to reduce the time to detect similar issues and to optimize the overall calls to the cache.
We have scaled out database resources and rolled back recent changes and are seeing signs of mitigation, but are monitoring to ensure complete recovery.
We are attempting to scale databases to handle the observed load spikes, and are investigating other mitigation approaches. Customers may intermittently experience failures to fetch repositories with LFS, as well as increased latency and errors across the API.
We are investigating failed Git LFS requests and potentially slow API requests. Customers may experience failures to fetch repositories with LFS.
We are investigating reports of degraded performance for API Requests
Actions Larger Runners Provisioning Delays
6 updates
Between Feb 5, 2025 00:34 UTC and 11:16 UTC, up to 7% of organizations using GitHub-hosted larger runners with public IP addresses had jobs fail to start during the impact window. The issue was caused by a backend migration in the public IP management system, which placed certain public-IP-address runners in a non-functioning state. We have improved the rollback steps for this migration to reduce the time to mitigate any future recurrences, are working to improve automated detection of this error state, and are improving the resiliency of runners to handle this error state without customer impact.
We have identified a configuration change that we believe may be related. We are working to mitigate.
We are continuing our investigation.
We continue to investigate and have determined this is limited to a subset of larger runner pools.
We are investigating an incident where Actions larger runners are stuck in provisioning for some customers
We are currently investigating this issue.
[Retroactive] Incident with some GitHub services
1 update
A component that imports external Git repositories into GitHub experienced an incident caused by an improper internal configuration of a gem. We have since rolled back to a stable version, and all migrations are able to resume.
Incident with Pull Requests and Issues
6 updates
We have completed the fail over. Services are operating as normal.
On January 30th, 2025 from 14:22 UTC to 14:48 UTC, web requests to GitHub.com experienced failures (at peak the error rate was 44%), with the average successful request taking over 3 seconds to complete. This outage was caused by a hardware failure in the caching layer that supports rate limiting, and the impact was prolonged by a lack of automated failover for that caching layer. A manual failover of the primary to trusted hardware was performed following recovery to ensure that the issue would not reoccur under similar circumstances. As a result of this incident, we will be moving to a high-availability cache configuration and adding resilience to cache failures at this layer to ensure requests can be handled should similar circumstances arise in the future.
We will be failing over one of our primary caching hosts to complete our mitigation of the problem. Users may experience some temporary service disruptions until that failover is complete.
We are seeing recovery in our caching infrastructure and are continuing to monitor.
Users may experience timeouts in various GitHub services. We have identified an issue with our caching infrastructure and are working to mitigate the issue
We are investigating reports of degraded availability for Issues and Pull Requests
Disruption with some GitHub services
6 updates
On 29 January 2025, between 14:00 UTC and 16:28 UTC, Copilot Chat on github.com was degraded: chat messages that included chat skills failed to save to our datastore due to a change in client-side generated identifiers. We mitigated the incident by rolling back the client-side changes. Based on this incident, we are working on better monitoring to reduce our detection time and fixing gaps in testing to prevent a repeat of incidents such as this one in the future.
We have pushed a fix and are seeing general recovery.
We're continuing to investigate an issue related to Copilot Chat on GitHub.com
We're continuing to investigate an issue related to Copilot Chat on GitHub.com
We're seeing issues related to Copilot chat on GitHub.com
We are currently investigating this issue.
Disruption with some GitHub services
3 updates
On January 27th, 2025, between 23:32 UTC and 23:41 UTC, the Audit Log Streaming service experienced an approximately 9-minute delay of audit log events. Our systems maintained data continuity and we experienced no data loss. There was no impact to the Audit Log API or the Audit Log user interface. All configured Audit Log Streaming endpoints received the relevant audit log events, albeit delayed, and normal service was restored after the incident's resolution.
We are currently investigating this issue.
Our Audit Log Streaming service is experiencing degradation, but no data has been lost.
Incident With Migration Service
1 update
Between Sunday 20:50 UTC and Monday 15:20 UTC, the Migrations service was unable to process migrations. This was due to an invalid infrastructure credential. We mitigated the issue by updating the credential internally. Mechanisms and automation will be implemented to detect and prevent this issue in the future.
Incident with Actions
13 updates
On January 23, 2025, between 9:49 and 17:00 UTC, the available capacity of large hosted runners was degraded. On average, 26% of jobs requiring large runners had a delay of more than 5 minutes before a runner was assigned. This was caused by the rollback of a configuration change combined with a latent bug in event processing, which was triggered by the mixed data shape that resulted from the rollback. The processing would reprocess the same events unnecessarily, causing the background job that manages large runner creation and deletion to run out of resources. It would automatically restart and continue processing, but the jobs were not able to keep up with production traffic. We mitigated the impact by using a feature flag to bypass the problematic event-processing logic. While these changes had been rolling out in stages over the last few months and had been safely rolled back previously, an unrelated change had prevented rollback from causing this problem in earlier stages. We are reviewing and updating the feature flags in this event-processing workflow to ensure that we have high confidence in rollback at all rollout stages. We are also improving observability of the event processing to reduce the time to diagnose and mitigate similar issues going forward.
We are seeing recovery with the latest mitigation. Queue times for a very small percentage of larger runner jobs are still longer than expected, so we are monitoring those for full recovery before going green.
We are actively applying mitigations to help improve larger runner start times. We are currently seeing delays starting about 25% of larger runner jobs.
We are still actively investigating a slowdown in larger runner assignment and are working to apply additional mitigations.
We're still applying mitigations to unblock queueing Actions in large runners. We are monitoring for full recovery.
We are applying further mitigations to fix the issues with delayed queuing for Actions jobs in large runners. We continue to monitor for full recovery.
We are investigating further mitigations for queueing Actions jobs in large runners. We continue to watch telemetry and are monitoring for full recovery.
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
The team continues to apply mitigations for delays in enqueueing some Actions jobs for larger runners, seen by a small number of customers. We will continue providing updates on the progress towards full mitigation.
The team continues to apply mitigations for delays in enqueueing some Actions jobs for larger runners. We will continue providing updates on the progress towards full mitigation.
The team continues to investigate delays in enqueueing some Actions jobs for larger runners. We will continue providing updates on the progress towards mitigation.
The team continues to investigate delays in queueing some Actions jobs for larger runners. We will continue providing updates on the progress towards mitigation.
We are investigating reports of degraded performance for Actions
Incident with Pull Request Rebase Merges
7 updates
On January 16, 2025, between 00:45 UTC and 09:40 UTC, the Pull Requests service was degraded and failed to generate rebase merge commits. This was due to a configuration change that introduced disagreements between replicas. These disagreements caused a secondary job to run, triggering timeouts while computing rebase merge commits. We mitigated the incident by rolling back the configuration change. We are working on improving our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future.
The incident has been resolved. Please note that affected pull requests will self-repair when any commits are pushed to the pull request's base branch or head branch. If you encounter problems with a rebase and merge, either click the "Update branch" button or push a commit to the PR's branch.
We have mitigated the incident, and any new pull request rebase merges should be recovered. We are working on recovery steps for any pull requests that attempted to merge during this incident.
We believe we have found the root cause and are in the process of verifying the mitigation.
We are still continuing to investigate.
We are still experiencing failures for rebase merges in pull requests and are continuing to investigate.
We are investigating reports of degraded performance for Pull Requests
Disruption connecting to Codespaces
4 updates
On January 14, 2025, between 19:13 UTC and 21:21 UTC, the Codespaces service was degraded, leading to connection failures with running codespaces; 7.6% of connections failed during the degradation. Users with bad connections could not use impacted codespaces until those codespaces were stopped and restarted. This was caused by bad connections left behind after a deployment in an upstream dependency that the Codespaces service still provided to clients. The incident self-mitigated as new connections replaced stale ones. We are coordinating to ensure connection stability with future deployments of this nature.
We are beginning to see recovery for users connecting to Codespaces. Any users continuing to see impact should attempt a restart.
We are investigating reports of degraded performance for Codespaces
We are investigating reports of timeouts for Codespaces users creating new or connecting to existing Codespaces. We will continue to keep users updated on progress towards mitigation.
Incident with Git Operations
5 updates
On January 13, 2025, between 23:35 UTC and 00:24 UTC, all Git operations were unavailable due to a configuration change that caused our internal load balancer to drop requests between services that Git relies upon. We mitigated the incident by rolling back the configuration change. We are improving our monitoring and deployment practices to reduce our time to detection and enable automated mitigation for issues like this in the future.
We've identified a cause of degraded git operations, which may affect other GitHub services that rely upon git. We're working to remediate.
Actions is experiencing degraded performance. We are continuing to investigate.
Pages is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded availability for Git Operations
Issues with VNet Injected Larger Hosted Runners in East US 2
7 updates
The impact to large runners has been mitigated. The third-party incident has not been fully mitigated but is being actively monitored at https://azure.status.microsoft/en-us/status in case of recurrence.
On January 9, 2025, larger hosted runners configured with Azure private networking in East US 2 were degraded, causing delayed job starts for ~2,300 jobs between 16:00 and 20:00 UTC. There was also an earlier period of impact, from 2025-01-08 22:00 UTC to 2025-01-09 4:10 UTC, with 488 jobs impacted. The cause of both delays was an incident in East US 2 impacting provisioning and network connectivity of Azure resources; more details on that incident are available at https://azure.status.microsoft/en-us/status/history (Tracking ID: PLP3-1W8). Because these runners rely on private networking with networks in the East US 2 region, there were no immediate mitigations available other than restoring network connectivity. Going forward, we will continue evaluating options to provide better resilience to third-party regional outages that affect private networking customers.
We are continuing to see improvements while still monitoring updates from the third party at https://azure.status.microsoft/en-us/status
We are still monitoring the third party networking updates via https://azure.status.microsoft/en-us/status. Multiple workstreams are in progress by the third party to mitigate the impact.
We are still monitoring the third party networking updates via https://azure.status.microsoft/en-us/status. Multiple workstreams are in progress by the third party to mitigate the impact.
The underlying third-party networking issues have been identified and are being worked on. Ongoing updates can be found at https://azure.status.microsoft/en-us/status
We are currently investigating this issue.
Some GitHub Actions may not run
6 updates
Actions is operating normally.
On January 9, 2025, between 06:26 and 07:49 UTC, Actions experienced degraded performance, leading to failures in about 1% of workflow runs across ~10k repositories. The failures occurred due to an outage in a dependent service, which disrupted Redis connectivity in the East US 2 region. We mitigated the incident by re-routing Redis traffic out of that region at 07:49 UTC. We continued to monitor service recovery before resolving the incident at 08:30 UTC. We are working to improve our monitoring to reduce our time to detection and mitigation of issues like this one in the future.
We have seen recovery of Actions runs for affected repositories. We are verifying all remediations before resolving this incident.
We have identified the problem and are proceeding with a fail-over remediation. We anticipate this will allow Actions Runs to proceed for affected repositories.
Actions jobs in 1-2% of repositories may be blocked and not running, or may be delayed. We have identified a potential cause, which we are confirming, and will be working on remediation.
We are investigating reports of degraded performance for Actions
Incident with Webhooks
23 updates
On January 9, 2025, between 01:26 UTC and 01:56 UTC, GitHub experienced widespread disruption to many services, with users receiving 500 responses when trying to access various functionality. This was due to a deployment that introduced a query that saturated a primary database server. On average, the error rate was 6%, peaking at 6.85% of update requests. We mitigated the incident by identifying the source of the problematic query and rolling back the deployment. We are investigating methods to detect problematic queries prior to deployment in order to prevent similar incidents, and to reduce our time to detection and mitigation of issues like this one in the future.
We have identified the root cause and have deployed a fix. The majority of services have recovered; the Actions service is in the process of recovering.
Copilot is operating normally.
Pages is operating normally.
Issues is operating normally.
Pull Requests is operating normally.
Webhooks is operating normally.
Git Operations is operating normally.
Codespaces is operating normally.
We have identified the root cause and have deployed a fix. Services are recovering.
API Requests is experiencing degraded performance. We are continuing to investigate.
We are continuing the investigation of multiple service issues. We will continue to keep users updated on progress towards mitigation.
Copilot is experiencing degraded performance. We are continuing to investigate.
Codespaces is experiencing degraded availability. We are continuing to investigate.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Git Operations is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of issues with multiple services, including authentication, PRs, Codespaces, Pages, Git operations, and APIs. We will continue to keep users updated on progress towards mitigation.
Pages is experiencing degraded performance. We are continuing to investigate.
Git Operations is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
Actions is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded availability for Webhooks
Incident with Actions resulting in degraded performance
9 updates
Issues is operating normally.
On January 7th, 2025, between 11:54 UTC and 16:39 UTC, degraded performance was observed in Actions, Webhooks, and Issues, caused by an internal Certificate Authority configuration change that disrupted our event infrastructure. The configuration issue was promptly identified and resolved by rolling back the change on impacted hosts and re-issuing certificates. We have identified which services need updates to support the current PKI architecture and are working on implementing those changes to prevent a future recurrence.
Webhooks is operating normally.
Actions is operating normally.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We have identified a configuration issue that we believe is the source of the Action workflow job delays and page latency increases. We are continuing to investigate and mitigate the issue.
Issues is experiencing degraded performance. We are continuing to investigate.
Users may see delays with Actions workflow jobs in the UI and API responses. Additionally, several endpoints, including some Pull Request pages, are experiencing increased latency. We are continuing to investigate the issue.
We are investigating reports of degraded performance for Actions
Incident with Actions
6 updates
All systems are operational, and we have a plan to backfill the missing metadata. In total, 139,000 PRs were impacted across 45,000 repositories. The backfilled metadata will be available in a few days. Until the backfill is complete, there are several actions you can take to ensure an Action runs:
- Any Actions that should have run on closed but not merged PRs can be triggered by re-opening and re-closing the PR.
- Actions that should have run on PR merge can be re-run from the main branch of your repository.
The only Actions that cannot be re-run at this time are ones that specifically use the merge commit. Additionally, the `merge_commit_sha` field on an impacted pull request will be `null` when queried via our API until the backfill completes. We appreciate the error reports we received, and thank you for your patience. We mitigated the initial impact quickly by rolling back a feature flag, and we will be improving the monitoring of our feature flag rollouts in the future to better catch these scenarios.
On January 2, 2025, between 16:00:00 and 22:27:30 UTC, a bug in feature-flagged code that cleans up pull requests after they are closed or merged incorrectly cleared the merge commit SHA for ~139,000 pull requests. During the incident, Actions workflows triggered by the `on: pull_request` trigger with the `closed` type were not queued successfully because of these missing merge commit SHAs. Approximately 45,000 repositories experienced these missing workflow triggers in one of two scenarios: pull requests that were closed but not merged, and pull requests that were merged. Impact was mitigated by rolling back the aforementioned feature flag. Merged pull requests that were affected have had their merge commit SHAs restored. Closed pull requests have not had their merge commit SHAs restored; however, customers can re-open and close them again to recalculate the SHA. We are investigating methods to improve detection of these kinds of errors in the future.
We have remediated the issue impacting Actions workflows. During investigation and remediation, we realized there were also issues with recording metadata around merge commits. No git data or code has been lost. PRs merged today between 20:06 UTC and 22:15 UTC are impacted. We are working on a plan to regenerate the missing metadata and will provide an update once we have one in place.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We have identified and begun to remediate the issue preventing Actions from triggering on closed pull requests. We are beginning to see recovery.
We are investigating reports of degraded performance for Actions
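While the backfill from this incident was in progress, an impacted pull request could be recognized by a `null` `merge_commit_sha` in its REST API representation. A minimal sketch using the standard "get a pull request" endpoint; the helper names here are ours, not GitHub's:

```python
import json
from urllib.request import Request, urlopen

def merge_sha_pending(pr_json):
    """True if this PR's merge commit SHA is missing — the documented
    symptom of a pull request still awaiting the metadata backfill."""
    return pr_json.get("merge_commit_sha") is None

def fetch_pr(owner, repo, number, token):
    """Fetch a pull request via GET /repos/{owner}/{repo}/pulls/{number}.

    Requires a token with read access to the repository.
    """
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urlopen(req) as resp:
        return json.load(resp)

# Local check against the documented symptom, no network needed:
assert merge_sha_pending({"merge_commit_sha": None})
assert not merge_sha_pending({"merge_commit_sha": "0a1b2c3"})
```

A script looping `fetch_pr` over recently merged PRs and filtering with `merge_sha_pending` would identify which of a repository's pull requests still needed the backfill.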