2025-Q1
Jan 1, 2025 - Mar 31, 2025
Service Features
Time-based uptime calculation for the 129,600 minutes in this quarter
Downtime Definition: Minutes with >5% error rate (approximated from incident data)
| Component | Uptime % | Downtime | Incidents | Status | Service Credit |
|---|---|---|---|---|---|
| Git Operations | 99.9244% | 1h 38m | 4 | Pass | None |
| API Requests | 99.9323% | 1h 28m | 4 | Pass | None |
| Issues | 99.8017% | 4h 17m | 9 | Violation | 10% |
| Pull Requests | 99.8027% | 4h 16m | 9 | Violation | 10% |
| Webhooks | 99.9190% | 1h 45m | 5 | Pass | None |
| Pages | 99.9304% | 1h 30m | 4 | Pass | None |
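The figures in the table can be reproduced from the downtime column; a minimal sketch of the time-based calculation (the 99.9% Pass/Violation threshold is inferred from the table, not stated in it):

```python
QUARTER_MINUTES = 90 * 24 * 60  # Jan 1 - Mar 31, 2025 = 129,600 minutes

def time_based_uptime(downtime_minutes: int) -> float:
    """Percent of minutes in the quarter without a >5% error rate."""
    return (QUARTER_MINUTES - downtime_minutes) / QUARTER_MINUTES * 100

# Issues: 4h 17m = 257 minutes of downtime
print(f"{time_based_uptime(4 * 60 + 17):.4f}%")  # 99.8017% -> Violation, 10% credit
# Git Operations: 1h 38m = 98 minutes of downtime
print(f"{time_based_uptime(1 * 60 + 38):.4f}%")  # 99.9244% -> Pass
```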
Actions
Execution-based calculation (workflow success rate)
| Component | Uptime % | Downtime | Incidents |
|---|---|---|---|
| Actions | 99.6896% | 6h 42m | 14 |
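Execution-based uptime counts workflow runs rather than wall-clock minutes. A sketch under that definition; the run totals below are hypothetical, chosen only to illustrate the ratio:

```python
def execution_uptime(total_runs: int, failed_runs: int) -> float:
    """Workflow success rate: share of runs that did not fail."""
    return (total_runs - failed_runs) / total_runs * 100

# Hypothetical quarter: 2,000,000 runs, 6,208 infrastructure failures
print(f"{execution_uptime(2_000_000, 6_208):.4f}%")  # 99.6896%
```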
Packages
Hybrid calculation with two separate metrics
1. Package Transfers: (Total transfers - Failed transfers) / Total transfers × 100
2. Package Storage: (Total minutes - Minutes with >5% error rate) / Total minutes × 100
| Component | Uptime % | Downtime | Incidents |
|---|---|---|---|
| Packages | 99.9732% | 35m | 2 |
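A sketch of the two Packages metrics listed above. The report does not state how the two numbers are combined into the single reported figure, so the combination rule below (reporting the worse of the two) is an assumption, and the transfer counts are hypothetical:

```python
QUARTER_MINUTES = 90 * 24 * 60  # 129,600 minutes in 2025-Q1

def transfer_uptime(total_transfers: int, failed_transfers: int) -> float:
    """Metric 1: share of package transfers that succeeded."""
    return (total_transfers - failed_transfers) / total_transfers * 100

def storage_uptime(error_minutes: int) -> float:
    """Metric 2: share of minutes without a >5% error rate."""
    return (QUARTER_MINUTES - error_minutes) / QUARTER_MINUTES * 100

def hybrid_uptime(total: int, failed: int, error_minutes: int) -> float:
    # Assumed combination rule: report the worse of the two metrics.
    return min(transfer_uptime(total, failed), storage_uptime(error_minutes))

# Hypothetical inputs: 500,000 transfers with 100 failures, 35 degraded minutes
print(f"{hybrid_uptime(500_000, 100, 35):.4f}%")  # 99.9730% with these inputs
```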
Incidents in 2025-Q1
46 incidents occurred during this quarter
Disruption with some GitHub services
3 updates
Between March 29 7:00 UTC and March 31 17:00 UTC users were unable to unsubscribe from GitHub marketing email subscriptions due to a service outage. Additionally, on March 31, 2025 from 7:00 UTC to 16:40 UTC users were unable to submit eBook and event registration forms on resources.github.com, also due to a service outage. The incident occurred due to expired credentials used for an internal service. We mitigated it by renewing the credentials and redeploying the affected services. To improve future response times and prevent similar issues, we are enhancing our credential expiry detection, rotation processes, and on-call observability and alerting.
We are currently applying a mitigation to resolve an issue with managing marketing email subscriptions.
We are currently investigating this issue.
[Retroactive] Disruption with Pull Request Ref Updates
1 update
Beginning at 21:24 UTC on March 28 and lasting until 21:50 UTC, some customers of github.com had issues with PR tracking refs not being updated due to processing delays and increased failure rates. We did not post a public status update before the rollback completed, and the incident is now resolved. We are sorry for the delayed post on githubstatus.com.
Disruption with some GitHub services
2 updates
This incident was opened by mistake. Public services are currently functional.
We are currently investigating this issue.
Disruption with Pull Request Ref Updates
6 updates
This issue has been mitigated and we are operating normally.
Between March 27, 2025, 23:45 UTC and March 28, 2025, 01:40 UTC the Pull Requests service was degraded and failed to update refs for repositories with higher traffic activity. This was caused by a large repository migration that enqueued a larger than usual number of jobs while simultaneously impacting the git fileservers where the problematic repository was hosted. Retries of the failing jobs increased queue depth, delaying jobs that were unrelated to the migration. We declared an incident once we confirmed that the issue was not isolated to the problematic migration and that other repositories were also failing to process ref updates. We mitigated the issue by stopping the migration and short-circuiting the remaining jobs. Additionally, we increased the worker pool for this job to reduce the time required to recover. As a result of this incident, we are revisiting our repository migration process and working to isolate potentially problematic migration workloads from non-migration workloads.
We are continuing to monitor for recovery.
We believe we have identified the source of the issue and are monitoring for recovery.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
1 update
Between 2025-03-23 18:10 UTC and 2025-03-24 16:10 UTC, migration jobs submitted through the GitHub UI experienced processing delays and increased failure rates. This issue only affected migrations initiated via the web interface; migrations started through the API or the command line tool continued to function normally. We are sorry for the delayed post on githubstatus.com.
Disruption with some GitHub services
7 updates
Copilot is operating normally.
On March 21st, 2025, between 11:45 UTC and 13:20 UTC, users were unable to interact with GitHub Copilot Chat in GitHub. The issue was caused by a recently deployed Ruby change that unintentionally overwrote a global value. This led to GitHub Copilot Chat in GitHub being misconfigured with an invalid URL, preventing it from connecting to our chat server. Other Copilot clients were not affected. We mitigated the incident by identifying the source of the problematic query and rolling back the deployment. We are reviewing our deployment tooling to reduce the time to mitigate similar incidents in the future. In parallel, we are also improving our test coverage for this category of error to prevent it from being deployed to production.
Mitigation is complete and we are seeing full recovery for GitHub Copilot Chat in GitHub.
We have identified the problem and have a mitigation in progress.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are investigating issues with GitHub Copilot Chat in GitHub. We will continue to keep users updated on progress toward mitigation.
We are currently investigating this issue.
Intermittent GitHub Actions workflow failures
7 updates
Actions is operating normally.
On March 21st, 2025, between 05:43 UTC and 08:49 UTC, the Actions service experienced degradation, leading to workflow run failures. During the incident, approximately 2.45% of workflow runs failed due to an infrastructure failure. This incident was caused by intermittent failures in communicating with an underlying service provider. We are working to improve our resilience to downtime in this service provider and to reduce the time to mitigate in any future recurrences.
We have made progress understanding the source of these errors and are working on a mitigation.
We're continuing to investigate elevated errors during GitHub Actions workflow runs. At this stage our monitoring indicates that these errors are impacting no more than 3% of all runs.
We're continuing to investigate intermittent failures with GitHub Actions workflow runs.
We're seeing errors reported with a subset of GitHub Actions workflow runs, and are continuing to investigate.
We are investigating reports of degraded performance for Actions
Incident with Codespaces
6 updates
We have seen full recovery in the last 15 minutes for Codespaces connections. GitHub Codespaces are healthy. For users who are still seeing connection problems, restarting the Codespace may help resolve the issue.
Codespaces is operating normally.
On March 21, 2025 between 01:00 UTC and 02:45 UTC, the Codespaces service was degraded and users in various regions experienced intermittent connection failures. The peak error rate was 30% of connection attempts across 38% of Codespaces. This was due to a service deployment. The incident was mitigated by completing the deployment to the impacted regions. We are working with the service team to identify the cause of the connection losses and perform necessary repairs to avoid future occurrences.
We are continuing to investigate issues with failed connections to Codespaces. We are seeing recovery over the last 10 minutes.
Customers may be experiencing issues connecting to Codespaces on GitHub.com. We are currently investigating the underlying issue.
We are investigating reports of degraded performance for Codespaces
Incident with Pages
5 updates
On March 20, 2025, between 19:24 UTC and 20:42 UTC the GitHub Pages experience was degraded and returned 503s for some customers. We saw an error rate of roughly 2% for Pages views, and new page builds were unable to complete successfully before timing out. This was due to a replication failure at the database layer between a write destination and a read destination. We mitigated the incident by redirecting reads to the same destination as writes. The replication error occurred during a transitory phase, as we are in the process of migrating the underlying data for Pages to new database infrastructure. Additionally, our monitors failed to detect the error. We are addressing the underlying cause of the failed replication and telemetry.
We have resolved the issue for Pages. If you're still experiencing issues with your GitHub Pages site, please rebuild.
Customers may not be able to create or make changes to their GitHub Pages sites. Customers who rely on webhook events from Pages builds might also experience a downgraded experience.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Pages
Incident with Actions: Queue Run Failures
6 updates
The provider has reported full mitigation of the underlying issue, and Actions has been healthy since approximately 00:15 UTC.
Actions is operating normally.
On March 18th, 2025, between 23:20 UTC and March 19th, 2025 00:15 UTC, the Actions service experienced degradation, leading to run start delays. During the incident, about 0.3% of all workflow runs queued during the time failed to start, about 0.67% of all workflow runs were delayed by an average of 10 minutes, and about 0.16% of all workflow runs ultimately ended with an infrastructure failure. This was due to a networking issue with an underlying service provider. At 00:15 UTC the service provider mitigated their issue, and service was restored immediately for Actions. We are working to improve our resilience to downtime in this service provider to reduce the time to mitigate in any future recurrences.
We are continuing to investigate issues with delayed or failed workflow runs with Actions. We are engaged with a third-party provider who is also investigating issues and has confirmed we are impacted.
We are investigating reports of degraded performance for Actions
Some customers may be experiencing delays or failures when queueing workflow runs
Disruption with some GitHub services
6 updates
On March 18th, 2025, between 13:35 UTC and 17:45 UTC, some users of GitHub Copilot Chat in GitHub experienced intermittent failures when reading or writing messages in a thread, resulting in a degraded experience. The error rate peaked at 3% of requests to the service. This was due to an availability incident with a database provider. Around 16:15 UTC the upstream service provider mitigated their availability incident, and service was restored in the following hour. We are working to improve our failover strategy for this database to reduce the time to mitigate similar incidents in the future.
We are seeing recovery and no new errors for the last 15 minutes.
We are still investigating infrastructure issues and our provider has acknowledged the issues and is working on a mitigation. Customers might still see errors when creating messages, or new threads in Copilot Chat. Retries might be successful.
We are still investigating infrastructure issues and collaborating with providers. Customers might see some errors when creating messages, or new threads in Copilot Chat. Retries might be successful.
We are experiencing issues with our underlying data store which is causing a degraded experience for a small percentage of users using Copilot Chat in github.com
We are currently investigating this issue.
macos-15-arm64 hosted runner queue delays
5 updates
On March 18, between 13:04 and 16:55 UTC, Actions workflows relying on hosted runners using the beta macOS 15 image experienced increased queue times waiting for available runners. An image update pushed the previous day included a performance regression. The slower performance caused longer average runtimes, exhausting our available Mac capacity for this image. This was mitigated by rolling back the image update. We have updated our capacity allocation for the beta and other Mac images and are improving monitoring in our canary environments to catch issues like this before they impact customers.
We are seeing improvements in telemetry and are monitoring for full recovery.
We've applied a mitigation to fix the issues with queuing Actions jobs on macos-15-arm64 Hosted runner. We are monitoring.
The team continues to investigate issues with some Actions macos-15-arm64 Hosted jobs being queued for up to 15 minutes. We will continue providing updates on the progress towards mitigation.
We are currently investigating this issue.
Incident with Issues
6 updates
Between March 17, 2025, 18:05 UTC and March 18, 2025, 09:50 UTC, GitHub.com experienced intermittent failures in web and API requests. These issues affected a small percentage of users (mostly related to pull requests and issues), with a peak error rate of 0.165% across all requests. We identified a framework upgrade that caused kernel panics in our Kubernetes infrastructure as the root cause. We mitigated the incident by downgrading until we were able to disable a problematic feature. In response, we have investigated why the upgrade caused the unexpected issue, have taken steps to temporarily prevent it, and are working on longer term patch plans while improving our observability to ensure we can quickly react to similar classes of problems in the future.
We saw a spike in the error rate for Issues-related pages and API requests due to problems with restarts in our Kubernetes infrastructure that, at peak, caused 0.165% of requests to see timeouts or errors related to these API surfaces over a 15 minute period. At this time we see minimal impact and are continuing to investigate the cause of the issue.
We are investigating reports of issues with service(s): Issues. We're continuing to investigate. Users may see intermittent HTTP 500 responses when using Issues. Retrying the request may succeed.
We are investigating reports of issues with service(s): Issues. We're continuing to investigate. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of issues with service(s): Issues. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded performance for Issues
3 updates
On March 12, 2025, between 13:28 UTC and 14:07 UTC, the Actions service experienced degradation leading to run start delays. During the incident, about 0.6% of workflow runs failed to start, 0.8% of workflow runs were delayed by an average of one hour, and 0.1% of runs ultimately ended with an infrastructure failure. The issue stemmed from connectivity problems between the Actions services and certain nodes within one of our Redis clusters. The service began recovering once connectivity to the Redis cluster was restored at 13:41 UTC. These connectivity issues are typically not a concern because we can fail over to healthier replicas. However, due to an unrelated issue, there was a replication delay at the time of the incident, and failing over would have caused a greater impact on our customers. We are working on improving our resiliency and automation processes for this infrastructure to improve the speed of diagnosing and resolving similar issues in the future.
We have applied a mitigation for the affected Redis node, and are starting to see recovery with Action workflow executions.
We are investigating reports of degraded performance for Actions
Incident with Actions and Pages
6 updates
Actions is operating normally.
On March 8, 2025, between 17:16 UTC and 18:02 UTC, GitHub Actions and Pages services experienced degraded performance leading to delays in workflow runs and Pages deployments. During this time, 34% of Actions workflow runs experienced delays, and a small percentage of runs using GitHub-hosted runners failed to start. Additionally, Pages deployments for sites without a custom Actions workflow (93% of them) did not run, preventing new changes from being deployed. An unexpected data shape led to crashes in some of our pods. We mitigated the incident by excluding the affected pods and correcting the data that led to the crashes. We’ve fixed the source of the unexpected data shape and have improved the overall resilience of our service against such occurrences.
Actions run start delays are mitigated. Actions runs that failed will need to be re-run. Impacted Pages updates will need to re-run their deployments.
Pages is operating normally.
We are investigating Actions run start delays; about 40% of runs are not starting within five minutes, and Pages deployments on GitHub-hosted runners are impacted.
We are investigating reports of degraded performance for Actions and Pages
Disruption with some GitHub services
7 updates
On March 7, 2025, from 09:30 UTC to 11:07 UTC, we experienced a networking event that disrupted connectivity to our search infrastructure, impacting about 25% of search queries and indexing attempts. Searches for PRs, Issues, Actions workflow runs, Packages, Releases, and other products were impacted, resulting in failed requests or stale data. The connectivity issue self-resolved after 90 minutes. The backlog of indexing jobs was fully processed and saw recovery soon after, and queries to all indexes also saw an immediate return to normal throughput. We are working with our cloud provider to identify the root cause and are researching additional layers of redundancy to reduce customer impact in the future for issues like this one. We are also exploring mitigation strategies for faster resolution.
We are continuing to investigate a degraded experience with searching for issues, pull requests, and Actions workflow runs.
Actions is experiencing degraded performance. We are continuing to investigate.
Searches for issues and pull-requests may be slower than normal and timeout for some users
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with Issues, Git Operations and API Requests
8 updates
On March 3rd 2025 between 04:07 UTC and 09:36 UTC various GitHub services were degraded with an average error rate of 0.03% and peak error rate of 9%. This issue impacted web requests, API requests, and git operations. This incident was triggered because a network node in one of GitHub's datacenter sites partially failed, resulting in silent packet drops for traffic served by that site. At 09:22 UTC, we identified the failing network node, and at 09:36 UTC we addressed the issue by removing the faulty network node from production. In response to this incident, we are improving our monitoring capabilities to identify and respond to similar silent errors more effectively in the future.
We have seen recovery across our services and impact is mitigated.
Webhooks is operating normally.
Git Operations is operating normally.
We are investigating intermittent connectivity issues between our backend and databases and will provide further updates as we have them. The current impact is that users may see elevated latency while using our services.
We are seeing intermittent timeouts across our various services. We are currently investigating and will provide updates.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for API Requests, Git Operations and Issues
3 updates
On February 28th, 2025, between 05:49 UTC and 06:55 UTC, a newly deployed background job caused increased load on GitHub’s primary database hosts, resulting in connection pool exhaustion. This led to degraded performance, manifesting as increased latency for write operations and elevated request timeout rates across multiple services. The incident was mitigated by halting execution of the problematic background job and disabling the feature flag controlling the job execution. To prevent similar incidents in the future, we are collaborating on a plan to improve our production signals to better detect and respond to query performance issues.
Issues and Pull Requests are experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Disruption with some GitHub services
7 updates
The team is confident that recovery is complete. Thank you for your patience as this issue was investigated.
On February 27, 2025, between 11:30 UTC and 12:22 UTC, Actions experienced degraded performance, leading to delays in workflow runs. On average, 5% of Actions workflow runs were delayed by 31 minutes. The delays were caused by updates in a dependent service that led to failures in Redis connectivity in one region. We mitigated the incident by failing over the impacted service and re-routing the service’s traffic out of that region. We are working to improve monitoring and processes of failover to reduce our time to detection and mitigation of issues like this one in the future.
Our mitigations have rolled out successfully and we have seen recovery for all Actions run starts back within expected range. Users should see Actions runs working normally. We will keep this incident open for a short time while we continue to validate these results.
We have identified the cause of the delays to starting Actions runs. Our team is working to roll out mitigations and we hope to see recovery as these take effect in our systems over the next 10-20 minutes. Further updates as we have more information.
We are seeing an increase in run start delays since 11:04 UTC. This is impacting ~3% of Actions runs at this time. The team is working to understand the causes of this and to mitigate impact. We will continue to update as we have more information.
Actions is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with Actions and Packages
6 updates
Actions and Packages are operating normally.
On February 26, 2025, between 14:51 UTC and 17:19 UTC, GitHub Packages experienced a service degradation, leading to billing-related failures when uploading and downloading Packages. During this period, the billing usage and budget pages were also inaccessible. Initially, we reported that GitHub Actions was affected, but we later determined that the impact was limited to jobs interacting with Packages services, while jobs that did not upload or download Packages remained unaffected. The incident occurred due to an error in newly introduced code, which caused containers to get into a bad state, ultimately leading to billing API calls failing with 503 errors. We mitigated the issue by rolling back the contributing change. In response to this incident, we are enhancing error handling, improving the resiliency of our billing API calls to minimize customer impact, and improving change rollout practices to catch these potential issues prior to deployment.
We're continuing our investigation into Billing interfaces and retrieval of packages causing Actions workflow run failures.
We’re investigating issues related to billing and the retrieval of packages that are causing Actions workflow run failures.
We're investigating issues related to the Billing interfaces and Packages downloads failing for enterprise customers.
We are investigating reports of degraded performance for Actions and Packages
Disruption with some GitHub services
6 updates
On February 25th, 2025, between 14:25 UTC and 16:44 UTC email and web notifications experienced delivery delays. At the peak of the incident the delay resulted in ~10% of all notifications taking over 10 minutes to be delivered, with the remaining ~90% being delivered within 5-10 minutes. This was due to insufficient capacity in worker pools as a result of increased load during peak hours. We also encountered delivery delays for a small number of webhooks, with delays of up to 2.5 minutes. We mitigated the incident by scaling out the service to meet the demand. The increase in capacity gives us extra headroom, and we are working to improve our capacity planning to prevent issues like this occurring in the future.
Web and email notifications are caught up, resolving the incident.
We're continuing to investigate delayed web and email notifications.
We're continuing to investigate delayed web and email notifications.
We're investigating delays in web and email notifications impacting all customers.
We are currently investigating this issue.
Claude 3.7 Sonnet Partially Unavailable
5 updates
On February 25, 2025 between 13:40 UTC and 15:45 UTC the Claude 3.7 Sonnet model for GitHub Copilot Chat experienced degraded performance. During the impact, occasional requests to Claude would result in an immediate error to the user. This was due to upstream errors with one of our infrastructure providers, which have since been mitigated. We are working with our infrastructure providers to reduce time to detection and implement additional failover options, to mitigate issues like this one in the future.
We have disabled Claude 3.7 Sonnet models in Copilot Chat and across IDE integrations (VSCode, Visual Studio, JetBrains) due to an issue with our provider. Users may still see these models as available for a brief period but we recommend switching to a different model. Other models were not impacted and are available. Once our provider has resolved the issues impacting Claude 3.7 Sonnet models, we will re-enable them.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are currently experiencing partial availability for the Claude 3.7 Sonnet and Claude 3.7 Thinking models in Copilot Chat, VSCode and other Copilot products. This is due to problems with an upstream provider. We are working to resolve these issues and will update with more information as it is made available.Other Copilot models are available and working as expected.
We are currently investigating this issue.
Incident with Packages
4 updates
We have confirmed recovery for the majority of our systems. Some systems may still experience higher than normal latency as they catch up.
On February 25, 2025, between 00:17 UTC and 01:08 UTC, GitHub Packages experienced a service degradation, leading to failures uploading and downloading packages, along with increased latency for all requests to GitHub Packages registry. At peak impact, about 14% of uploads and downloads failed, and all Packages requests were delayed by an average of 7 seconds. The incident was caused by the rollout of a database configuration change that resulted in a degradation in database performance. We mitigated the incident by rolling back the contributing change and failing over the database. In response to this incident, we are tuning database configurations and resolving a source of deadlocks. We are also redistributing certain workloads to read replicas to reduce latency and enhance overall database performance.
We have identified the issue impacting packages and have rolled out a fix. We are seeing signs of recovery and continue to monitor the situation.
We are investigating reports of degraded performance for Packages
Claude 3.5 Sonnet model is unavailable in Copilot
4 updates
We were able to quickly identify the problem and resolve this issue. Claude 3.5 Sonnet is available again.
On February 24, 2025 between 21:42 UTC and 22:14 UTC the Claude 3.5 Sonnet model for GitHub Copilot Chat experienced degraded performance. During the impact, all requests to Claude 3.5 Sonnet would result in an immediate error to the user. This was due to a misconfiguration within one of our infrastructure providers that has since been mitigated. We are working to prevent this error from occurring in the future by implementing additional failover options. Additionally we are updating our playbooks and alerting to reduce time to detection.
At this time, we are unable to serve requests to the Claude 3.5 Sonnet model on Copilot. No other models are affected. We are investigating the issue and will provide updates as we discover more information.
We are investigating reports of degraded performance for Copilot
Incident with Issues
7 updates
Issues is operating normally.
Pull Requests is operating normally.
On February 24, 2025, between 15:17 UTC and 17:08 UTC the GitHub Issues and Pull Requests services were degraded, showing stale results on search-powered pages such as /issues and /pulls, meaning the displayed results may not have included the most recent updates. Additional features that depend on search functionality may have served stale results during this incident. There was no increase in latency for any of the services depending on search. We mitigated the incident by increasing the replica count for the workers that process background jobs related to search indexing. We are working on identifying the root cause to avoid similar incidents in the future.
We continue to see recovery and expect Pull Requests and Issues search queries to recover within 30 minutes.
We are seeing recovery and expect for Pull Requests and Issues search queries to recover within 15 minutes.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Issues
Disruption with some GitHub services
7 updates
Between February 21, 2025 12:00 UTC and February 24, 2025 18:31 UTC, the Copilot Metrics API failed to ingest daily metrics aggregations for all customers, resulting in a failure to populate new metrics from 2025-02-21 to 2025-02-24. This failure was triggered by the metrics ingestion process timing out when querying across the event dataset. The API remained functional for retrieving historical metrics prior to 2025-02-21. On the morning of Monday, February 24, 2025 at 15:00 UTC, customer support was notified of the issue, and the team deployed a fix to resolve the query timeouts and ran backfills for the data from 2025-02-21 to 2025-02-23. We are working to prevent further outages by adding more alerting on timeouts and have further optimized our queries that aggregate data.
We have restored all of the data for 2025-02-21 to 2025-02-23. The data is queryable through the Copilot Metrics API. We are continuing to monitor the metrics data and expect to resolve the incident in the next hour.
We expect the missing data from the weekend to be available within two hours.
Copilot-metrics is in the process of restoring the usage statistics for 2025-02-23, we will continue to restore the previous 2 days over the next few hours.
Customers may not be able to review their usage statistics for Copilot from Saturday through Monday morning UTC. The API is functioning normally, but no data is available for those time periods. We are working on backfilling the data, and all metrics will eventually be available later today. We estimate recovery within the next few hours and will provide updates as the recovery process proceeds.
We are investigating reports of issues with service: Copilot metrics API. We will continue to keep users updated on progress towards mitigation.
We are currently investigating this issue.
Disruption with some GitHub services
11 updates
On February 16th, 2025 from 11:30 UTC to 12:44 UTC, API requests to GitHub.com experienced increased latency and failures. Around 1% of API requests failed at the peak of this incident. This outage was caused by an experimental feature that malfunctioned and generated excessive database latency. In response to this incident, the feature has been redesigned to avoid database load, which should prevent similar issues going forward.
API Requests is operating normally.
Webhooks is operating normally.
Pull Requests is operating normally.
Actions is operating normally.
Git Operations is operating normally.
Codespaces is operating normally.
Issues is operating normally.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
API Requests is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Actions, Codespaces, Git Operations, Issues and Webhooks
Disruption with some GitHub services
8 updates
We completed the rollout. GitHub Codespaces are healthy.
On February 15, 2025, between 6:35 PM UTC and 4:15 AM UTC the following day, the Codespaces service was degraded and users in various regions experienced intermittent connection failures. On average, the error rate was 50%, peaking at 65% of requests to the service. This was due to a service deployment. We mitigated the incident by completing the deployment to the impacted regions. The completion of this deployment should prevent future deployments of the service from negatively impacting Codespaces connectivity.
We continue the rollout in Central India, SE Asia, and Australia Codespaces regions. We are seeing a minimal number of connection failures across all regions at the moment.
We rolled out a fix to most of our Codespaces regions. Central India, SE Asia, and Australia are the remaining regions to be fixed. Customers in these remaining regions may still experience issues with Codespaces connectivity.
Some customers are continuing to see intermittent connection failures to their codespaces. We are monitoring closely to better estimate when the impact will be mitigated. At this time, we expect the number of impacted users to remain low, and we will update again when there is a development in our repair efforts.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Some GitHub Codespaces users are experiencing intermittent connection failures. A deployment is underway to mitigate the problem, and US-based customers should see recovery soon. Full recovery is expected to take several hours. In the meantime, we advise customers experiencing issues to retry their connection attempts.
We are currently investigating this issue.
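The retry advice above can be sketched as a bounded exponential-backoff loop. The `connect_with_retry` helper, its timing values, and the use of `ConnectionError` are illustrative assumptions, not Codespaces client internals:

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect()` until it succeeds, backing off exponentially.

    Intermittent connection failures often clear on their own, so a
    bounded retry is usually enough. `connect` is any callable that
    raises ConnectionError on failure; `sleep` is injectable for tests.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Bounding the attempts matters: during an incident like this one, an unbounded retry loop would simply hammer a degraded service.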
[Retroactive] Incident with Migrations service
1 update
Between Thursday, February 13th, 2025 19:30 UTC and Friday, February 14th, 2025 08:02 UTC, the Migrations service experienced intermittent migration failures for some customers. This was caused by a code change containing an edge case that erroneously failed some migrations. We mitigated the incident by rolling back the code change. We are working on improving our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future.
Claude Sonnet unavailable in GitHub Copilot
8 updates
Claude Sonnet is fully available in GitHub Copilot again. If you used an alternate model during the outage, you can switch back to Claude Sonnet.
On February 12th, 2025, between 21:30 UTC and 23:10 UTC, the Copilot service was degraded and all requests to Claude 3.5 Sonnet were failing. No other models were impacted. This was due to an issue with our upstream provider, which we detected within 12 minutes, at which point we raised the issue to the provider for remediation. GitHub is working with the provider to improve the resiliency of the service.
We are seeing a recovery with our Claude Sonnet model provider. We'll confirm once the problem is fully resolved.
Our Claude Sonnet model provider has acknowledged the issue and will provide us with the next update by 11:30 PM UTC / 3:30 PM PT. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.
We escalated the issue to our Claude Sonnet model provider. Claude Sonnet remains unavailable in GitHub Copilot; please use an alternate model.
Claude Sonnet is currently not working in GitHub Copilot. Please switch to an alternate model while we work on resolving the issue.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are currently investigating this issue.
Incident with GIT LFS and Other Requests
6 updates
This issue has been mitigated. We will continue to investigate root causes to ensure this does not reoccur.
On February 6, 2025, between 8:40 AM UTC and 11:13 AM UTC, the GitHub REST API was degraded following the rollout of a new feature. The feature resulted in an increase in requests that saturated a cache and led to cascading failures in unrelated services. The error rate peaked at 100% of requests to the service. The incident was mitigated by increasing the memory allocated to the cache and rolling back the feature that led to the saturation. To prevent future incidents, we are working to reduce the time to detect similar issues and to optimize the overall calls to the cache.
We have scaled out database resources and rolled back recent changes and are seeing signs of mitigation, but are monitoring to ensure complete recovery.
We are attempting to scale databases to handle the observed load spikes, and are investigating other mitigation approaches. Customers may intermittently experience failures to fetch repositories with LFS, as well as increased latency and errors across the API.
We are investigating failed Git LFS requests and potentially slow API requests. Customers may experience failures to fetch repositories with LFS.
We are investigating reports of degraded performance for API Requests
Actions Larger Runners Provisioning Delays
6 updates
Between Feb 5, 2025 00:34 UTC and 11:16 UTC, up to 7% of organizations using GitHub-hosted larger runners with public IP addresses had jobs fail to start during the impact window. The issue was caused by a backend migration in the public IP management system, which placed certain public-IP-address runners in a non-functioning state. We have improved the rollback steps for this migration to reduce the time to mitigate any future recurrences, are working to improve automated detection of this error state, and are improving the resiliency of runners to handle this error state without customer impact.
We have identified a configuration change that we believe may be related. We are working to mitigate.
We are continuing our investigation.
We continue to investigate and have determined this is limited to a subset of larger runner pools.
We are investigating an incident where Actions larger runners are stuck in provisioning for some customers
We are currently investigating this issue.
[Retroactive] Incident with some GitHub services
1 update
A component that imports external Git repositories into GitHub experienced an incident caused by an improper internal configuration of a gem. We have since rolled back to a stable version, and all migrations are able to resume.
Incident with Pull Requests and Issues
6 updates
We have completed the fail over. Services are operating as normal.
On January 30th, 2025 from 14:22 UTC to 14:48 UTC, web requests to GitHub.com experienced failures (at peak the error rate was 44%), with the average successful request taking over 3 seconds to complete. This outage was caused by a hardware failure in the caching layer that supports rate limiting, and the impact was prolonged by a lack of automated failover for that caching layer. A manual failover of the primary to trusted hardware was performed following recovery to ensure that the issue would not reoccur under similar circumstances. As a result of this incident, we will be moving to a high-availability cache configuration and adding resilience to cache failures at this layer to ensure requests can be handled should similar circumstances arise in the future.
We will be failing over one of our primary caching hosts to complete our mitigation of the problem. Users may experience some temporary service disruptions until that failover is complete.
We are seeing recovery in our caching infrastructure and are continuing to monitor.
Users may experience timeouts in various GitHub services. We have identified an issue with our caching infrastructure and are working to mitigate the issue
We are investigating reports of degraded availability for Issues and Pull Requests
Disruption with some GitHub services
6 updates
On 29 January 2025, between 14:00 UTC and 16:28 UTC, Copilot Chat on github.com was degraded: chat messages that included chat skills failed to save to our datastore due to a change in client-side generated identifiers. We mitigated the incident by rolling back the client-side changes. Based on this incident, we are working on better monitoring to reduce our detection time and fixing gaps in testing to prevent a repeat of incidents such as this one in the future.
We have pushed a fix and are seeing general recovery.
We're continuing to investigate an issue related to Copilot Chat on GitHub.com
We're continuing to investigate an issue related to Copilot Chat on GitHub.com
We're seeing issues related to Copilot chat on GitHub.com
We are currently investigating this issue.
Disruption with some GitHub services
3 updates
On January 27th, 2025, between 23:32 UTC and 23:41 UTC, the Audit Log Streaming service experienced an approximately 9-minute delay of audit log events. Our systems maintained data continuity and we experienced no data loss. There was no impact to the Audit Log API or the Audit Log user interface. All configured Audit Log Streaming endpoints received the relevant audit log events, albeit delayed, and normal service was restored after the incident's resolution.
We are currently investigating this issue.
Our Audit Log Streaming service is experiencing degradation, but no data has been lost.
Incident With Migration Service
1 update
Between Sunday 20:50 UTC and Monday 15:20 UTC, the Migrations service was unable to process migrations. This was due to an invalid infrastructure credential. We mitigated the issue by updating the credential internally. Mechanisms and automation will be implemented to detect and prevent this issue in the future.
Incident with Actions
13 updates
On January 23, 2025, between 9:49 and 17:00 UTC, the available capacity of large hosted runners was degraded. On average, 26% of jobs requiring large runners had a delay of more than 5 minutes before a runner was assigned. This was caused by the rollback of a configuration change combined with a latent bug in event processing, which was triggered by the mixed data shape that resulted from the rollback. The processing would reprocess the same events unnecessarily, causing the background job that manages large runner creation and deletion to run out of resources. It would automatically restart and continue processing, but the jobs were not able to keep up with production traffic. We mitigated the impact by using a feature flag to bypass the problematic event-processing logic. While these changes had been rolling out in stages over the last few months and had been safely rolled back previously, an unrelated change had prevented rollback from causing this problem in earlier stages. We are reviewing and updating the feature flags in this event-processing workflow to ensure that we have high confidence in rollback at all rollout stages. We are also improving observability of the event processing to reduce the time to diagnose and mitigate similar issues going forward.
We are seeing recovery with the latest mitigation. Queue times for a very small percentage of larger runner jobs are still longer than expected, so we are monitoring those for full recovery before going green.
We are actively applying mitigations to help improve larger runner start times. We are currently seeing delays starting about 25% of larger runner jobs.
We are still actively investigating a slowdown in larger runner assignment and are working to apply additional mitigations.
We're still applying mitigations to unblock queueing Actions in large runners. We are monitoring for full recovery.
We are applying further mitigations to fix the issues with delayed queuing for Actions jobs in large runners. We continue to monitor for full recovery.
We are investigating further mitigations for queueing Actions jobs in large runners. We continue to watch telemetry and are monitoring for full recovery.
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
The team continues to apply mitigations for delays in enqueueing some Actions jobs for larger runners, seen by a small number of customers. We will continue providing updates on the progress towards full mitigation.
The team continues to apply mitigations for delays in enqueueing some Actions jobs for larger runners. We will continue providing updates on the progress towards full mitigation.
The team continues to investigate delays in enqueueing some Actions jobs for larger runners. We will continue providing updates on the progress towards mitigation.
The team continues to investigate delays in queueing some Actions jobs for larger runners. We will continue providing updates on the progress towards mitigation.
We are investigating reports of degraded performance for Actions
Incident with Pull Request Rebase Merges
7 updates
On January 16, 2025, between 00:45 UTC and 09:40 UTC, the Pull Requests service was degraded and failed to generate rebase merge commits. This was due to a configuration change that introduced disagreements between replicas. These disagreements caused a secondary job to run, triggering timeouts while computing rebase merge commits. We mitigated the incident by rolling back the configuration change. We are working on improving our monitoring and deployment practices to reduce our time to detection and mitigation of issues like this one in the future.
The incident has been resolved. Please note that affected pull requests will self-repair when any commits are pushed to the pull request's base branch or head branch. If you encounter problems with a rebase and merge, either click the "Update branch" button or push a commit to the PR's branch.
We have mitigated the incident, and any new pull request rebase merges should be recovered. We are working on recovery steps for any pull requests that attempted to merge during this incident.
We believe we have found the root cause and are in the process of verifying the mitigation.
We are still continuing to investigate.
We are still experiencing failures for rebase merges in pull requests and are continuing to investigate.
We are investigating reports of degraded performance for Pull Requests
Disruption connecting to Codespaces
4 updates
On January 14, 2025, between 19:13 UTC and 21:21 UTC, the Codespaces service was degraded, leading to connection failures with running codespaces; 7.6% of connections failed during the degradation. Users with bad connections could not use impacted codespaces until those codespaces were stopped and restarted. This was caused by bad connections left behind after a deployment in an upstream dependency that the Codespaces service still provided to clients. The incident self-mitigated as new connections replaced stale ones. We are coordinating to ensure connection stability with future deployments of this nature.
We are beginning to see recovery for users connecting to Codespaces. Any users continuing to see impact should attempt a restart.
We are investigating reports of degraded performance for Codespaces
We are investigating reports of timeouts for Codespaces users creating new or connecting to existing Codespaces. We will continue to keep users updated on progress towards mitigation.
Incident with Git Operations
5 updates
On January 13, 2025, between 23:35 UTC and 00:24 UTC, all Git operations were unavailable due to a configuration change that caused our internal load balancer to drop requests between services that Git relies upon. We mitigated the incident by rolling back the configuration change. We are improving our monitoring and deployment practices to reduce our time to detection and enable automated mitigation for issues like this in the future.
We've identified a cause of degraded git operations, which may affect other GitHub services that rely upon git. We're working to remediate.
Actions is experiencing degraded performance. We are continuing to investigate.
Pages is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded availability for Git Operations
Issues with VNet Injected Larger Hosted Runners in East US 2
7 updates
The impact to large runners has been mitigated. The third-party incident has not been fully mitigated but is being actively monitored at https://azure.status.microsoft/en-us/status in case of recurrence.
On January 9, 2025, larger hosted runners configured with Azure private networking in East US 2 were degraded, causing delayed job starts for ~2,300 jobs between 16:00 and 20:00 UTC. There was also an earlier period of impact, from 2025-01-08 22:00 UTC to 2025-01-09 4:10 UTC, with 488 jobs impacted. The cause of both delays was an incident in East US 2 impacting provisioning and network connectivity of Azure resources; more details on that incident are available at https://azure.status.microsoft/en-us/status/history (Tracking ID: PLP3-1W8). Because these runners rely on private networking with networks in the East US 2 region, there were no immediate mitigations available other than restoring network connectivity. Going forward, we will continue evaluating options to provide better resilience to third-party regional outages that affect private networking customers.
We are continuing to see improvements while still monitoring updates from the third party at https://azure.status.microsoft/en-us/status
We are still monitoring the third party networking updates via https://azure.status.microsoft/en-us/status. Multiple workstreams are in progress by the third party to mitigate the impact.
We are still monitoring the third party networking updates via https://azure.status.microsoft/en-us/status. Multiple workstreams are in progress by the third party to mitigate the impact.
The underlying third-party networking issues have been identified and are being worked on. Ongoing updates can be found at https://azure.status.microsoft/en-us/status
We are currently investigating this issue.
Some GitHub Actions may not run
6 updates
Actions is operating normally.
On January 9, 2025, between 06:26 and 07:49 UTC, Actions experienced degraded performance, leading to failures in about 1% of workflow runs across ~10k repositories. The failures occurred due to an outage in a dependent service, which disrupted Redis connectivity in the East US 2 region. We mitigated the incident by re-routing Redis traffic out of that region at 07:49 UTC. We continued to monitor service recovery before resolving the incident at 08:30 UTC. We are working to improve our monitoring to reduce our time to detection and mitigation of issues like this one in the future.
We have seen recovery of Actions runs for affected repositories. We are verifying all remediations before resolving this incident.
We have identified the problem and are proceeding with a fail-over remediation. We anticipate this will allow Actions Runs to proceed for affected repositories.
Actions jobs in 1-2% of repositories may be blocked and not running, or may be delayed. We have identified a potential cause, which we are confirming, and will be working on remediation.
We are investigating reports of degraded performance for Actions
Incident with Webhooks
23 updates
On January 9, 2025, between 01:26 UTC and 01:56 UTC, GitHub experienced widespread disruption to many services, with users receiving 500 responses when trying to access various functionality. This was due to a deployment that introduced a query that saturated a primary database server. On average, the error rate was 6%, peaking at 6.85% of update requests. We mitigated the incident by identifying the source of the problematic query and rolling back the deployment. We are investigating methods to detect problematic queries prior to deployment in order to prevent similar incidents, and to reduce our time to detection and mitigation of issues like this one in the future.
We have identified the root cause and have deployed a fix. The majority of services have recovered; the Actions service is in the process of recovering.
Copilot is operating normally.
Pages is operating normally.
Issues is operating normally.
Pull Requests is operating normally.
Webhooks is operating normally.
Git Operations is operating normally.
Codespaces is operating normally.
We have identified the root cause and have deployed a fix. Services are recovering.
API Requests is experiencing degraded performance. We are continuing to investigate.
We are continuing the investigation of multiple service issues. We will continue to keep users updated on progress towards mitigation.
Copilot is experiencing degraded performance. We are continuing to investigate.
Codespaces is experiencing degraded availability. We are continuing to investigate.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Git Operations is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of issues with multiple services, including authentication, PRs, Codespaces, Pages, Git operations, and APIs. We will continue to keep users updated on progress towards mitigation.
Pages is experiencing degraded performance. We are continuing to investigate.
Git Operations is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
Actions is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded availability for Webhooks
Incident with Actions resulting in degraded performance
9 updates
Issues is operating normally.
On January 7th, 2025, between 11:54 UTC and 16:39 UTC, degraded performance was observed in Actions, Webhooks, and Issues, caused by an internal Certificate Authority configuration change that disrupted our event infrastructure. The configuration issue was promptly identified and resolved by rolling back the change on impacted hosts and re-issuing certificates. We have identified which services need updates to support the current PKI architecture and are working on implementing those changes to prevent a future recurrence.
Webhooks is operating normally.
Actions is operating normally.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We have identified a configuration issue that we believe is the source of the Action workflow job delays and page latency increases. We are continuing to investigate and mitigate the issue.
Issues is experiencing degraded performance. We are continuing to investigate.
Users may see delays with Actions workflow jobs in the UI and API responses. Additionally, several endpoints, including some Pull Request pages, are experiencing increased latency. We are continuing to investigate the issue.
We are investigating reports of degraded performance for Actions
Incident with Actions
6 updates
All systems are operational, and we have a plan to backfill the missing metadata. In total, 139,000 PRs were impacted across 45,000 repositories. The backfilled metadata will be available in a few days. Until the backfill is complete, there are several actions you can take to ensure an Action runs:
- Any Actions that should have run on closed but not merged PRs can be triggered by re-opening and re-closing the PR.
- Actions that should have run on PR merge can be re-run from the main branch of your repository.
The only Actions that cannot be re-run at this time are ones that specifically use the merge commit. Additionally, the `merge_commit_sha` field on an impacted pull request will be `null` when queried via our API until the backfill completes. We appreciate the error reports we received, and thank you for your patience. We mitigated the initial impact quickly by rolling back a feature flag, and we will be improving the monitoring of our feature flag rollouts in the future to better catch these scenarios.
On January 2, 2025, between 16:00:00 and 22:27:30 UTC, a bug in feature-flagged code that cleans up pull requests after they are closed or merged incorrectly cleared the merge commit SHA for ~139,000 pull requests. During the incident, Actions workflows triggered by the `on: pull_request` trigger with the `closed` type were not queued successfully because of these missing merge commit SHAs. Approximately 45,000 repositories experienced these missing workflow triggers in one of two scenarios: pull requests that were closed but not merged, and pull requests that were merged. Impact was mitigated by rolling back the aforementioned feature flag. Merged pull requests that were affected have had their merge commit SHAs restored. Closed pull requests have not had their merge commit SHAs restored; however, customers can re-open and close them again to recalculate the SHA. We are investigating methods to improve detection of these kinds of errors in the future.
We have remediated the issue impacting Actions workflows. During investigation and remediation, we realized there were also issues with recording metadata around merge commits. No git data or code has been lost. PRs merged today between 20:06 UTC and 22:15 UTC are impacted. We are working on a plan to regenerate the missing metadata and will provide an update once we have one in place.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We have identified and begun to remediate the issue preventing Actions from triggering on closed pull requests. We are beginning to see recovery.
We are investigating reports of degraded performance for Actions
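While the backfill from this incident was in progress, an impacted pull request could be recognized by a `null` `merge_commit_sha` in its REST API representation. A minimal sketch using the standard "get a pull request" endpoint; the helper names here are ours, not GitHub's:

```python
import json
from urllib.request import Request, urlopen

def merge_sha_pending(pr_json):
    """True if this PR's merge commit SHA is missing — the documented
    symptom of a pull request still awaiting the metadata backfill."""
    return pr_json.get("merge_commit_sha") is None

def fetch_pr(owner, repo, number, token):
    """Fetch a pull request via GET /repos/{owner}/{repo}/pulls/{number}.

    Requires a token with read access to the repository.
    """
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urlopen(req) as resp:
        return json.load(resp)

# Local check against the documented symptom, no network needed:
assert merge_sha_pending({"merge_commit_sha": None})
assert not merge_sha_pending({"merge_commit_sha": "0a1b2c3"})
```

A script looping `fetch_pr` over recently merged PRs and filtering with `merge_sha_pending` would identify which of a repository's pull requests still needed the backfill.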