GitHub SLA Tracker

2025-Q3

Jul 1, 2025 - Sep 30, 2025

SLA Violation

Total Downtime

1d 5h
Weighted by impact

Total Incidents

44
In this quarter (19 tracked)

Worst Component

Issues
99.742% uptime

Service Features

Time-based uptime calculation for the 132,480 minutes in this quarter

Calculation Method: (Total minutes - Downtime) / Total minutes × 100
Downtime Definition: Minutes with >5% error rate (approximated from incident data)
Component        Uptime %   Downtime   Incidents   Status      Service Credit
Git Operations   99.8564%   3h 10m     5           Violation   10%
API Requests     99.7551%   5h 25m     5           Violation   10%
Issues           99.7424%   5h 41m     7           Violation   10%
Pull Requests    99.7660%   5h 10m     5           Violation   10%
Webhooks         99.9530%   1h 2m      2           Pass        None
Pages            99.9974%   3m         1           Pass        None
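
As a rough illustration of the calculation method above, here is a minimal Python sketch of the per-component math. The 99.9% uptime threshold and the flat 10% credit tier are assumptions inferred from the Pass/Violation and Service Credit columns; they are not stated elsewhere in this report.

```python
# Sketch of the quarterly uptime and service-credit arithmetic described above.
# Assumptions (not stated in the report): the SLA threshold is 99.9% uptime and
# every violating component earns a flat 10% service credit.

QUARTER_MINUTES = 132_480   # Jul 1 - Sep 30, 2025: 92 days x 1,440 minutes
SLA_TARGET_PERCENT = 99.9   # assumed threshold implied by the Pass/Violation column


def uptime_percent(downtime_minutes: float) -> float:
    """(Total minutes - Downtime) / Total minutes x 100."""
    return (QUARTER_MINUTES - downtime_minutes) / QUARTER_MINUTES * 100


def service_credit(downtime_minutes: float) -> str:
    """Return the credit tier for a component under the assumed flat 10% policy."""
    return "10%" if uptime_percent(downtime_minutes) < SLA_TARGET_PERCENT else "None"


# Example: Git Operations with 3h 10m (190 minutes) of downtime.
print(round(uptime_percent(190), 4))  # 99.8566 -- close to the table's 99.8564%,
                                      # which presumably uses sub-minute precision
print(service_credit(190))            # 10%
```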

Incidents in 2025-Q3

44 incidents occurred during this quarter

Created:
Resolved:
Duration: 33m
Weighted Downtime: 8.25m
4 updates
resolved

On September 29, 2025, between 17:53 and 18:42 UTC, the Copilot service experienced a degradation of the Gemini 2.5 model due to an issue with our upstream provider. Approximately 24% of requests failed, affecting 56% of users during this period. No other models were impacted.

GitHub notified the upstream provider of the problem as soon as it was detected. The issue was resolved after the upstream provider rolled back a recent change that caused the disruption. GitHub will continue to enhance our monitoring and alerting systems to reduce the time it takes to detect and mitigate similar issues in the future.

investigating

The upstream model provider has resolved the issue and we are seeing full availability for Gemini 2.5 Pro and Gemini 2.0 Flash.

investigating

We are experiencing degraded availability for the Gemini 2.5 Pro & Gemini 2.0 Flash models in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.

Other models are available and working as expected.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 49m
Weighted Downtime: 36.75m
3 updates
resolved

On September 29, 2025 between 16:26 UTC and 17:33 UTC the Copilot API experienced a partial degradation causing intermittent erroneous 404 responses for an average of 0.2% of GitHub MCP server requests, peaking at times around 2% of requests. The issue stemmed from an upgrade of an internal dependency which exposed a misconfiguration in the service.

We resolved the incident by rolling back the upgrade to address the misconfiguration. We fixed the configuration issue and will improve our documentation and rollout process to prevent similar issues.

investigating

Customers are getting 404 responses when connecting to the GitHub MCP server. We have reverted a change we believe is contributing to the impact, and are seeing resolution in deployed environments.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 36m
Weighted Downtime: 9m
3 updates
resolved

On September 26, 2025 between 16:22 UTC and 18:32 UTC raw file access was degraded for a small set of four repositories. On average, raw file access error rate was 0.01% and peaked at 0.16% of requests. This was due to a caching bug exposed by excessive traffic to a handful of repositories.

We mitigated the incident by resetting the state of the cache for raw file access and are working to improve cache usage and testing to prevent issues like this in the future.

investigating

We are seeing issues related to our ability to serve raw file access across a small percentage of our requests.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 50m
Weighted Downtime: 37.5m
3 updates
resolved

On September 23, 2025, between 15:29 UTC and 17:38 UTC and also on September 24, 2025 between 15:02 UTC and 15:12 UTC, email deliveries were delayed up to 50 minutes which resulted in significant delays for most types of email notifications. This occurred due to an unusually high volume of traffic which caused resource contention on some of our outbound email servers.

We have updated the configuration we use to better allocate capacity when there is a high volume of traffic and are also updating our monitors so we can detect this type of issue before it becomes a customer impacting incident.

investigating

We are seeing delays in email delivery, which is impacting notifications and user signup email verification. We are investigating and working on mitigation.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 0m
Weighted Downtime: 0m
3 updates
resolved

On September 24th, 2025, between 08:02 UTC and 09:11 UTC the Copilot service was degraded for Claude Opus 4 and Claude Opus 4.1 requests. On average, 22% of requests failed for Claude Opus 4 and 80% of requests for Claude Opus 4.1. This was due to an upstream provider returning elevated errors on Claude Opus 4 and Opus 4.1. We mitigated the issue by directing users to select other models and by monitoring recovery. To resolve the issue, we are expanding failover capabilities by integrating with additional infrastructure providers.

investigating

Between around 8:16 UTC and 8:51 UTC we saw elevated errors on Claude Opus 4 and Opus 4.1, up to 49% of requests were failing. This has recovered to around 4% of requests failing, we are monitoring recovery.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 2h 4m
Weighted Downtime: 31m
Affected Components: Copilot
4 updates
resolved

Between 20:06 UTC September 23 and 04:58 UTC September 24, 2025, the Copilot service experienced degraded availability for Claude Sonnet 4 and 3.7 model requests.

During this period, 0.46% of Claude 4 requests and 7.83% of Claude 3.7 requests failed.

The reduced availability resulted from Copilot disabling routing to an upstream provider that was experiencing issues and reallocating capacity to other providers to manage requests for Claude Sonnet 3.7 and 4. We are continuing to investigate the source of the issues with this provider and will provide an update as more information becomes available.

investigating

The issues with our upstream model provider have been resolved, and Claude Sonnet 3.7 and Claude Sonnet 4 are once again available in Copilot Chat, VS Code and other Copilot products.

We will continue monitoring to ensure stability, but mitigation is complete.

investigating

We are experiencing degraded availability for the Claude Sonnet 3.7 and Claude Sonnet 4 models in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.

Other models are available and working as expected.

investigating

We are investigating reports of degraded performance for Copilot

Created:
Resolved:
Duration: 14m
Weighted Downtime: 3.5m
Affected Components: Actions and Pages
3 updates
resolved

On September 23, between 17:11 and 17:40 UTC, customers experienced failures and delays when running workflows on GitHub Actions and building or deploying GitHub Pages. The issue was caused by a faulty configuration change that disrupted service to service communication in GitHub Actions. During this period, in-progress jobs were delayed and new jobs would not start due to a failure to acquire runners, and about 30% of all jobs failed. GitHub Pages users were unable to build or deploy their Pages during this period.

The offending change was rolled back within 15 minutes of its deployment, after which Actions workflows and Pages deployments began to succeed. Actions customers continued to experience delays for about 15 minutes after the rollback was completed while services worked through the backlog of queued jobs. We are planning to implement additional rollout checks to help detect and prevent similar issues in the future.

investigating

We are investigating delays in Actions Workflows.

investigating

We are investigating reports of degraded performance for Actions and Pages

Created:
Resolved:
Duration: 0m
Weighted Downtime: 0m
3 updates
resolved

On September 23, 2025, between 15:29 UTC and 17:38 UTC and also on September 24, 2025 between 14:02 UTC and 15:12 UTC, email deliveries were delayed up to 50 minutes which resulted in significant delays for most types of email notifications. This occurred due to an unusually high volume of traffic which caused resource contention on some of our outbound email servers.

We have updated the configuration we use to better allocate capacity when there is a high volume of traffic and are also updating our monitors so we can detect this type of issue before it becomes a customer impacting incident.

investigating

We're seeing delays related to outbound emails and are investigating.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 2h 51m
Weighted Downtime: 42.75m
Affected Components: Codespaces
6 updates
resolved

On September 17, 2025 between 13:23 and 16:51 UTC some users in West Europe experienced issues with Codespaces that had shut down due to network disconnections and subsequently failed to restart. Codespace creations and resumes were failed over to another region at 15:01 UTC. While many of the impacted instances self-recovered after mitigation efforts, approximately 2,000 codespaces remained stuck in a "shutting down" state while the team evaluated possible methods to recover unpushed data from the latest active session of affected codespaces. Unfortunately, recovery of that data was not possible. We unblocked shutdown of those codespaces, with all instances either shut down or available by 8:26 UTC on September 19.

The disconnects were triggered by an exhaustion of resources in the network relay infrastructure in that region, but the lack of self-recovery was caused by an unhandled error impacting the local agent, which led to an unclean shutdown.

We are improving the resilience of the local agent to disconnect events to ensure shutdown of codespaces is always clean without data loss. We have also addressed the exhausted resources in the network relay and will be investing in improved detection and resilience to reduce the impact of similar events in the future.

investigating

We have confirmed the original mitigation to failover has resolved the issue causing Codespaces to become unavailable. We are evaluating if there is a path to recover unpushed data from the approximately 2000 Codespaces that are currently in the shutting down state. We will be resolving this incident and will detail the next steps in our public summary.

investigating

For Codespaces that were stuck in the shutting down state and have been resumed, we've identified an issue that is causing the contents of the Codespace to be irrecoverably lost, which has impacted approximately 250 Codespaces. We are actively working on a mitigation to prevent any more Codespaces currently in this state from being forced to shut down to prevent the potential data loss.

investigating

We're continuing to see improvement with Codespaces that were stuck in the shutting down state and we anticipate the remaining should self-resolve in about an hour.

investigating

Some users with Codespaces in West Europe were unable to connect to Codespaces. We have failed over that region, and users should be able to create new Codespaces. If a user has a Codespace in a shutting down state, we are still investigating potential fixes and mitigations.

investigating

We are investigating reports of degraded performance for Codespaces

Created:
Resolved:
Duration: 35m
Weighted Downtime: 8.75m
Affected Components: Git Operations
5 updates
resolved

Between 16:26 UTC on September 15th and 18:30 UTC on September 16th, anonymous REST API calls to approximately 20 endpoints were incorrectly rejected because they were not authenticated. While this caused unauthenticated requests to be rejected by these endpoints, all authenticated requests were unaffected, and no protected endpoints were exposed.

This resulted in 100% of requests to these endpoints failing at peak, representing less than 0.1% of GitHub’s overall request volume. On average, the error rate for these endpoints was less than 50% and peaked at 100% for about 26 hours over September 16th. API requests to the impacted endpoints were rejected with a 401 error code. This was due to a mismatch in authentication policies, for specific endpoints, during a system migration.

The failure to detect the errors was the result of the issue occurring for a low percentage of traffic.

We mitigated the incident by reverting the policy in question, and correcting the logic associated with the degraded endpoints. We are working to improve our test suite to further validate mismatches, and refining our monitors for proactive detection.
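
One of the updates below noted that retrying the affected anonymous calls should eventually succeed. A minimal client-side sketch of that workaround follows, assuming Python with the requests library; the endpoint and retry policy are illustrative only and are not taken from the incident record.

```python
# Illustrative retry-with-backoff for an anonymous GitHub REST API call that is
# intermittently rejected with 401, per the retry guidance in the update below.
import time

import requests


def get_with_retry(url: str, attempts: int = 5, backoff_seconds: float = 2.0) -> requests.Response:
    """Retry an unauthenticated GET while it is incorrectly rejected with 401."""
    for attempt in range(attempts):
        resp = requests.get(url, headers={"Accept": "application/vnd.github+json"})
        if resp.status_code != 401:  # any other status is returned to the caller
            return resp
        time.sleep(backoff_seconds * (attempt + 1))  # simple linear backoff
    return resp


# Example: a public repository endpoint that required no authentication before the incident.
response = get_with_retry("https://api.github.com/repos/octocat/Hello-World")
print(response.status_code)
```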

investigating

We have mitigated the issue and are monitoring the results

investigating

Git Operations is experiencing degraded performance. We are continuing to investigate.

investigating

A recent change to our API routing inadvertently added an authentication requirement to the anonymous route for LFS requests. We're in the process of fixing the change, but in the interim retrying should eventually succeed.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 31m
Weighted Downtime: 7.75m
4 updates
resolved

Between 16:26 UTC on September 15th and 18:30 UTC on September 16th, anonymous REST API calls to approximately 20 endpoints were incorrectly rejected because they were not authenticated. While this caused unauthenticated requests to be rejected by these endpoints, all authenticated requests were unaffected, and no protected endpoints were exposed.

This resulted in 100% of requests to these endpoints failing at peak, representing less than 0.1% of GitHub’s overall request volume. On average, the error rate for these endpoints was less than 50% and peaked at 100% for about 26 hours over September 16th. API requests to the impacted endpoints were rejected with a 401 error code. This was due to a mismatch in authentication policies, for specific endpoints, during a system migration.

The failure to detect the errors was the result of the issue occurring for a low percentage of traffic.

We mitigated the incident by reverting the policy in question, and correcting the logic associated with the degraded endpoints. We are working to improve our test suite to further validate mismatches, and refining our monitors for proactive detection.

investigating

We have mitigated the issue and are monitoring the results

investigating

A recent change to our API routing inadvertently added an authentication requirement to the anonymous route for creating GitHub apps. We're in the process of fixing the change, but in the interim retrying should eventually succeed.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 7m
Weighted Downtime: 5.25m
2 updates
resolved

On September 15th between 17:55 and 18:20 UTC, Copilot experienced degraded availability for all features. This was due to a partial deployment of a feature flag to a global rate limiter. The flag triggered behavior that unintentionally rate limited all requests, resulting in 100% of them returning 403 errors. The issue was resolved by reverting the feature flag which resulted in immediate recovery.

The root cause of the incident was an undetected edge case in our rate limiting logic. The flag was meant to scale down rate limiting for a subset of users, but unintentionally put our rate limiting configuration into an invalid state.

To prevent this from happening again, we have addressed the bug with our rate limiting. We are also adding additional monitors to detect anomalies in our traffic patterns, which will allow us to identify similar issues during future deployments. Furthermore, we are exploring ways to test our rate limit scaling in our internal environment to enhance our pre-production validation process.
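
To make the failure mode concrete, here is a purely illustrative sketch of how a partially deployed scaling flag can collapse a rate limit to zero and reject every request with 403. All names, values, and logic are hypothetical and do not reflect GitHub's actual rate-limiting implementation.

```python
# Hypothetical illustration only: a feature flag that scales a global rate limit
# has no guard for an unset/zero scale factor, so the effective limit becomes 0
# and every request is rejected with 403 -- the symptom described in this incident.

BASE_LIMIT_PER_MINUTE = 600  # hypothetical baseline limit


def effective_limit(scale_factor: float) -> int:
    # Bug: no lower bound, so a zero or unset factor collapses the limit to 0.
    return int(BASE_LIMIT_PER_MINUTE * scale_factor)


def handle_request(requests_this_minute: int, scale_factor: float) -> int:
    """Return an HTTP status code: 200 if under the effective limit, 403 otherwise."""
    return 200 if requests_this_minute < effective_limit(scale_factor) else 403


print(handle_request(1, scale_factor=1.0))  # 200: fully configured flag
print(handle_request(1, scale_factor=0.0))  # 403: partial rollout left the factor unset
```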

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 2d 8h
Weighted Downtime: 14h 4.25m
4 updates
resolved

At around 18:45 UTC on Friday, September 12, 2025, a change was deployed that unintentionally affected search index management. As a result, approximately 25% of repositories were temporarily missing from search results.

By 12:45 UTC on Saturday, September 14, most missing repositories were restored from an earlier search index snapshot, and repositories updated between the snapshot and the restoration were reindexed. This backfill was completed at 21:25 UTC.

After these repairs, about 98.5% of repositories were once again searchable. We are performing a full reconciliation of the search index and customers can expect to see records being updated and content becoming searchable for all repos again between now and Sept 25.

NOTE: Users who notice missing or outdated repositories in search results can force reindexing by starring or un-starring the repository. Other repository actions such as adding topics, or updating the repository description, will also result in reindexing. In general, changes to searchable artifacts in GitHub will also update their respective search index in near-real time.

User impact has been mitigated with the exception of the 1.5% of repos that are missing from the search index. The change responsible for the search issue has been reverted, and full reconciliation of the search index is underway, expected to complete by September 23. We have added additional checks to our indexing model to ensure this failure does not happen again. We are also investigating faster repair alternatives.

To avoid resource contention and possible further issues we are currently not repairing repositories or organizations individually at this time. No repository data was lost, and other search types were not affected.
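
The NOTE above describes forcing a reindex by starring and un-starring a repository. A small sketch of doing that through the REST API is below, assuming a personal access token is available in the GITHUB_TOKEN environment variable; it uses the documented PUT and DELETE /user/starred/{owner}/{repo} endpoints. Note that a repository you had already starred will be left un-starred afterwards.

```python
# Sketch of the reindexing workaround from the NOTE above: star then un-star a
# repository via the REST API so its search index entry is refreshed.
# Assumes a token with permission to star repositories is set in GITHUB_TOKEN.
import os

import requests

HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}


def nudge_reindex(owner: str, repo: str) -> None:
    """Star and immediately un-star a repository (both calls return 204 on success)."""
    url = f"https://api.github.com/user/starred/{owner}/{repo}"
    requests.put(url, headers=HEADERS).raise_for_status()     # star
    requests.delete(url, headers=HEADERS).raise_for_status()  # un-star


nudge_reindex("octocat", "Hello-World")
```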

investigating

Most searchable repositories should again be visible in search results. Up to 1.5% of repositories may still be missing from search results.

Many different actions synchronize the repository state with the search index, so we expect natural recovery for repositories that see more frequent user and API-driven interactions.

A complete index reconciliation is underway to restore stagnant repositories that were deleted from the index. We will update again once we have a clear timeline of when we expect full recovery for those missing search results.

investigating

Customers are not seeing repositories they expect to see in search results. We have restored a snapshot of this search index from Fri 12 Sep at 21:00 UTC. Changes made since then will be unavailable while we work to backfill the rest of the search index. Any new changes will be available in near-real time as expected.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 39m
Weighted Downtime: 9.75m
Affected Components: Actions
3 updates
resolved

On September 10, 2025 between 13:00 and 14:15 UTC, Actions users experienced failed jobs and run start delays for Ubuntu 24 and Ubuntu 22 jobs on standard runners in private repositories. Additionally, larger runner customers experienced run start delays for runner groups with private networking configured in the eastus2 region. This was due to an outage in an underlying compute service provider in eastus2. 1.06% of Ubuntu 24 jobs and 0.16% of Ubuntu 22 jobs failed during this period. Jobs for larger runners using private networking in the eastus2 region were unable to start for the duration of the incident. We have identified and are working on improvements in our resilience to single partner region outages for standard runners so impact is reduced in similar scenarios in the future.

update

Actions hosted runners are taking longer to come online, leading to high wait times or job failures.

investigating

We are investigating reports of degraded performance for Actions

Created:
Resolved:
Duration: 2h 9m
Weighted Downtime: 32.25m
Affected Components: API Requests
7 updates
resolved

On September 4, 2025 between 15:30 UTC and 20:00 UTC the REST API endpoints git/refs, git/refs/*, and git/matching-refs/* were degraded and returned elevated errors for repositories with reference counts over 22k. On average, the request error rate to these specific endpoints was 0.5%. Overall REST API availability remained 99.9999%. This was due to the introduction of a code change that added latency to reference evaluations and overly affected repositories with many branches, tags, or other references. We mitigated the incident by reverting the new code. We are working to improve performance testing and to reduce our time to detection and mitigation of issues like this one in the future.

update

The deployment has completed and we expect customers who have been impacted to see recovery. We are continuing to monitor.

update

We are in the process of deploying the PR to revert the change that was causing timeouts to this endpoint. We will update again once that deployment is complete.

update

We have identified a deployed change that correlates with the increase in 5XX errors to the GitRefs REST API. This is particularly affecting requests for repos with very large numbers of commits. We are working on rolling back this change which we expect will resolve the issue.

update

API Requests is experiencing degraded performance. We are continuing to investigate.

update

Customers are experiencing 504 responses for some API requests regarding repo refs/tags. We are investigating.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 27m
Weighted Downtime: 6.75m
2 updates
resolved

Between August 21, 2025 at 15:00 UTC, and September 2, 2025 at 15:22 UTC, the avatars.githubusercontent.com image service was degraded and failed to display user avatars for users in the Middle East. During this time, avatar images appeared broken on github.com for affected users. On average, this impacted about 82% of users routed through one of our Middle East-based points-of-presence, which represents about 0.14% of global users. This was due to a configuration change within GitHub's edge infrastructure in the affected region, causing HTTP requests to fail. As a result, image requests returned HTTP 503 errors. The failure to detect the issues was the result of an alerting threshold set too low. We mitigated the incident by removing the affected site from service, which restored avatar serving for impacted users. To prevent this from recurring, we have tuned configuration defaults for graceful degradation. We also added new health checks to automatically shift traffic from impacted sites. We are updating our monitoring to prevent undetected errors like this in the future.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 46m
Weighted Downtime: 34.5m
Affected Components: API Requests and Issues
9 updates
update

API Requests and Issues are operating normally.

resolved

On August 27, 2025 between 20:35 and 21:17 UTC, Copilot, Web and REST API traffic experienced degraded performance. Copilot saw an average of 36% of requests fail with a peak failure rate of 77%. Approximately 2% of all non-Copilot Web and REST API traffic requests failed. This incident occurred after we initiated a production database migration to drop a column from a table backing Copilot functionality. While the column was no longer in direct use, our ORM continued to reference the dropped column. This led to a large number of 5xx responses and was similar to the incident on August 5th. At 21:15 UTC, we applied a fix to the production schema and by 21:17 UTC, all services had fully recovered. While repairs were in progress to avoid this situation, they were not completed quickly enough to prevent a second incident. We have now implemented a temporary block for all drop column operations as an immediate solution while we add more safeguards to prevent similar issues from occurring in the future. We are also implementing graceful degradation so that Copilot issues will not impact other features of our product.

update

We've discovered the cause of the service disruption and applied a mitigation.

update

We are continuing to investigate this issue.

update

API Requests is experiencing degraded performance. We are continuing to investigate.

update

The team is aware of the root cause of this issue and is working to mitigate the issue quickly.

update

Issues is experiencing degraded performance. We are continuing to investigate.

update

API Requests is experiencing degraded availability. We are continuing to investigate.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 2h 19m
Weighted Downtime: 34.75m
Affected Components: Actions
6 updates
resolved

On August 21, 2025, from approximately 15:37 UTC to 18:10 UTC, customers experienced increased delays and failures when starting jobs on GitHub Actions using standard hosted runners. This was caused by connectivity issues in our East US region, which prevented runners from retrieving jobs and sending progress updates. As a result, capacity was significantly reduced, especially for busier configurations, leading to queuing and service interruptions. Approximately 8.05% of jobs on public standard Ubuntu24 runners and 3.4% of jobs on private standard Ubuntu24 runners did not start as expected. By 18:10 UTC, we had mitigated the issue by provisioning additional resources in the affected region and burning down the backlog of queued runner assignments. By the end of that day, we deployed changes to improve runner connectivity resilience and graceful degradation in similar situations. We are also taking further steps to improve system resiliency by enhancing observability of network connection health with runners and improving load distribution and failover handling to help prevent similar issues in the future.

update

We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.

update

The team continues to investigate issues with some Actions jobs on Hosted Runners being queued for a long time and a percentage of jobs failing. We are increasing runner capacity and will continue providing updates on the progress towards mitigation.

update

The team continues to investigate issues with some Actions jobs on Hosted Runners being queued for a long time and a percentage of jobs failing. We will continue providing updates on the progress towards mitigation.

update

We are investigating reports of slow queue times for Hosted Runners, leading to high wait times.

investigating

We are investigating reports of degraded performance for Actions

Created:
Resolved:
Duration: 33m
Weighted Downtime: 8.25m
Affected Components: Git Operations and Issues
7 updates
update

The errors in our database infrastructure were related to some maintenance events that had more impact than expected. We will provide more details and follow ups when we post a public summary for this incident in the coming days. All impact to customers is resolved.

update

Issues is operating normally.

update

Git Operations is operating normally.

resolved

On August 21st, 2025, between 6:15am UTC and 6:25am UTC Git and Web operations were degraded and saw intermittent errors. On average, the error rate was 1% for API and Web requests. This was due to database infrastructure automated maintenance reducing capacity below our tolerated threshold. The incident was resolved when the impacted infrastructure self-healed and returned to normal operating capacity. We are adding guardrails to reduce the impact of this type of maintenance in the future.

update

We saw a brief spike in failures related to some of our database infrastructure. Everything has recovered but we are continuing to investigate to ensure we don't see any recurrence.

update

Approximately 1% of API and web requests are seeing intermittent errors. Some customers may see some push errors. We are currently investigating.

investigating

We are investigating reports of degraded performance for Git Operations and Issues

Created:
Resolved:
Duration: 23m
Weighted Downtime: 5.75m
4 updates
update

We have verified that we fixed the sign up flow and are working to ensure we don't introduce an issue like this in the future.

resolved

Between 15:49 and 16:37 UTC on 20 Aug 2025, creating a new GitHub account via the web signup page consistently returned server errors, and users were unable to complete signup during this 48-minute window. We detected the issue at 16:04 UTC and restored normal signup functionality by 16:37 UTC. A recent change to signup flow logic caused all attempts to error. The change was rolled back to restore service. This exposed a gap in our test coverage that we are fixing.

update

Customers may experience issues when signing up for new GitHub accounts. We are actively working on a mitigation and will post an update within 30 minutes.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 1h 7m
Weighted Downtime: 16.75m
Affected Components: Issues and Actions
9 updates
update

Issues is operating normally.

update

Actions is operating normally.

resolved

On August 19, 2025, between 13:35 UTC and 14:33 UTC, GitHub search was in a degraded state. When searching for pull requests, issues, and workflow runs, users would have seen some slow, empty or incomplete results. In some cases, pull requests failed to load. The incident was triggered by intermittent connectivity issues between our load balancers and search hosts. While retry logic initially masked these problems, retry queues eventually overwhelmed the load balancers, causing failure. The incident was mitigated at 14:33 UTC by throttling our search index pipeline. Our automated alerting and internal retries reduced the impact of this event significantly. As a result of this incident we believe we have identified a faster way to mitigate it in the future. We are also working on multiple solutions to resolve the underlying connectivity issues.

update

We were able to mitigate the slowness by throttling some search indexing and will work on the issues created by the increased search indexing so they do not have latency impact.

update

We are seeing slightly elevated latency on some Issues endpoints and searches for workflow runs in Actions may not return quickly.

update

Actions is experiencing degraded performance. We are continuing to investigate.

update

Issues is experiencing degraded performance. We are continuing to investigate.

investigating

We are currently investigating this issue.

update

Issues with timeouts when searching

Created:
Resolved:
Duration: 31m
Weighted Downtime: 7.75m
Affected Components: Packages
4 updates
update

The NPM registry has now returned to normal functioning.

resolved

On August 14, 2025, between 17:50 UTC and 18:08 UTC, the Packages NPM Registry service was degraded. During this period, NPM package uploads were unavailable and approximately 50% of download requests failed. We identified the root cause as a sudden spike in Packages publishing activity that exceeded our service capacity limits. We are implementing better guardrails to protect the service against unexpected traffic surges and improving our incident response runbooks to ensure faster mitigation of similar issues.

update

The NPM registry service is currently experiencing intermittent availability issues. Other package registries should be unaffected. Investigations are ongoing.

investigating

We are investigating reports of degraded performance for Packages

Created:
Resolved:
Duration: 1h 20m
Weighted Downtime: 20m
Affected Components: Actions
3 updates
resolved

On August 14, 2025, between 02:30 UTC and 06:14 UTC, GitHub Actions was degraded. On average, 3% of workflow runs were delayed by at least 5 minutes. The incident was caused by an outage in a downstream dependency that led to failures in backend service connectivity in one region. At 03:59 UTC, we evacuated a majority of services in the impacted region, but some users may have seen ongoing impact until all services were fully evacuated at 06:14 UTC. We are working to improve monitoring and processes of failover to reduce our time to detection and mitigation of issues like this one in the future.

update

We are investigating reports of issues with service(s): Actions. We will continue to keep users updated on progress towards mitigation.

investigating

We are investigating reports of degraded performance for Actions

Created:
Resolved:
Duration: 3h 44m
Weighted Downtime: 2h 48m
Affected Components: API Requests, Issues, Pull Requests, Actions, and Packages
8 updates
resolved

On August 12, 2025, between 13:30 UTC and 17:14 UTC, GitHub search was in a degraded state. Users experienced inaccurate or incomplete results, failures to load certain pages (like Issues, Pull Requests, Projects, and Deployments), and broken components like Actions workflow and label filters. Most user impact occurred between 14:00 UTC and 15:30 UTC, when up to 75% of search queries failed, and updates to search results were delayed by up to 100 minutes. The incident was triggered by intermittent connectivity issues between our load balancers and search hosts. While retry logic initially masked these problems, retry queues eventually overwhelmed the load balancers, causing failure. The query failures were mitigated at 15:30 UTC after throttling our search indexing pipeline to reduce load and stabilize retries. The connectivity failures were resolved at 17:14 UTC after the automated reboot of a search host, causing the rest of the system to recover. We have improved internal monitors and playbooks, and tuned our search cluster load balancer to further mitigate the recurrence of this failure mode. We continue to invest in resolving the underlying connectivity issues.

update

Service availability has been mostly restored, but some users will continue to see increased request latency and stale search results. We are still working towards full recovery.

update

Service availability has been mostly restored, but increased load/query latency and stale search results persist. We continue to work towards full mitigation.

update

We are seeing partial recovery in service availability, but still see inconsistent experiences and stale search data across services. Investigation and mitigations are underway.

update

We are experiencing increased latency in our API layers and inconsistently degraded experiences when loading or querying issues, pull requests, labels, packages, releases, workflow runs, projects, and repositories, among others. Investigation is underway.

update

We are investigating reports of degraded performance in services backed by search. The team continues to investigate why requests are failing to reach our search clusters.

update

Packages is experiencing degraded performance. We are continuing to investigate.

investigating

We are investigating reports of degraded performance for API Requests, Actions, Issues and Pull Requests

Created:
Resolved:
Duration: 6m
Weighted Downtime: 1.5m
3 updates
resolved

On August 11, 2025, from 18:41 to 18:57 UTC, GitHub customers experienced errors and increased latency when loading GitHub’s web interface. During this time, a configuration change to improve our UI deployment system caused a surge in requests to a backend datastore. This unexpected spike in connection attempts saturated the datastore's connection backlog and caused intermittent failures to serve required UI content, resulting in elevated error rates for frontend requests. The incident was mitigated by reverting the configuration, which restored normal service. Following mitigation, we are evaluating improvements to our alerting thresholds and exploring architectural changes to reduce load to this datastore and improve the resilience of our UI delivery pipeline.

update

Logged out users may see intermittent errors when loading github.com webpages. Investigation is ongoing.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 1h 53m
Weighted Downtime: 28.25m
Affected Components: Pull Requests
7 updates
update

Pull Requests is operating normally.

resolved

At 15:33 UTC on August 5, 2025, we initiated a production database migration to drop a column from a table backing pull request functionality. While the column was no longer in direct use, our ORM continued to reference the dropped column in a subset of pull request queries. As a result, there were elevated error rates across pushes, webhooks, notifications, and pull requests with impact peaking at approximately 4% of all web and REST API traffic. We mitigated the issue by deploying a change that instructed the ORM to ignore the removed column. Most affected services recovered by 16:13 UTC. However, that fix was applied only to our largest production environment. An update to some of our custom and canary environments did not pick up the fix and this triggered a secondary incident affecting ~0.1% of pull request traffic, which was fully resolved by 19:45 UTC. While migrations have protections such as progressive roll-out first targeting validation environments and acknowledge gates, this incident identified a gap in application monitoring that, had it been covered, would have prevented continued rollout when impact was observed. We will add additional automation and safeguards to prevent future incidents without requiring human intervention. We are also already working on a way to streamline some types of changes across environments, which would have prevented the second incident from occurring.

update

We continue to investigate issues with PRs. Impact remains limited to less than 2% of users.

update

We continue to investigate issues with PRs impacting less than 2% of customers.

update

We continue to investigate issues with PRs impacting less than 2% of customers.

update

We're seeing issues related to PRs and are investigating. Less than 2% of users are impacted.

investigating

We are investigating reports of degraded performance for Pull Requests

Created:
Resolved:
Duration: 32m
Weighted Downtime: 24m
Affected Components: Git Operations, Webhooks, Issues, Pull Requests, and Actions
16 updates
update

Webhooks is operating normally.

update

Issues is operating normally.

update

Pull Requests is operating normally.

update

Actions is operating normally.

resolved

At 15:33 UTC on August 5, 2025, we initiated a production database migration to drop a column from a table backing pull request functionality. While the column was no longer in direct use, our ORM continued to reference the dropped column in a subset of pull request queries. As a result, there were elevated error rates across pushes, webhooks, notifications, and pull requests with impact peaking at approximately 4% of all web and REST API traffic. We mitigated the issue by deploying a change that instructed the ORM to ignore the removed column. Most affected services recovered by 16:13 UTC. However, that fix was applied only to our largest production environment. An update to some of our custom and canary environments did not pick up the fix and this triggered a secondary incident affecting ~0.1% of pull request traffic, which was fully resolved by 19:45 UTC. While migrations have protections such as progressive roll-out first targeting validation environments and acknowledge gates, this incident identified a gap in application monitoring that, had it been covered, would have prevented continued rollout when impact was observed. We will add additional automation and safeguards to prevent future incidents without requiring human intervention. We are also already working on a way to streamline some types of changes across environments, which would have prevented the second incident from occurring.

update

We have fully mitigated this issue and all services are operating normally.

update

Git Operations is operating normally.

update

Webhooks is experiencing degraded performance. We are continuing to investigate.

update

Pull Requests is experiencing degraded performance. We are continuing to investigate.

update

We have identified a change that was made in the Pull Request area of GitHub. Users may be unable to use certain pull request and issues features and may see some webhooks impacted. We have identified the issue and applied a mitigation, and are starting to see recovery, but will continue to monitor and post updates as we have them.

update

Webhooks is experiencing degraded availability. We are continuing to investigate.

update

Git Operations is experiencing degraded performance. We are continuing to investigate.

update

Pull Requests is experiencing degraded availability. We are continuing to investigate.

update

Pull Requests is experiencing degraded performance. We are continuing to investigate.

update

Actions is experiencing degraded performance. We are continuing to investigate.

investigating

We are investigating reports of degraded performance for Issues and Webhooks

Created:
Resolved:
Duration: 1h 35m
Weighted Downtime: 23.75m
4 updates
resolved

Between 06:04 UTC and 10:55 UTC on August 1, 2025, 100% of users attempting to sign up with an email and password experienced errors. Social signup was not affected. Once the problem became clear, the offending code was identified and a change was deployed to resolve the issue. We are adding additional monitoring to our sign-up process to improve our time to detection.

update

We are working on a mitigation to an issue preventing some users from signing up with email and password. Social signup methods remain available.

update

We have identified an issue preventing some new users from signing up. We are working to mitigate.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 1h 24m
Weighted Downtime: 21m
Affected Components: Git Operations and Actions
5 updates
update

Continued monitoring is showing that impact has been mitigated and normal service operation has been restored. We are going to resolve the incident at this time. Thank you for your patience as we investigated this problem.

resolved

Between July 28, 2025 16:31 UTC and July 29, 2025 12:05 UTC, users saw degraded Git Operations for raw file downloads. On average, the error rate was 0.005%, with a peak error rate of 3.89%. This was due to a sustained increase in unauthenticated repository traffic. We mitigated the incident by applying regional rate limiting, rolling back a service that was unable to scale with the additional traffic, and addressing a bug that impacted the caching of raw requests. Additionally, we horizontally scaled several dependencies of the service to appropriately handle the increase in traffic. We are working on improving our time to detection and have implemented controls to prevent similar incidents in the future.

update

We identified and removed unhealthy hosts from our system. This has led to a reduction of 429s and a return to normal operating conditions. We are continuing to monitor recovery and will resolve the incident once we are confident the impact has been mitigated.

update

We are seeing an increase in 429s when retrieving git artifacts from GitHub.com. This is manifesting in many ways, for example, in failed GitHub Actions workflow runs. We have our engineers working on mitigation and we will provide more information as we have it. Thank you for your patience.

investigating

We are investigating reports of degraded performance for Actions and Git Operations

Created:
Resolved:
Duration: 3h 26m
Weighted Downtime: 51.5m
Affected Components: API Requests, Issues, and Pull Requests
10 updates
resolved

Between July 28, 2025, 22:23:00 UTC and July 29, 2025 02:06:00 UTC, GitHub experienced degraded performance across multiple services including API, Issues, GraphQL and Pull Requests. During this time, approximately 4% of Web and API requests resulted in 500 errors. This incident was caused by DNS resolution failure while decommissioning infrastructure hosts. We resolved the incident by removing references to the stale hosts. We are working to improve our host replacement process by correcting our automatic host ejection behavior and by ensuring configuration is updated before hosts are decommissioned. This will prevent similar issues in the future.

update

Pull Requests is operating normally.

update

Mitigation has deployed. We are seeing recovery across all impacted services.

update

Issues is operating normally.

update

Team is deploying a mitigation for this incident. We will update again once we have verified the fix.

update

Approximately 4% of requests to impacted services continue to error. The team is continuing its work to mitigate this incident.

update

Team is continuing to look into networking issues. We will keep users updated on progress towards mitigation.

update

Some GitHub services continue to experience degraded performance. Team is looking into networking issues. We will continue to keep users updated on progress towards mitigation.

update

Some GitHub services are experiencing degraded performance. Team is currently investigating to determine a cause and mitigation.

investigating

We are investigating reports of degraded performance for API Requests, Issues and Pull Requests

Created:
Resolved:
Duration: 5h 34m
Weighted Downtime: 4h 10.5m
9 updates
update

This incident is resolved; we will follow up with a detailed root cause analysis as soon as possible.

As part of mitigation, some existing IP ranges were replaced. Migrations with customer-owned storage that have IP allow lists enabled will require adding new IP ranges to your IP allow lists to prevent migrations from failing.

- 20.99.172.64/28
- 135.234.59.224/28

resolved

Between approximately 21:41 UTC July 28th and 03:15 UTC July 29th, GitHub Enterprise Importer (GEI) operated in a degraded state where migrations could not be processed. Our investigation found that a component of the GEI infrastructure had been improperly taken out of service and could not be restored to its previous configuration. This necessitated the provisioning of new resources to resolve the incident.

As a result, customers will need to add our new IP ranges to the following IP allow lists, if enabled:

- The IP allow list on your destination GitHub.com organization or enterprise
- If you're running migrations from GitHub.com, the IP allow list on your source GitHub.com organization or enterprise
- If you're running migrations from a GitHub Enterprise Server, Bitbucket Server or Bitbucket Data Center instance, the allow list on your configured Azure Blob Storage or Amazon S3 storage account
- If you're running migrations from Azure DevOps, the allow list on your Azure DevOps organization

The new GEI IP ranges for inclusion in applicable IP allow lists are:

- 20.99.172.64/28
- 135.234.59.224/28

The following IP ranges are no longer used by GEI and can be removed from all applicable IP allow lists:

- 40.71.233.224/28
- 20.125.12.8/29

Users who have run migrations using GitHub Enterprise Importer in the past 90 days will receive email alerts about this change.
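
As a convenience for auditing the change above, the following sketch checks a local copy of an allow list against the new and retired GEI ranges. It runs entirely locally with Python's standard ipaddress module; the current_allow_list contents are a made-up example, and no GitHub API is called.

```python
# Local audit of an IP allow list against the GEI address changes listed above.
# No GitHub API calls are made; `current_allow_list` below is a hypothetical example.
from ipaddress import ip_network

NEW_GEI_RANGES = [ip_network("20.99.172.64/28"), ip_network("135.234.59.224/28")]
RETIRED_GEI_RANGES = [ip_network("40.71.233.224/28"), ip_network("20.125.12.8/29")]

current_allow_list = [ip_network("40.71.233.224/28"), ip_network("203.0.113.0/24")]

# New ranges not yet covered by any existing entry must be added.
missing = [str(n) for n in NEW_GEI_RANGES
           if not any(n.subnet_of(entry) for entry in current_allow_list)]

# Retired ranges that still appear within an existing entry can be removed.
stale = [str(n) for n in RETIRED_GEI_RANGES
         if any(n.subnet_of(entry) for entry in current_allow_list)]

print("Add to allow lists:", missing)
print("No longer needed:", stale)
```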

update

We have deployed mitigations and are working to verify.

update

The team is continuing its work to mitigate this incident.

update

We're continuing to work to mitigate this issue; customers will continue to see stalled migrations in the meantime.

update

We continue to work to mitigate this issue.

update

We are still working to mitigate the issue.

update

We have identified the issue and we're working to mitigate.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 8h 33m
Weighted Downtime: 2h 8.25m
Affected Components: Git Operations
13 updates
resolved

Between July 28, 2025 16:31 UTC and July 29, 2025 12:05 UTC, users saw degraded Git Operations for raw file downloads. On average, the error rate was 0.005%, with a peak error rate of 3.89%. This was due to a sustained increase in unauthenticated repository traffic. We mitigated the incident by applying regional rate limiting, rolling back a service that was unable to scale with the additional traffic, and addressing a bug that impacted the caching of raw requests. Additionally, we horizontally scaled several dependencies of the service to appropriately handle the increase in traffic. We are working on improving our time to detection and have implemented controls to prevent similar incidents in the future.

update

We continue to work to mitigate this issue.

update

We are investigating additional ways to mitigate this issue.

update

We continue to work to mitigate this issue.

update

Some customers continue to experience errors when accessing raw files and archives. We are working on a mitigation to address the issue.

update

We are actively setting up additional rate limiting to address increased requests from scraping and investigating the need to add additional hosts.

update

We are seeing more of https://github.blog/changelog/2025-05-08-updated-rate-limits-for-unauthenticated-requests/ and working to mitigate it.

update

Provisioning of new hosts is underway. We are still investigating other fixes.

update

We are adding additional capacity to our infrastructure to mitigate this issue while still investigating.

update

We are still actively investigating this issue.

update

Git Operations is experiencing degraded performance. We are continuing to investigate.

update

We are investigating errors affecting some archive and raw file downloads. Users may experience rate limit warnings or server errors until this is resolved.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 59m
Weighted Downtime: 14.75m
Affected Components: Actions
4 updates
resolved

On July 23rd, 2025, from approximately 14:30 to 16:30 UTC, GitHub Actions experienced delayed job starts for workflows in private repos using Ubuntu-24 standard hosted runners. This was due to resource provisioning failures in one of our datacenter regions. During this period, approximately 2% of Ubuntu-24 hosted runner jobs on private repos were delayed. Other hosted runners, self-hosted runners, and public repo workflows were unaffected. To mitigate the issue, additional worker capacity was added from a different datacenter region at 15:35 UTC and further increased at 16:00 UTC. By 16:30 UTC, job queues were healthy and service was operating normally. Since the incident, we have deployed changes to improve how regional health is accounted for when allocating new runners, and we are investigating further improvements to our automated capacity scaling logic and manual overrides to prevent a recurrence.

update

We are applying mitigations to increase Actions Hosted Runners capacity, and are starting to see recovery. We’re monitoring to ensure continued stability.

update

We're investigating delays provisioning Actions Hosted Runners. Customers may see delays over 5 minutes for jobs starting.

investigating

We are investigating reports of degraded performance for Actions

Created:
Resolved:
Duration: 14m
Weighted Downtime: 3.5m
Affected Components: Copilot
3 updates
resolved

On July 22nd, 2025, between 17:58 and 18:35 UTC, the Copilot service experienced degraded availability for Claude Sonnet 4 requests. 4.7% of Claude 4 requests failed during this time. No other models were impacted. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.

investigating

We are investigating reports of degraded performance for Copilot

update

We are experiencing degraded availability for the Claude Sonnet 4 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue. Other models are available and working as expected.

Created:
Resolved:
Duration: 14m
Weighted Downtime: 3.5m
2 updates
resolved

On July 21, 2025, between 07:20 UTC and 08:00 UTC, the Copilot service experienced degraded availability for Claude 4 requests. 2% of Claude 4 requests failed during this time. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.

investigating

We are currently investigating this issue.

Incident with Issues

minor resolved
Created:
Resolved:
Duration: 2h 33m
Weighted Downtime: 38.25m
Affected Components: Webhooks, API Requests, Issues, Pull Requests, Packages, Codespaces, and Copilot
17 updates
resolved

On July 21st, 2025, between 07:00 UTC and 09:45 UTC the API, Codespaces, Copilot, Issues, Package Registry, Pull Requests and Webhook services were degraded and experienced dropped requests and increased latency. At the peak of this incident (a 2-minute period around 07:00 UTC) error rates reached 11% and dropped shortly after. Average web request load times rose to 1 second during this same time frame. After this period, traffic gradually recovered but error rate and latency remained slightly elevated until the end of the incident. This incident was triggered by a kernel bug that caused a crash of some of our load balancers during a scheduled process after a kernel upgrade. In order to mitigate the incident, we halted the roll out of our upgrades, and rolled back the impacted instances. We are working to make sure the kernel version is fully removed from our fleet. As a precaution, we temporarily paused the scheduled process to prevent any unintended use in the affected kernel. We also tuned our alerting so we can more quickly detect and mitigate future instances where we experience a sudden drop in load-balancing capacity.

update

API Requests and Codespaces are operating normally.

update

Copilot is operating normally.

update

Webhooks is operating normally.

update

Mitigations have been applied and we are seeing recovery. We are continuing to closely monitor the situation to ensure complete recovery has been achieved.

update

Issues is operating normally.

update

Packages is operating normally.

update

We are currently implementing mitigations for this issue.

update

Copilot is experiencing degraded performance. We are continuing to investigate.

update

We continue to investigate reports of degraded performance and intermittent timeouts across GitHub.com.

update

Pull Requests is operating normally.

update

We're continuing to investigate reports of degraded performance and intermittent timeouts across GitHub.com.

update

Pull Requests is experiencing degraded performance. We are continuing to investigate.

update

Packages is experiencing degraded performance. We are continuing to investigate.

update

API Requests is experiencing degraded performance. We are continuing to investigate.

update

Codespaces is experiencing degraded performance. We are continuing to investigate.

investigating

We are investigating reports of degraded performance for Issues and Webhooks

Created:
Resolved:
Duration: 42m
Weighted Downtime: 10.5m
4 updates
resolved

On July 16, 2025, between 05:20 UTC and 08:30 UTC, the Copilot service experienced degraded availability for Claude 3.7 requests. Around 10% of Claude 3.7 requests failed during this time. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.

update

We have seen recovery on our provider's side but have not yet confirmed if the issue is fully resolved. We will update our status in the next 20 minutes as we know more.

update

We are experiencing degraded availability for the Claude 3.7 Sonnet model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 0m
Weighted Downtime: 0m
1 update
resolved

On 15 July, between 19:55 and 19:58 UTC, requests to GitHub had a high failure rate while successful requests suffered up to 10x expected latency. Browser-based requests saw a failure rate of up to 20%, GraphQL had up to a 9% failure rate and 2% of REST API requests failed. Any downstream service dependent on GitHub APIs was also affected during this short window. The failure stemmed from a database query change, and was rolled back by our deployment tooling, which automatically detected the issue. We will continue to invest in automated detection and rollback with a goal of minimizing time to recovery.

Created:
Resolved:
Duration: 39m
Weighted Downtime: 9.75m
Affected Components: Actions
5 updates
resolved

On July 8, 2025, between 14:20 UTC and 16:30 UTC, the GitHub Actions service experienced degraded performance, leading to delays in updates to Actions workflow runs, including missing logs and delayed run statuses. During this period, 1.07% of workflow runs experienced delayed updates, and 0.34% of runs completed but showed as canceled. The incident was caused by imbalanced load in our underlying service infrastructure and was mitigated by scaling up our services and tuning resource thresholds. We are improving our resilience to load spikes and our capacity planning to prevent similar issues, and we are implementing more robust monitoring to reduce detection and mitigation time for similar incidents in the future.
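As a loose illustration of threshold-driven scale-out (the metric, target, and limits below are assumptions, not Actions internals), a service might add capacity when run-update delays exceed a target:

```python
# Hypothetical sketch: grow replica count when workflow-run updates lag.
def desired_replicas(current: int, p95_update_delay_s: float,
                     target_delay_s: float = 60.0, max_replicas: int = 64) -> int:
    """Scale out roughly in proportion to how far delays exceed the target."""
    if p95_update_delay_s <= target_delay_s:
        return current
    factor = p95_update_delay_s / target_delay_s
    return min(max_replicas, max(current + 1, round(current * factor)))

# Example: updates lagging at 3x the target roughly triples capacity.
assert desired_replicas(current=8, p95_update_delay_s=180.0) == 24
```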

update

We are seeing complete recovery for Actions. New jobs will run as normal. Some runs initiated during the incident will be left in a stuck state and will not complete.

update

We have scaled out our capacity and customers will begin to see timely updates.

update

Some customers are seeing delays in updates to their runs resulting in missing logs and delayed run status updates. We are investigating the cause of the issue.

investigating

We are investigating reports of degraded performance for Actions.

Created:
Resolved:
Duration: 5m
Weighted Downtime: 1.25m
3 updates
resolved

On July 7, 2025, between 21:14 UTC and 22:34 UTC, Copilot Coding Agent was degraded and unresponsive to issue assignment. Impact was limited to internal GitHub staff because the feature flag gating a newly released feature was enabled only on internal development setups and not in GitHub's global production environments. The incident was mitigated by disabling the feature flag for all users. While our existing safeguards worked as intended (the feature flag allowed immediate mitigation, and the limited scope prevented broader impact), we are enhancing our monitoring to better detect issues that affect smaller user segments and reviewing our internal testing processes to identify similar edge cases before they reach production.
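The staff-only gating described above follows a standard feature-flag ring pattern. This is a minimal sketch with invented names, not the actual flag service or flag:

```python
# Hypothetical sketch of staff-scoped feature flagging; names are invented.
class FeatureFlags:
    def __init__(self, staff_only_flags: set):
        self._staff_only = staff_only_flags

    def enabled(self, flag: str, is_staff: bool) -> bool:
        # Flags still in the staff-only ring are invisible to production users,
        # and removing a flag from the set disables it for everyone at once.
        return is_staff and flag in self._staff_only

flags = FeatureFlags({"coding_agent_issue_assignment"})
assert flags.enabled("coding_agent_issue_assignment", is_staff=True)
assert not flags.enabled("coding_agent_issue_assignment", is_staff=False)
```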

update

We are investigating reports of degraded performance for the Copilot Coding Agent service.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 19m
Weighted Downtime: 4.75m
3 updates
resolved

On July 7, 2025, between 18:20 UTC and 22:10 UTC, the Actions service was degraded for GitHub Larger Hosted and scale set runners. During this time window, 9% of GitHub Larger Hosted Runner and scale set jobs saw a delay of at least 5 minutes before being assigned to a runner. Impact was more apparent to customers who didn't have pre-scaled runner pools or who infrequently queued jobs during the incident window. This was due to a change that unintentionally decreased the rate at which we notified our backend that new scale set runners were coming online, and it was mitigated by reverting that change. To reduce the likelihood and impact time of a similar issue in the future, we are improving our detection of this failure mode so we catch it in earlier stages of development and rollout.

update

We are investigating reports of degraded performance for larger runners.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 1h 33m
Weighted Downtime: 23.25m
6 updates
resolved

On July 3, 2025, between 03:22 and 07:12 UTC, customers were prevented from SSO authorizing Personal Access Tokens and SSH keys via the GitHub UI. Approximately 1,300 users were impacted. A code change modified the content type of the response returned by the server, causing a lazily loaded dropdown to fail to render and preventing the user from proceeding with authorization. No authorization systems were impacted during the incident, only the UI component. We mitigated the incident by reverting the code change that introduced the problem. We are making improvements to our release process and test coverage to catch this class of error earlier in our deployment pipeline, and we are improving monitoring to reduce our time to detect and mitigate issues like this in the future.
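One way the test-coverage improvement could look, sketched with a hypothetical endpoint path and a pytest-style client fixture (not GitHub's actual routes or test suite), is a regression test that pins the fragment's content type:

```python
# Hypothetical regression test: the lazily loaded dropdown endpoint must
# keep returning an HTML fragment; a content-type change would break rendering.
def test_sso_authorization_dropdown_is_html(client):
    response = client.get("/settings/tokens/sso_authorization_dropdown")
    assert response.status_code == 200
    assert response.headers["Content-Type"].startswith("text/html")
```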

update

The rollback has been deployed successfully on all environments. Customers should now be able to SSO authorize their Classic Personal Access Tokens and SSH keys on their GitHub organizations.

update

The rollback of the change that caused the rendering bug preventing customers from SSO authorizing Personal Access Tokens and SSH keys has started rolling out. We are continuing to monitor this rollback.

update

We have identified the root cause of the rendering bug that prevented customers from SSO authorizing Personal Access Tokens and SSH keys. The changes that caused the issue are being rolled back.

update

We are investigating an issue with SSO authorizing Classic Personal Access Tokens and SSH keys.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 14m
Weighted Downtime: 3.5m
4 updates
update

We're down to a healthy level of queued migrations, and the system is processing migrations at normal concurrency levels.

resolved

On July 2, 2025, between 01:35 UTC and 16:23 UTC, the GitHub Enterprise Importer (GEI) migration service experienced degraded performance and slower-than-normal migration queue processing times. The incident was triggered by a migration that included an abnormally large number of repositories, which overwhelmed the queue and slowed processing for all migrations. We mitigated the incident by removing the problematic migrations from the queue, and service returned to normal operation as queue volume decreased. To ensure system stability, we have introduced additional concurrency controls that limit the number of queued repositories per organization migration, helping to prevent similar incidents in the future.
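A minimal sketch of the kind of per-organization concurrency control described above, using invented names rather than the actual GEI implementation:

```python
# Hypothetical per-organization cap on queued repository migrations.
from collections import defaultdict

class MigrationQueue:
    def __init__(self, max_queued_per_org: int = 100):
        self.max_queued_per_org = max_queued_per_org
        self._queued = defaultdict(list)

    def enqueue(self, org: str, repo: str) -> bool:
        """Refuse new work once an organization already has too many
        repositories waiting, so one huge migration cannot starve the queue."""
        if len(self._queued[org]) >= self.max_queued_per_org:
            return False
        self._queued[org].append(repo)
        return True

queue = MigrationQueue(max_queued_per_org=2)
assert queue.enqueue("acme", "repo-1")
assert queue.enqueue("acme", "repo-2")
assert not queue.enqueue("acme", "repo-3")  # over the per-org limit
```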

update

Repository migrations are experiencing delayed processing times. Mitigation has been implemented and migration times are recovering.

investigating

We are currently investigating this issue.

Created:
Resolved:
Duration: 22m
Weighted Downtime: 5.5m
4 updates
update

We are no longer experiencing degradation; Claude Sonnet 4 is once again available in Copilot Chat and across IDE integrations. We will continue monitoring to ensure stability, but mitigation is complete.

resolved

On July 2, 2025, between approximately 08:40 and 10:16 UTC, the Copilot service experienced degradation due to an infrastructure issue that impacted the Claude Sonnet 4 model, leading to a spike in errors. No other models were impacted. The issue was mitigated by rebalancing load within our infrastructure. GitHub is working to further improve the resiliency of the service to prevent similar incidents in the future.

update

We are experiencing degraded availability for the Claude Sonnet 4 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue. Other models are available and working as expected.

investigating

We are currently investigating this issue.