Shai-Hulud PyPI Packages: How 19 Scientific Projects Were Trojanized

Shai-Hulud: How 19 Scientific PyPI Packages Became a Backdoor for Developer Secrets

The recent Shai-Hulud attack, which compromised 19 PyPI packages widely used in scientific computing, underscores the persistent challenge of software supply chain security.

When bioinformatics tools like Dynamo, Spateo, and Napari-UFISH are trojanized, it confirms that even specialized, trusted communities are high-value targets for credential theft and workflow compromise.

This analysis goes beyond the headline figures of 19 compromised science-focused PyPI packages and hundreds of thousands of collective downloads. It covers the attack mechanics, the specific targeting, and the broader implications of the Shai-Hulud campaign for open-source Python development and research.

The Incident: A Credential Theft Campaign Targeting Scientific PyPI Packages

Socket, an application security firm, recently identified a new Shai-Hulud supply-chain attack wave. This campaign targeted 19 PyPI packages, many of them prevalent in scientific and bioinformatics work, including Dynamo, Spateo, CoolBox, U-FISH, and Napari-UFISH. The malicious versions collectively saw hundreds of thousands of downloads.

The attackers aimed to steal credentials, use them to compromise software development workflows, and spread further. Socket attributes 37 malicious releases across these 19 packages to a single maintainer, indicating a focused campaign. The incident is a new wave within the broader 'Shai-Hulud' operation, in which Socket has tracked 453 malicious artifacts across multiple ecosystems. It follows previous campaigns, such as the 'Mini Shai Hulud' attack identified on May 11, 2026, which targeted over 170 npm and PyPI packages, demonstrating the actor's evolving tactics and cross-ecosystem reach.

The Mechanism: Python's Startup Hook, Bun, and JavaScript

The Shai-Hulud variant operates through these steps:

  1. Malicious packages embed a *-setup.pth file and an obfuscated JavaScript payload, _index.js. This aligns with T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain).

  2. The .pth file acts as the activation point. At interpreter startup, Python's site module processes every .pth file in its site-packages directory and executes any line that begins with import. Consequently, the malicious code runs whenever Python, pip, a test, a notebook kernel, or a CI job starts. This is a form of T1546 (Event Triggered Execution), the same family as shell startup-file hooks.

  3. Bun's Role: The .pth file does not directly execute JavaScript. Instead, it downloads the Bun JavaScript runtime from GitHub. This method creates an independent execution environment, lessening its dependence on existing system configurations. This constitutes T1105 (Ingress Tool Transfer).

  4. Payload Execution: After Bun is downloaded, the .pth file uses it to execute the bundled _index.js script. This involves T1059.007 (Command and Scripting Interpreter: JavaScript).
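
The activation step can be made concrete with a small scanner. The following is a minimal sketch, not the attacker's code and not Socket's detection logic: it mirrors the rule the site module applies (lines starting with import followed by a space or tab are executed) and lists such lines for review. The function names are hypothetical.

```python
from pathlib import Path


def executable_pth_lines(pth_text: str) -> list[str]:
    """Return the lines of a .pth file that the site module would execute."""
    hits = []
    for line in pth_text.splitlines():
        # site.py treats other non-comment lines as paths to add to sys.path;
        # only lines starting with "import" plus a space or tab are run as code.
        if line.startswith(("import ", "import\t")):
            hits.append(line.rstrip())
    return hits


def scan_site_packages(site_dir: Path) -> dict[str, list[str]]:
    """Map each .pth file name in site_dir to its executable lines, if any."""
    findings = {}
    for pth in sorted(site_dir.glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        lines = executable_pth_lines(text)
        if lines:
            findings[pth.name] = lines
    return findings
```

A hit is not proof of compromise: some legitimate tools, including editable installs, also ship .pth files with import lines. The output is a review list, and in this campaign the file names to look for first are those ending in -setup.pth. One way to run it is over each directory returned by site.getsitepackages().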

The JavaScript payload acts as a modular credential stealer, designed to target sensitive data such as:

  • GitHub tokens and GitHub Actions secrets
  • Publishing tokens for npm, PyPI, RubyGems, and JFrog
  • Cloud credentials for AWS, GCP, Azure, Kubernetes, and Vault
  • SSH keys and Docker credentials
  • Configuration files like .env, .npmrc, .pypirc
  • Shell histories
  • Claude/MCP configuration files, which can also be used for persistence.
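
The list above doubles as a rotation checklist. As a rough sketch, the snippet below reports which commonly used credential locations exist under a given directory, so a responder knows what was within reach of the payload. The paths are the usual defaults for each tool and are assumptions; the source does not enumerate the exact files the stealer reads, and real setups may store secrets elsewhere.

```python
from pathlib import Path

# Default locations for common tools (assumed; adjust for your environment).
CANDIDATE_PATHS = [
    ".env",
    ".npmrc",
    ".pypirc",
    ".ssh",
    ".aws/credentials",
    ".azure",
    ".config/gcloud",
    ".docker/config.json",
    ".kube/config",
    ".bash_history",
    ".zsh_history",
]


def exposed_credential_files(base: Path) -> list[str]:
    """Return the candidate paths that exist under base, in checklist order."""
    return [rel for rel in CANDIDATE_PATHS if (base / rel).exists()]
```

Calling exposed_credential_files(Path.home()) on an affected machine gives a starting inventory; anything it lists should be treated as stolen and rotated, not merely inspected.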

The malware incorporates evasion techniques. It checks for Russian locales/environments and the presence of security tools like StepSecurity Harden-Runner. If detected, the payload remains dormant.

For persistence, the malware employs several mechanisms:

  • It sets up systemd services on Linux. This aligns with T1543.002 (Create or Modify System Process: Systemd Service).
  • It uses LaunchAgents on macOS. This aligns with T1543.001 (Create or Modify System Process: Launch Agent).
  • It injects into GitHub workflow files. This is closest to T1546 (Event Triggered Execution), since the injected code runs whenever the workflow is triggered.
  • It also injects hooks into Claude/MCP configuration files, ensuring it survives reboots and re-executes when an IDE launches. This also falls under T1546 (Event Triggered Execution).
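
These persistence locations can be triaged by modification time. The sketch below lists files in the relevant directories that changed after a chosen cutoff, such as the install time of an affected package. The glob patterns are standard per-user locations and are assumptions: the source does not give the exact file names the malware writes, and the Claude settings path in particular is a guess at one plausible target.

```python
from pathlib import Path

# Per-user persistence locations (assumed defaults, relative to the home directory).
HOME_GLOBS = {
    "systemd user unit": ".config/systemd/user/*.service",
    "macOS LaunchAgent": "Library/LaunchAgents/*.plist",
    "Claude settings": ".claude/settings.json",  # hypothetical target path
}

# Workflow files, relative to a repository checkout.
REPO_GLOBS = {
    "GitHub workflow (.yml)": ".github/workflows/*.yml",
    "GitHub workflow (.yaml)": ".github/workflows/*.yaml",
}


def recently_modified(base: Path, patterns: dict[str, str],
                      since_epoch: float) -> list[tuple[str, str]]:
    """Return (label, relative path) for matching files changed at or after since_epoch."""
    hits = []
    for label, pattern in patterns.items():
        for path in sorted(base.glob(pattern)):
            if path.is_file() and path.stat().st_mtime >= since_epoch:
                hits.append((label, path.relative_to(base).as_posix()))
    return hits
```

Modification times can be forged, so an empty result does not clear a machine; the value of the check is that it quickly surfaces files worth opening by hand.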

Data exfiltration primarily happens by automatically creating GitHub repositories to host the stolen secrets via GitHub Actions. There's also secondary exfiltration to api[.]anthropic[.]com/v1/api, which is likely camouflage to blend in with legitimate AI API traffic.

The Impact: Why Scientific Computing is a High-Value Target

The practical implications of the attack are significant: an attacker holding these credentials can act as the developer on every service those credentials unlock. For scientific researchers and developers, whose work often involves sensitive data and complex computational workflows, this could mean:

  • Compromised Research Data: Access to cloud credentials or SSH keys could directly lead to unauthorized access to sensitive research datasets, high-performance computational clusters, or proprietary scientific models. This isn't merely about data theft; it encompasses the potential for data manipulation, exfiltration of intellectual property, and even the sabotage of ongoing research projects. Imagine genomic data, patient records, or novel drug discovery algorithms falling into the wrong hands, leading to severe ethical breaches, competitive disadvantages, or flawed scientific outcomes. The integrity and confidentiality of scientific endeavors are directly threatened.
  • Supply Chain Poisoning: Stolen publishing tokens represent a critical vulnerability. Attackers could use these tokens to push malicious updates to other legitimate packages maintained by the same developer. This creates a worm-like propagation in which trusted software becomes a vector for further compromise, extending the reach of the Shai-Hulud campaign far beyond the initial 19 PyPI packages. Such secondary compromises can be very difficult to detect and remediate, leading to widespread and persistent threats across the open-source ecosystem.
  • Loss of Trust and Reproducibility: When scientific packages, often peer-reviewed, widely cited, and foundational to research, are compromised, it severely damages confidence in the entire open-source ecosystem. Scientific progress relies heavily on reproducibility and trust in shared tools. A breach like Shai-Hulud undermines this foundation, making researchers hesitant to adopt new tools or leading them to question the validity of past results obtained using compromised software. The impact is amplified because the payload ran at interpreter startup, so merely having an affected version installed was enough to be exposed.

Compromising a GitHub token directly jeopardizes the integrity of scientific work and its infrastructure. Researchers frequently handle specialized, high-value data and computational resources, making their credentials highly attractive targets for sophisticated threat actors. Scientific computing environments often involve large datasets and distributed collaboration, which makes them particularly susceptible to such targeted attacks, as this incident demonstrates.

The Response: Immediate and Long-Term Actions

A compromise like the Shai-Hulud attack calls for a swift and comprehensive response. First, rotate every credential. Each GitHub token, cloud API key, SSH key, and publishing token that may have been exposed must be invalidated and replaced immediately. This is critical because a stolen token stays usable until it is revoked, and each one can widen the attacker's foothold. At the same time, every development environment that installed the affected packages should be restored from known-good backups to eliminate lingering malicious artifacts and persistence mechanisms.

Organizations must also actively hunt for Indicators of Compromise (IOCs): Python packages that ship .pth files containing executable import lines, unexpected downloads of the Bun JavaScript runtime from GitHub, and unusual process chains in which Python launches Bun to execute _index.js. Beyond these specific indicators, vigilance for other unusual files, network connections, or process behaviors is crucial and often requires deeper forensic analysis.
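
The process-chain indicator can be expressed as a simple filter over a process snapshot. This is a minimal sketch under stated assumptions: the Proc record and the function name are hypothetical, the snapshot is assumed to come from whatever telemetry is available (for example, the output of ps or an EDR export), and matching on executable names alone is easy for an attacker to evade by renaming the binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    cmdline: str


def _exe_name(cmdline: str) -> str:
    """Return the lowercased basename of the first token of a command line."""
    tokens = cmdline.split()
    if not tokens:
        return ""
    return tokens[0].replace("\\", "/").rsplit("/", 1)[-1].lower()


def find_python_bun_chains(procs: list[Proc]) -> list[tuple[Proc, Proc]]:
    """Return (parent, child) pairs where a Python process spawned Bun running _index.js."""
    by_pid = {p.pid: p for p in procs}
    hits = []
    for child in procs:
        parent = by_pid.get(child.ppid)
        if parent is None:
            continue
        parent_is_python = _exe_name(parent.cmdline).startswith("python")
        child_is_bun = _exe_name(child.cmdline).startswith("bun")
        if parent_is_python and child_is_bun and "_index.js" in child.cmdline:
            hits.append((parent, child))
    return hits
```

Because the check only looks at direct parent and child, a payload that launches Bun through an intermediate shell would be missed; walking further up the ancestry is a natural extension.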

Looking beyond immediate fixes, a proactive reassessment of supply chain security is crucial, particularly for scientific communities and open-source projects that may lack dedicated security teams. A fundamental baseline is to pin dependencies to specific versions and verify them against cryptographic hashes. This won't help if the pinned version is itself malicious, but it stops a newly published malicious release from being pulled in unnoticed. CI/CD pipelines must be hardened to follow the principle of least privilege; for instance, a build token authorized to publish to PyPI should never carry administrative access to sensitive cloud environments. Mandatory multi-factor authentication (MFA) for all developer accounts and publishing platforms is also highly recommended as a foundational control. Targeted developer education matters too, especially in scientific fields, where developers may prioritize rapid scientific output over stringent security practices, a tendency that attackers like those behind Shai-Hulud can readily exploit. Finally, platforms like PyPI and npm need continuous investment in automated scanning, behavioral analysis, and expedited takedown processes to protect the broader community from evolving threats.
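
The hash check behind pinning is simple to illustrate. The sketch below compares the SHA-256 digest of a downloaded artifact with a set of pinned digests; the function name is hypothetical. In practice pip performs this check itself when run with --require-hashes against a requirements file whose entries carry --hash=sha256:... values, so the code is only meant to show what that mode verifies.

```python
import hashlib
import hmac


def matches_pinned_hash(artifact: bytes, pinned_sha256: set[str]) -> bool:
    """Return True if the artifact's SHA-256 digest equals one of the pinned hex digests."""
    digest = hashlib.sha256(artifact).hexdigest()
    # compare_digest avoids leaking match position through timing;
    # not strictly needed for public hashes, but a harmless habit.
    return any(hmac.compare_digest(digest, pinned.lower()) for pinned in pinned_sha256)
```

A release re-uploaded with different contents, or a new version that was never reviewed, fails this check because its digest is not in the pinned set.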

Proactive security audits, regular vulnerability assessments, and participation in threat intelligence sharing platforms are also vital. By continuously monitoring for new attack vectors and sharing insights, the scientific and open-source communities can collectively build more resilient defenses against future Shai-Hulud variants and similar supply chain threats. This layered approach, combining immediate incident response with long-term security improvements, offers the best protection for the integrity of scientific computing.

This Shai-Hulud variant, with its novel use of the Bun runtime and targeted approach to scientific PyPI packages, underscores a critical evolution in supply chain attacks. The precision in credential exfiltration and persistence mechanisms demands a proactive and technically granular defense. Organizations must move beyond generic security postures to implement specific controls that address the sophisticated, multi-stage nature of these threats, particularly within specialized development ecosystems.

Daniel Marsh
Former SOC analyst turned security writer. Methodical and evidence-driven, breaks down breaches and vulnerabilities with clarity, not drama.