Sourcegraph’s Shocking Screwup: Private Secrets in Public Repo

A lemur stares back at you, with a shocked expression

AI source code navigation LLM leaks PII after DevOps SNAFU.

Sourcegraph’s large language model was hacked last week. Scrotes labored for days to make its paid-for functionality available free of charge.

Some PII might have leaked, too. The company isn’t sure, but it was accessible. In today’s SB Blogwatch, we check our GitHub repos (yet again).

Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: R.I.P. Steve Harwell.

Credentials Create Crisis

What’s the craic? Sergiu Gatlan reports—“Sourcegraph website breached using leaked admin access token”:

User base exceeding 1.8 million
AI-powered coding platform Sourcegraph revealed that its website was breached … using a site-admin access token accidentally leaked online. … An attacker used the leaked token on August 28th to create a new site-admin account and log into the admin dashboard of the company’s website.

During the incident, the attacker gained access to Sourcegraph customers’ information, including license keys, names, and email addresses. … With a global user base exceeding 1.8 million software engineers, Sourcegraph’s client roster includes high-profile companies like Uber, F5, Dropbox, Lyft, Yelp, and more.

How could this happen? Dan Goodin gets déjà vu—“We’ve said it before; we’ll say it again: Don’t put credentials in publicly available code”:

Stored only on restricted servers
The hacker gained administrative access by obtaining an authentication key a Sourcegraph developer accidentally included in a code published to a public Sourcegraph instance hosted on Sourcegraph.com. … The hacker used the token to elevate their account privileges. … The access token appeared in a pull request posted on July 14, the user account was created on August 28, and the elevation to admin occurred on August 30.

The inadvertent posting by developers of private credentials … has been a problem plaguing online companies for more than a decade. These credentials can include private encryption keys, passwords, and authentication tokens. In the age of publicly accessible code repositories like GitHub, credentials should never be included in commits. Instead, they should be stored only on restricted servers.

What is Sourcegraph? How did the firm find out? Gurubaran KS outlines the answers—“AI Coding Platform Sourcegraph Breached”:

Sourcegraph is a code AI platform that makes it easy to read, write, and fix code–even in big, complex code bases. … On August 30, 2023, Sourcegraph’s security team detected a substantial increase in API usage.

Horse’s mouth? Sourcegraph’s head of security, Diego Comas, has this “Security update”:

Customers’ private data
On August 30, 2023 our team noticed a significant increase in API usage and began investigating the cause. … The spike in usage was ruled as isolated and inorganic.

Our internal control systems, including automated code analysis, failed to catch the access token being committed to the repository. … Expanding our secret scanning through additional static analysis tests will ensure we can better detect and prevent this kind of leak in the future.

Customers’ private data or code was not viewed during this incident. [They] reside in isolated environments and were therefore not impacted by this event. … No other customer info, including private code, emails, passwords, usernames, or other PII, was accessible.

What caused the spike? Nir Valtman clarifies—“Secrets in source code”:

Running joke
The malicious user … created a proxy application which was able to make direct calls to Sourcegraph’s APIs and their underlying … large language model. … The application created by the malicious user, which granted free access to Sourcegraph APIs, generated 2 million views within hours. As the app spread, users generated free Sourcegraph accounts and were able to access Sourcegraph APIs.

There is a running joke among application security leaders that, “We don’t have any secrets in our source code.” [It] is always said with a sly smile. It’s a problem that has existed for a while and continues to wreak havoc on organizations who experience a breach or leak due to exposed secrets.

Secrets in code, though? dspillett agrees with Dan and Nir:

The way to go
Our regular reminder to try keep credentials and other security tokens well away from any source code wherever possible, even if that might mean making things a touch less convenient. … Most of us have checked in or otherwise posted a credential at some point in our careers. I’ve certainly done it in the past with an application DB connection string.

Being careful isn’t the solution, because mistakes will always happen. Making it damn near impossible to accidentally post credentials is the way to go.
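One way to make accidental posting harder is to scan text for credential-shaped strings before it is committed. What follows is only a minimal sketch of that idea, not Sourcegraph's tooling or any vendor's rule set: the three patterns and the names SECRET_PATTERNS and scan_text are illustrative assumptions, and a real scanner carries far more rules plus entropy checks.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real secret
# scanners ship much larger, better-tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"(?i)(?:password|passwd|secret|token|api[_-]?key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return a (line_number, rule_name) pair for every rule a line matches."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings
```

Wired into a pre-commit hook or a CI step, a non-empty result from scan_text would block the change. The point is that the check runs every time, whether or not the developer remembers to be careful.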

What should devs do? MrMorden lays down the lore: [You’re fired—Ed.]

1. Credentials should only be stored in a system designed to store them. …
2. If your system is deployed to a cloud provider, you should be using IAM roles/MSI/&c.
3. You do not ever need to put credentials in a source code repository.
4. If you think you may need to put credentials in a source code repository, see #3.
5. If you’re having trouble with #3, read it again, Scarfolk style.
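In practice, rule #3 usually means the code asks its runtime environment for the credential instead of carrying it. Here is a minimal sketch, assuming the deploy platform or a secrets manager injects the value as an environment variable. The names load_credential, MissingCredentialError and SITE_ADMIN_TOKEN are hypothetical and do not come from the incident report.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is absent from the environment."""


def load_credential(name, environ=None):
    """Fetch a credential from the process environment, never from source.

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        # Fail fast rather than fall back to a hardcoded default.
        raise MissingCredentialError(
            f"{name} is not set; inject it from a secrets manager "
            "or the deploy environment"
        )
    return value
```

A caller would write something like load_credential("SITE_ADMIN_TOKEN") at startup. Because there is no fallback value in the code, nobody is tempted to check a real token into the repository "just for now."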

Still, good job private information wasn’t leaked, eh? 93 Escort Wagon ain’t buying it:

It’s amazing how that is always what companies confidently state when a breach is first announced. Then, over the subsequent weeks and months (and occasionally even years), we gradually hear amendments.

“A small number of encrypted passwords may have been collected, but there’s no way for the hacker to access the actual password.” Followed by, “For a few accounts, the hacker may have been able to access social security numbers and plain text passwords. But that was only for some very old accounts.”

Then, eventually, “The hacker pretty much got unfettered access to every user’s account password, social security number, children’s names and schools, spouse’s medical histories. All in clear text.”

Did you spot the pachyderm in the parlor? dragande does:

AI? I wonder if they used their own product to spot the bug and fix the code.

Meanwhile, metadat channels Phineas T. Barnum:

This could actually be good for Sourcegraph in the medium to long-term: Free publicity!

And Finally:

As Smash Mouth’s manager said, “Steve lived a 100% full-throttle life. Burning brightly across the universe before burning out.”

You have been reading SB Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites … so you don’t have to. Hate mail may be directed to @RiCHi, @richij or [email protected]. Ask your doctor before reading. Your mileage may vary. Past performance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.

Image sauce: Edgar.infocus (via Unsplash; leveled and cropped)

Richi Jennings

Richi Jennings is a foolish independent industry analyst, editor, and content strategist. A former developer and marketer, he’s also written or edited for Computerworld, Microsoft, Cisco, Micro Focus, HashiCorp, Ferris Research, Osterman Research, Orthogonal Thinking, Native Trust, Elgan Media, Petri, Cyren, Agari, Webroot, HP, HPE, NetApp on Forbes and CIO.com. Bizarrely, his ridiculous work has even won awards from the American Society of Business Publication Editors, ABM/Jesse H. Neal, and B2B Magazine.
