Data leaks happen daily, however hard developers and companies try to avoid them. Millions of people use GitHub to host and share code, and GitHub repositories often end up containing sensitive information such as secret keys, passwords, and confidential files.
According to research carried out by North Carolina State University, over 500,000 secrets were uploaded to GitHub between October 2017 and April 2018. Studies show that more than half of company data breaches result from compromised credentials. This is where GitGuardian comes in.
GitGuardian was founded in 2017 by Eric Fourrier and Jeremy Thomas, applied mathematics graduates and software engineers who specialize in data science, AI, and machine learning.
This Paris-based cybersecurity company uses a combination of algorithms, including pattern matching and machine learning, to look for signs of company secrets in online code. According to the startup’s figures, more than 3,000 secrets find their way online daily. Developers can also use git hooks to detect secrets in their source code.
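To make the git hook idea concrete, here is a minimal sketch of a pre-commit check written in Python. It is not GitGuardian's detection logic: the three patterns and the names `SECRET_PATTERNS`, `find_secrets`, and `main` are assumptions chosen for illustration, and a real scanner would use far more detectors. The sketch reads the staged changes with `git diff --cached` and flags added lines that look like credentials.

```python
import re
import subprocess

# Illustrative patterns only (assumptions, not GitGuardian's actual detectors).
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "Hard-coded credential": re.compile(
        r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(diff_text):
    """Return (diff line number, pattern name) for every added line that matches."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def main():
    """Scan the staged diff and return 1 if anything suspicious was found."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = find_secrets(staged)
    for number, name in findings:
        print(f"possible secret ({name}) on diff line {number}")
    return 1 if findings else 0
```

To try it, the script could be saved as an executable `.git/hooks/pre-commit` file with a Python shebang and a final line `raise SystemExit(main())`. Git aborts the commit when a pre-commit hook exits with a non-zero status, so a match blocks the commit until the line is removed.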
How Does the Cloud Fuel Secret Leakage?
One of the many factors that contribute to sensitive data leakage is that software developers increasingly rely on third-party services. To incorporate such services, developers usually juggle hundreds of credentials of varying sensitivity, from private cryptographic keys for servers to API keys used to provide mapping features on websites.
In the process of handling these integrations, millions of developers, businesses, and organizations around the world use GitHub, a public platform that allows developers to share code and work collaboratively on projects.
Whether accidentally (which is usually the case) or, more rarely, knowingly, these uploads have company secrets buried in them alongside the code being developed. In theory, attackers can scour this code, extract the credentials, and break into the company’s accounts, all without the developer or their client knowing.
How Can GitGuardian Help Avoid These Leaks?
GitGuardian’s technology works by first linking developers on GitHub to their respective employers. This gives each company insight into who its developers are on GitHub and how much public activity they are involved in. This is particularly important for developers’ personal repositories, since these are completely outside the company’s control.
Once developers have been linked, GitGuardian’s algorithms scrutinize every code change (commit) those developers make, in real time, looking for signs of company secrets. These signs range from code patterns to file types that have been found to contain credentials in the past.
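The two kinds of signs mentioned above, code patterns and file types, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lists `RISKY_SUFFIXES`, `RISKY_NAMES`, and `CODE_PATTERNS`, the `Signal` record, and the `scan_commit` function are hypothetical names invented here, and the commit is represented simply as a mapping from file path to the text added in that file.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical signals for illustration; not GitGuardian's actual detectors.
RISKY_SUFFIXES = {".pem", ".key", ".p12", ".pfx"}
RISKY_NAMES = {".env", "id_rsa", "credentials.json"}
CODE_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "credential assignment": re.compile(
        r"(?:api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


@dataclass(frozen=True)
class Signal:
    path: str
    kind: str    # "file type" or "code pattern"
    detail: str  # file name or pattern name that triggered the signal


def scan_commit(changes):
    """Scan one commit, given as {file path: text added in that file}."""
    signals = []
    for path, added_text in changes.items():
        file = PurePosixPath(path)
        # Sign 1: the file type itself has a history of holding credentials.
        if file.suffix in RISKY_SUFFIXES or file.name in RISKY_NAMES:
            signals.append(Signal(path, "file type", file.name))
        # Sign 2: the added text matches a known credential pattern.
        for name, pattern in CODE_PATTERNS.items():
            if pattern.search(added_text):
                signals.append(Signal(path, "code pattern", name))
    return signals
```

Keeping the two checks separate means a file such as `server.pem` is flagged even when its contents are unreadable, while an ordinary source file is flagged only when the added text itself looks like a credential.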
Once a leak occurs, it takes GitGuardian just four seconds to detect it and notify both the developer and their security team. Generally, the information is removed within 25 minutes and the credential is revoked in less than an hour.
For every alert GitGuardian sends, the developer and the security team are asked to rate the accuracy of the detection: whether the company’s secrets were actually exposed or it was just a false alarm.