GitHub Beefs Up Code Scanning With Semmle

Securing software is hard enough to begin with, but the challenges are compounded for open source projects. Even when issues are fixed, the updates don't always reach all of the application's users. The acquisition last week of code analysis platform provider Semmle by Microsoft-owned GitHub, and the plans to integrate Semmle's engine into GitHub’s code scanning process, will help project owners and developers learn about potential problems in their codebase.

The world runs on software: it powers vehicles, runs medical devices, delivers entertainment, and makes communication possible, and that is just a short list. What isn’t always evident is the extent to which these applications rely on open source libraries and frameworks. When there are issues in these open source components, the effects are widespread.

“Today 99% of all software projects consume open source,” wrote GitHub’s senior vice-president of product Shanku Niyogi. While that is something to celebrate, he also warned that “the security lifecycle is broken.”

Integrating existing packages, so that developers don’t have to write code for functionality someone else has already built, is one of the good parts of modern software development. The not-so-good part arises when issues are found in one of those components. Package maintainers don’t always know who is using their library, so it is hard to notify all the downstream developers that a problem has been fixed. Since libraries can be nested inside other libraries, a developer may see an advisory about updating a particular library and not realize that a different library they are using includes the vulnerable one. Or someone can fix the vulnerability in their own copy of the component and never let the project maintainers know. That specific project is secure, as is anyone else using the fixed version, but all the other projects are left in the dark.
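The nesting problem can be made concrete with a small sketch. The package names and the dependency map below are hypothetical, invented for illustration; real package managers resolve dependencies in more involved ways. The idea is simply that a vulnerable library may be several levels removed from the project that ends up running it.

```python
from collections import deque

# Hypothetical dependency map: each package lists the packages it pulls in directly.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-tools"],
    "web-framework": ["template-engine", "http-client"],
    "http-client": ["tls-lib"],
    "json-tools": [],
    "template-engine": [],
    "tls-lib": [],
}


def find_path_to(root, target, dependencies):
    """Return the shortest chain of packages from root to target, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        # Breadth-first walk over direct dependencies not visited yet.
        for dep in dependencies.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# "my-app" never names "tls-lib" directly, yet it still depends on it.
print(find_path_to("my-app", "tls-lib", DEPENDENCIES))
```

In this made-up example, an advisory for tls-lib would not match anything the developer of my-app declared, which is why the nested case is easy to miss.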

Of course, all this assumes that someone is looking at the open source code components and identifying vulnerabilities in the first place. In some projects, a community of contributors helps find and fix issues. Other projects don’t have enough people to devote time to hunting for bugs, typically a manual and time-intensive process, so it gets pushed further down the to-do list.

Finding Flaws

Security vulnerabilities are discovered through penetration testing and manual code reviews, as well as through code scanning and fuzzing tools. It's a time-consuming process, and also requires a bit of serendipity. There are many software projects, and the researcher has to decide which package to analyze.

GitHub plans to integrate Semmle's code analysis engine into its platform to make it easier for researchers to find vulnerabilities in projects. The engine treats code as data, and lets researchers use QL, an object-oriented query language, to ask questions about the codebase. Regardless of the programming language, many vulnerabilities are the result of the same kind of coding mistake. Since the query language isn’t language-specific (it isn’t looking only for a code pattern in Python or Java, for example), the engine can find variations of the mistake across different languages.
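To give a feel for what "treating code as data" means, here is a minimal sketch that uses Python's standard ast module. It is not QL and not Semmle's engine, and unlike that engine it only understands Python source. The function name find_calls and the sample snippet are made up for illustration. The point is only that once source code is parsed into a structure, a risky pattern becomes something you can query for.

```python
import ast


def find_calls(source, function_name):
    """Return the line numbers where function_name is called directly."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # Match plain calls such as eval(...), not attribute calls like obj.eval(...).
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == function_name):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical code under review.
SAMPLE = '''
def load(config_text):
    return eval(config_text)

def safe_load(config_text):
    return int(config_text)
'''

print(find_calls(SAMPLE, "eval"))
```

The same query function can be run unchanged over any number of Python files, which loosely mirrors the idea of writing a query once and sharing it across many codebases.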

“Security researchers identify vulnerabilities and their variants with a QL query. This query can be shared and run over many codebases, freeing up security researchers to do what they love and do best: hunt for new classes of vulnerability,” Niyogi wrote.

As a new CVE Numbering Authority for open source projects, GitHub will also be able to issue CVEs for security advisories opened on the platform. CVEs are used to share details about a software vulnerability, and the CNA community is involved in discussions around how vulnerability research and disclosure should work.

Fixing Flaws

Over the years, GitHub has rolled out security tools and features such as the Dependency Graph to help project maintainers and developers identify and fix issues in their code. The company scans public repositories (and private ones that project maintainers have opted into) and generates a dependency tree by mapping which components are used by which project. GitHub then compares each dependency name and version number against a list of vulnerability reports, which may have been sent directly by the people who found the issue or come through sources such as the National Vulnerability Database, MITRE, and WhiteSource, to find affected projects.
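The name-and-version comparison can be sketched roughly as follows. This is not GitHub's implementation. The project names, package names, and advisory identifiers are hypothetical, and the sketch assumes simple dotted numeric versions and that every version below the fixed one is affected, which real advisories do not always guarantee.

```python
def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical dependency data: project -> {package: installed version}.
PROJECTS = {
    "shop-frontend": {"tls-lib": "1.3.9", "json-tools": "2.0.0"},
    "billing-api": {"tls-lib": "1.4.2"},
    "blog": {"json-tools": "1.9.1"},
}

# Hypothetical advisories: versions below fixed_in are treated as vulnerable.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "tls-lib", "fixed_in": "1.4.2"},
    {"id": "EXAMPLE-0002", "package": "json-tools", "fixed_in": "2.0.0"},
]


def affected_projects(projects, advisories):
    """Return sorted (project, advisory id) pairs for vulnerable dependencies."""
    alerts = []
    for project, deps in projects.items():
        for advisory in advisories:
            installed = deps.get(advisory["package"])
            if installed is None:
                continue  # project does not use this package
            if parse_version(installed) < parse_version(advisory["fixed_in"]):
                alerts.append((project, advisory["id"]))
    return sorted(alerts)


for project, advisory_id in affected_projects(PROJECTS, ADVISORIES):
    print(project, "is affected by", advisory_id)
```

Comparing versions as tuples of integers rather than as strings matters here: as plain text, "1.10.0" would sort before "1.9.0".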

The Dependency Graph drives the Security Alerts feature, where repository owners are automatically notified about vulnerabilities in any of the dependencies used by their projects. Originally launched for JavaScript and Ruby, the feature now supports Python, Java, and .NET code. The latest addition to Security Alerts is support for PHP projects that use the package manager Composer.

It's one thing to give alerts, but another to apply the fixes. Dependabot (an earlier acquisition) provides automatic security fixes natively within GitHub. When a vulnerability is found in a dependency, GitHub automatically issues a pull request on downstream repositories with the information needed to accept the patch.

“Software security is a collective problem, a responsibility that involves producers and consumers of code, open source maintainers, security researchers, and security teams,” Niyogi wrote. “Open source must be something that the world can trust.”
