Software composition analysis is a tool-based process that cannot be fully automated. An SCA tool expects or downloads a hierarchical structure of all relevant artifacts. Typically, this is a folder hierarchy of source code files pulled from version control, but it can also be a container image that includes nondescript binary files.
Different SCA tools naturally provide different functionalities. At their core, however, there are three types of source code analysis results that an SCA tool might provide to its users (not all tools provide all three).
- The dependency graph. A core output is the actual dependency graph (not just the SBOM). This requires both correctly identifying the components and determining their linkage (forming the components into a graph).
Modern package managers have made it easy to determine a software system's dependency graph, but many older systems written in languages without established package managers resist automated construction of the dependency graph (a minimal sketch of package-manager-based construction follows this list).
Package managers help SCA tools identify a component. The metadata provided by package managers, for example, component licenses and owners, is incorrect more often than not.
- Metadata from source code analysis. Another core output of an SCA tool is the analysis of the source code itself. Most commonly, SCA tools look for legal information to help users ensure license compliance.
Identifying legal information is commonly performed in a simple and straightforward way: regular expression matching against defined terms and databases such as license text databases (the second sketch after this list illustrates this).
Code quality analysis and identifying unknown vulnerabilities are also useful analysis functions available in some tools.
- Snippet matching of your and third-party source code. The final core output of some SCA tools is the identification of code snippets that may have been copied from the web into your or any third-party code, including open source components.
Free-to-use open source SCA tools usually don't offer a snippet matching feature, because to perform this function, the tool needs to compare any code snippet against the whole wide world of third-party code. This requires the creation and continuous updating of a large database of such third-party code, which can become rather expensive (the third sketch after this list illustrates the matching side).
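To make the first item concrete, here is a minimal sketch of package-manager-based dependency graph construction. It uses only the Python standard library to inspect an installed Python environment; the function name `build_dependency_graph` and the approach are illustrative assumptions, not how any particular SCA tool works. Real tools cover many ecosystems, resolve versions from lockfiles, and handle the hard cases mentioned above.

```python
# Minimal sketch: recover a dependency graph from an installed
# Python environment using only the standard library.
from importlib import metadata
import re

def build_dependency_graph() -> dict[str, set[str]]:
    """Map each installed distribution to the distributions it requires."""
    graph: dict[str, set[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        deps: set[str] = set()
        for req in dist.requires or []:
            # Keep only the bare package name; drop version specifiers
            # and environment markers, e.g. 'foo>=1.2; extra == "test"'.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match:
                deps.add(match.group(0).lower())
        graph[name.lower()] = deps
    return graph

if __name__ == "__main__":
    for package, deps in sorted(build_dependency_graph().items()):
        print(package, "->", ", ".join(sorted(deps)) or "(none)")
```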
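The regular expression matching described in the second item can be sketched as follows. The patterns here are illustrative samples covering a handful of well-known markers; real tools match against curated license text databases such as the SPDX license list.

```python
# Minimal sketch: regex-based license detection over a source tree.
# The patterns are illustrative examples, not a complete database.
import re
from pathlib import Path

LICENSE_PATTERNS = {
    "Apache-2.0": re.compile(r"Apache License,?\s+Version 2\.0", re.I),
    "MIT": re.compile(r"\bMIT License\b", re.I),
    "GPL-3.0-only": re.compile(r"GNU GENERAL PUBLIC LICENSE\s+Version 3", re.I),
    # SPDX short-form identifiers embedded in source files.
    "SPDX tag": re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)"),
}

def scan_file(path: Path) -> list[str]:
    """Return the license markers found in a single file."""
    text = path.read_text(errors="replace")
    hits = []
    for label, pattern in LICENSE_PATTERNS.items():
        m = pattern.search(text)
        if m:
            hits.append(m.group(1) if m.groups() else label)
    return hits

def scan_tree(root: Path) -> dict[str, list[str]]:
    """Walk a source tree and collect per-file license findings."""
    return {
        str(p): found
        for p in root.rglob("*")
        if p.is_file() and (found := scan_file(p))
    }
```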
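Finally, the matching side of snippet detection can be sketched with simple line-shingle hashing. This assumes a pre-built fingerprint index of third-party code; as noted above, building and continuously updating that index is the expensive part. Commercial tools use considerably more robust fingerprinting (winnowing and similar schemes) over vastly larger corpora.

```python
# Minimal sketch: hash-based snippet matching against a prepared index.
import hashlib

def fingerprints(source: str, window: int = 5) -> set[str]:
    """Hash every run of `window` consecutive non-blank, stripped lines."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(max(0, len(lines) - window + 1))
    }

def match_snippets(your_code: str, index: dict[str, str]) -> set[str]:
    """Report which indexed components share a fingerprint with your code.

    `index` maps fingerprint -> component name. Building and maintaining
    it for the whole open source universe is the expensive part.
    """
    return {index[fp] for fp in fingerprints(your_code) if fp in index}
```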
The SCA tool works through the artifact hierarchy and collects its findings for review and sign-off by its users. It is not advisable for users to simply accept what an SCA tool suggests. More often than not, the findings will be wrong. To this end, SCA tools provide users with a workflow in which they can review each finding for correctness.
There are many challenges to a human review:
- Erroneous data. An SCA tool may pull in erroneous data, for example, from package managers. Users need to review and correct this data.
- Laborious process. The developers of an SCA tool typically don't want to be on the hook for overlooked third-party code. Hence, an SCA tool is tuned to be highly sensitive, often suggesting third-party code, in particular copied-and-pasted snippets, where there is none. This leads the tool to report a large number of findings, many if not most of which will be false positives. Working through all these findings is a significant time sink for SCA tool users.
- Error-prone process. The review process is highly error-prone because it is mind-numbingly boring. Reviewers have to work through a large set of findings, many of which are similar and repetitive yet may vary in minor but important details. As humans work, attention wanes and the desire to move forward wins out, leading to sloppy work and ultimately to errors in the analysis and review process.
- Expensive review. The review is often delegated to the original developers, who would rather be writing new code and shipping features than reviewing old code and cleaning up legal debt. Using your developers to review SCA tool findings is rather expensive labor, which is better delegated to third parties. My own company, Bayave GmbH, provides such review services at competitive prices.
Creating a dependency graph and deriving the SBOM for the first time is therefore often a laborious, expensive, and error-prone process. Ideally, subsequent changes to your project and product lead only to an incremental adjustment of the dependency graph and SBOM data (sketched below).
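A minimal sketch of such an incremental adjustment, assuming a simplified {name: version} representation of an SBOM rather than a full SPDX or CycloneDX document; the function name `diff_sboms` is a hypothetical example:

```python
def diff_sboms(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {component: version} snapshots; only the delta needs review."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(n for n in set(old) & set(new) if old[n] != new[n]),
    }

# Example: one component added, one removed, one version-bumped.
print(diff_sboms({"left-pad": "1.0.0", "lodash": "4.17.20"},
                 {"lodash": "4.17.21", "axios": "1.6.0"}))
# -> {'added': ['axios'], 'removed': ['left-pad'], 'changed': ['lodash']}
```

Only the added and changed components then need a fresh review pass, rather than re-reviewing the whole graph.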
© 2024 Dirk Riehle, used with permission.
Next up: Basic SBOM requirements