Software composition analysis (SCA) is the analysis of your project’s or product’s source code to identify the component structure of the software, also known as its dependency graph. As discussed, components may be standalone components or code snippets. The code of a component may be owned by you or by a third party; in the latter case, it is called third-party code. Open source code is the most prominent example of third-party code.
The original motivation for software composition analysis was to ensure license compliance. Any third-party code is legally separate code that comes with its own licenses. You need to comply with these licenses when you deliver your projects to clients and your products to customers.
Legally separate does not necessarily mean technically separate. Most notably, source code snippets that have been copied into your source code or into your dependencies are legally separate code components, even though they are embedded in your own or in third-party code. You still need to identify these snippets, even in your dependencies, if you want to deliver license-compliant software.
Software composition analysis is typically performed using specialized tools. These tools read through the whole source code base of the software and try to identify any third-party code. An SCA tool needs access to the full source code, so in addition to providing your original code, you also have to either download the dependencies yourself or tell the tool how to do so.
Examples of open source SCA tools are FOSSology, a complete solution, and ScanCode, a focused scanning tool to be embedded into a larger custom tool chain. In addition, there are many commercial tools on the market.
To an SCA tool, the software consists of a hierarchical folder structure with files and code snippets in files. Source code outside this folder structure is not considered. An SCA tool does not and should not make assumptions about the folder structure mapping to the dependency graph in a particular way. As the result of a software composition analysis, you will be presented with the folder, file, and snippet structure rather than the dependency graph. An export of this information in software bill of materials (SBOM) form will provide a flat list (rather than a graph).
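To make the difference between the folder hierarchy and a flat list concrete, here is a minimal sketch in Python. It is not how any particular SCA tool works, and its output is not a real SBOM format; the function name `flat_inventory` and the record fields `path` and `sha256` are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def flat_inventory(root):
    """Return a flat, sorted list of records for all files under root.

    Hypothetical helper: the record layout is illustrative and does not
    follow any SBOM standard.
    """
    root_path = Path(root)
    records = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue  # folders only provide structure, they are not entries
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        records.append({
            # The folder hierarchy survives only as part of the path string.
            "path": path.relative_to(root_path).as_posix(),
            "sha256": digest,
        })
    return records
```

Note that nothing in the result says which file depends on which other file. Anything outside `root` is never visited, which mirrors the point that source code outside the folder structure is not considered.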
Software composition analysis is not a fully automatic process. Existing SCA tools will analyze the source code and present their findings to their users for sign-off. The key findings presented to users are:
- Component identification. For a given software component, an SCA tool will suggest a specific origin component, ideally using a unique component identifier like a package URL (PURL).
For a given code snippet, an SCA tool will also suggest the origin component and, in addition, the location within that component from which the snippet may have been copied.
- Legal information. Originally designed for license compliance, SCA tools will try to determine the component’s legal information: which licenses apply, who the copyright holders are, and which other notices a user needs to know about.
- Vulnerabilities. More recently, SCA tools have started adding information about known vulnerabilities, though this is often considered a follow-on step in a tool chain and not part of software composition analysis.
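The findings listed above can be pictured as a record that a tool proposes and a user then confirms. The following Python sketch is a simplified illustration, not the data model of any real SCA tool. The class `Finding`, the function `suggest_finding`, and the two regular expressions are assumptions: they only pick up `SPDX-License-Identifier:` tags and lines containing the word "Copyright", which is far less than real license scanners do. Vulnerability data is left out.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative patterns only; real scanners use much richer detection.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([^\n*]+)")
COPYRIGHT = re.compile(r"Copyright[ \t]+(?:\(c\)|©)?[ \t]*([^\n]+)", re.IGNORECASE)


@dataclass
class Finding:
    """A suggested finding that still awaits user sign-off (hypothetical model)."""
    path: str
    purl: Optional[str] = None  # suggested origin, e.g. "pkg:npm/left-pad@1.3.0"
    licenses: List[str] = field(default_factory=list)
    copyright_notices: List[str] = field(default_factory=list)
    confirmed: bool = False  # set to True only after a human has reviewed it


def suggest_finding(path, text, purl=None):
    """Build an unconfirmed Finding from the text of one file."""
    licenses = [match.strip() for match in SPDX_TAG.findall(text)]
    # Strip whitespace and trailing comment characters such as "*/".
    notices = [match.strip(" \t*/") for match in COPYRIGHT.findall(text)]
    return Finding(path=path, purl=purl, licenses=licenses,
                   copyright_notices=notices)
```

The `confirmed` flag reflects that the analysis is not fully automatic: the tool only suggests, and the finding counts once a user has signed it off.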
In addition to source code analysis, binary analysis tools let you analyze the software composition of binary files. Binary files can be found anywhere: They might be hiding in a source code folder or be part of a container image. Like source code, they need to be found, identified, and analyzed.
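Finding binary files is the first step. As a rough sketch, the Python code below walks a folder and flags files whose first bytes contain a NUL byte. This is a simple heuristic, not how binary analysis tools identify components, and it can misclassify files (UTF-16 text, for example, also contains NUL bytes). The function names `looks_binary` and `find_binaries` and the sample size are assumptions for illustration.

```python
from pathlib import Path


def looks_binary(path, sample_size=8192):
    """Heuristic: treat a file as binary if its first bytes contain NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def find_binaries(root):
    """Return the sorted relative paths of files under root that look binary."""
    root_path = Path(root)
    return sorted(
        path.relative_to(root_path).as_posix()
        for path in root_path.rglob("*")
        if path.is_file() and looks_binary(path)
    )
```

The files found this way would still need to be identified and analyzed by a dedicated binary analysis tool.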