A software bill of materials (SBOM) captures which code components are included in a piece of software. It has two original uses:
- The first use is to ensure that the software includes only code components that both the developer and any recipient find acceptable. Most notably, developers generally prefer to keep copyleft-licensed components out of their products.
- The second use is to create proper legal notices for the third-party code in the software. A developer who distributes the software has to provide these legal notices about the included open source components to comply with their licenses.
Customers in a supply chain often make the provision of an SBOM a purchasing requirement, as discussed before. Governments have followed suit, mostly driven by the need to make software more secure.
A report by the U.S. government’s Department of Commerce details basic requirements for an SBOM.[1] Any SBOM should name its author and the time it was created. For each component (material), an SBOM should provide the component’s name, its version number, and the supplier of the component. Interestingly, the report also states that each component should list its relationship to other components, which I would have considered helpful but not critical.
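The fields named above can be sketched as a small data structure. This is only an illustration: the class names, the `missing_elements` helper, and the sample values are hypothetical and are not taken from the report or from any SBOM format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Component:
    name: str
    version: str
    supplier: str
    # Names of other components this one relates to (hypothetical encoding).
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Sbom:
    author: str
    created: datetime
    components: List[Component]


def missing_elements(sbom: Sbom) -> List[str]:
    """Return a description of every basic field that is empty."""
    problems = []
    if not sbom.author.strip():
        problems.append("SBOM author is missing")
    for index, component in enumerate(sbom.components):
        for attribute in ("name", "version", "supplier"):
            if not getattr(component, attribute).strip():
                problems.append(f"component {index}: {attribute} is missing")
    return problems


# Made-up sample data; the second component lacks a version number.
sbom = Sbom(
    author="Example Corp",
    created=datetime.now(timezone.utc),
    components=[
        Component(name="libexample", version="1.4.2", supplier="Example Project"),
        Component(name="libother", version="", supplier="Other Vendor"),
    ],
)
print(missing_elements(sbom))
```

A check like this only confirms that the fields are filled in; it says nothing about whether the values are correct.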
The report sees SBOMs as hierarchical structures. At the root is the SBOM for the software being described. The components in the SBOM can then have their own SBOMs, potentially creating a hierarchical structure. You cannot, however, map a dependency graph onto a hierarchy, at least not without creating significant redundancy, because a component that several others depend on would have to appear once under each of them. I argue that the components in an SBOM should simply be captured as a flat list. If preserving the dependency graph is important, each component can reference the components it depends on.
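A minimal sketch of the flat-list argument, using made-up component names: each component is stored exactly once and refers to its dependencies by name. The two helper functions are hypothetical and assume the dependency graph has no cycles.

```python
# Flat list of components; dependencies are references, not nested copies.
components = {
    "app": {"version": "1.0.0", "depends_on": ["lib-a", "lib-b"]},
    "lib-a": {"version": "2.1.0", "depends_on": ["lib-c"]},
    "lib-b": {"version": "0.9.3", "depends_on": ["lib-c"]},
    "lib-c": {"version": "4.0.1", "depends_on": []},
}


def transitive_dependencies(components, root):
    """Collect every component reachable from root by following references."""
    seen = set()
    stack = list(components[root]["depends_on"])
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(components[name]["depends_on"])
    return seen


def hierarchy_size(components, root):
    """Count the entries needed if each dependency were nested under its parent."""
    return 1 + sum(
        hierarchy_size(components, name)
        for name in components[root]["depends_on"]
    )


print(sorted(transitive_dependencies(components, "app")))
print(len(components), "entries in the flat list")
print(hierarchy_size(components, "app"), "entries in the nested hierarchy")
```

In this small example the nested form needs five entries for four components, because `lib-c` appears under both `lib-a` and `lib-b`. The gap grows quickly as more components share dependencies.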
An SBOM should also be machine-readable to allow automated processing. The report lists SPDX, CycloneDX, and SWID tags as established format specifications for capturing SBOM information. The report notes that the industry has so far failed to provide unique identifiers for components, and that supplier and component names should therefore be human-readable but need not be machine-interpretable.
The grassroots purl (short for “package URL”) effort offers help with uniquely identifying components.[2] The supplier of the component, its name, its version number, and further details are encoded into a single composite identifier, the purl. It consists of seven components, structured using the following syntax:
scheme:type/namespace/name@version?qualifiers#subpath
While not a traditional URL in the strict sense, a purl nevertheless uniquely identifies a location. The location then takes the role of the supplier of the component. As a consequence, identical copies of the same code base in different locations are treated as different components.
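To make the syntax concrete, here is a simplified sketch of how a purl can be split into its seven components. The `Purl` class and the `parse_purl` function are hypothetical helpers written for this illustration. They skip percent-decoding and the type-specific normalization rules of the specification, so they should not be read as a conforming implementation. The two `libfoo` purls at the end are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Purl:
    scheme: str
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def _split_right(text, separator):
    """Split on the last separator; the tail is None if there is none."""
    head, found, tail = text.rpartition(separator)
    return (head, tail) if found else (text, None)


def parse_purl(purl: str) -> Purl:
    # Optional parts are peeled off from the right, then the scheme and
    # type from the left.
    rest, subpath = _split_right(purl, "#")
    rest, query = _split_right(rest, "?")
    scheme, found, rest = rest.partition(":")
    if not found or scheme != "pkg":
        raise ValueError("a purl must start with the scheme 'pkg:'")
    type_, found, rest = rest.lstrip("/").partition("/")
    if not found:
        raise ValueError("a purl needs both a type and a name")
    rest, version = _split_right(rest, "@")
    namespace, _, name = rest.rpartition("/")
    if not name:
        raise ValueError("a purl needs a name")
    qualifiers = {}
    if query:
        for pair in query.split("&"):
            key, _, value = pair.partition("=")
            qualifiers[key] = value
    return Purl(scheme, type_.lower(), namespace or None, name,
                version, qualifiers, subpath)


print(parse_purl(
    "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?packaging=sources"))

# Two made-up purls for the same code hosted in two places.
first = parse_purl("pkg:github/example-org/libfoo@1.0.0")
second = parse_purl("pkg:bitbucket/example-org/libfoo@1.0.0")
print(first.name == second.name, first == second)
```

The last line shows the consequence described above: name and version match, yet the two purls differ in their type and therefore count as two different components.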
An SBOM that fulfills these basic requirements can already be delivered with the software to its users to fulfill a purchasing requirement.
That said, there are many more types and uses of SBOMs.
[1] See The Minimum Elements For a Software Bill of Materials (SBOM)
[2] See Package URL specification