Table of contents

  • Contract Details
    • An Overview of the TDA Methodology
    • The Baseline: Analyzing the Pre-Intervention Portfolio
      • Methodology: Charting the Primordial Soup
      • The Two Analytical Lenses
      • The Engine: What is Growing Neural Gas (GNG)?
      • Visualizing the Workflow
      • Further Insights from the Source Document
      • Deconstructing the Analysis: A Holistic, Data-Driven View
      • Key Questions This Analysis Answers
    • Next Steps & Future Work
      • Conclusion: The Strategic Choice Ahead
    • Appendix: TDA Examples & Further Reading
      • Sports Analytics
      • Finance
      • General Tutorials & Python Libraries

Topological Data Analysis: Growing Neural Gas

Understanding and Analyzing the AFWERX Portfolio by Company Type

Tags: AFWERX, RGI, Portfolio Analysis, GNG, TDA
Author: SHELDON
Published: August 13, 2025

AFWERX Portfolio Analysis with TDA and GNG
This document reviews prior work on Topological Data Analysis (TDA), specifically the Growing Neural Gas (GNG) methodology, and explores how GNG was used to analyze the AFWERX portfolio by the ‘type’ of company prior to AFWERX intervention.

Contract Details

Contract Number: FA228024C0012
Contractor Name: P.W. Communications, Inc.
Contractor Address: 11200 Rockville Pike Suite 130 Rockville, MD 20852

An Overview of the TDA Methodology

This document reviews a powerful data analysis technique called Topological Data Analysis (TDA). Based on the principles outlined in prior work, the core idea is to move beyond simple spreadsheets and create a “shape-based” map of the AFWERX portfolio. This approach reveals the deep, underlying relationships between companies, showing us not just who they are, but how they fit together into a larger strategic landscape. It helps us see the natural clusters, the unique outliers, and the unexplored gaps in our innovation ecosystem.

The power of this method comes from a series of deliberate, strategic choices made before the analysis begins. The final map is a direct reflection of the questions we ask and the data we use. The key decisions involved are:

  • Variable Selection: The process begins by carefully choosing which company characteristics to include. Every feature, from financial data to team size, defines the universe that the map will chart. This step is critical, as including or excluding certain data points can fundamentally change the resulting landscape.
  • Data Scaling: To ensure fair comparisons, all variables are scaled to a common standard. This prevents a single metric, like total funding, from overpowering all other factors, allowing for a more nuanced and balanced comparison of companies.
  • The Normalization Choice: This is the most critical strategic decision. We can view the data in two ways:
    1. Unnormalized (“The World That Is”): This shows raw performance and the current, real-world state of the portfolio.
    2. Normalized (“The World As It Might Be”): This adjusts the data, for instance by looking at performance per employee or per dollar. This reveals hidden potential and efficiency, highlighting companies that could excel with more resources.
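As a hedged illustration of the scaling decision described above, the sketch below standardizes each feature to zero mean and unit standard deviation (z-scores) and compares distances before and after. The three companies and their values are made up, and z-scoring is an assumed choice; the source does not state which scaling method was used.

```python
import math
import statistics

# Hypothetical companies with made-up values: (total funding in USD, team size).
companies = {
    "A": (5_000_000.0, 10.0),
    "B": (5_100_000.0, 200.0),
    "C": (9_000_000.0, 12.0),
}

def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

raw = [companies[name] for name in ("A", "B", "C")]
scaled = zscore_columns(raw)

# Ratio of the A-B distance to the A-C distance, before and after scaling.
raw_ratio = math.dist(raw[0], raw[1]) / math.dist(raw[0], raw[2])
scaled_ratio = math.dist(scaled[0], scaled[1]) / math.dist(scaled[0], scaled[2])

print(f"raw A-B / A-C distance ratio:    {raw_ratio:.3f}")
print(f"scaled A-B / A-C distance ratio: {scaled_ratio:.3f}")
```

With these toy values, the raw distance from A to B is a small fraction of the distance from A to C, because the funding column swamps the 190-person difference in team size. After scaling, the two distances are roughly equal, so both features contribute to the comparison.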

Ultimately, TDA is not an automated answer machine; it is a tool for enhanced strategic thinking. The “shape” that emerges from the analysis is a mirror, reflecting the intent and wisdom of the questions asked. It provides a framework for understanding our portfolio not as a list of assets, but as a dynamic, interconnected ecosystem.

The Baseline: Analyzing the Pre-Intervention Portfolio

A crucial distinction of the analysis reviewed here is its specific temporal focus: the data evaluates companies at the moment before they received their first AFWERX award. This is not a study of what companies became with our help, but rather a foundational map of what they were at the instant of their selection. This approach establishes a critical “Genesis Block”—a baseline against which all subsequent performance and impact can be measured.

The primary goal was to understand the raw material we started with. What did the innovation landscape from which we selected our initial awardees truly look like? Were we choosing from dense clusters of technologically similar companies, or were we identifying unique pioneers from the outset?

Methodology: Charting the Primordial Soup

To answer these questions, we employed Topological Data Analysis (TDA), an unsupervised machine learning framework. TDA excels at creating a “shape-based” map of data, revealing the natural groupings and hidden relationships within a complex portfolio. Because the method is unsupervised, we did not impose predefined labels or categories on the companies; the groupings emerge from the data itself, within the bounds set by the variables we chose to include.

A vital step in this process is data preparation and scaling. To create a fair and unbiased map, every characteristic of a company—from its patents and team size to its industry and technology type—was converted into a universal, numerical language. The data was then scaled, preventing any single metric (like total prior funding) from dominating the analysis. This ensures we are comparing companies based on their intrinsic nature and capabilities, not just their surface-level metrics. This process is the bedrock of any credible impact analysis, as it provides an honest, unvarnished measurement of what each company was before AFWERX intervention.
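The conversion into a "universal, numerical language" can be sketched as follows. This is a minimal illustration, assuming one-hot encoding for categorical fields; the field names and records are hypothetical, and the source does not specify the encoding that was actually used.

```python
# Hypothetical company records; field names and values are illustrative only.
records = [
    {"patents": 3, "team_size": 12, "industry": "software"},
    {"patents": 0, "team_size": 45, "industry": "materials"},
    {"patents": 7, "team_size": 8, "industry": "software"},
]

NUMERIC = ["patents", "team_size"]
CATEGORICAL = ["industry"]

def encode(records):
    """Return (column names, numeric rows) with categorical fields one-hot encoded."""
    # Sorted vocabularies keep the column order deterministic.
    vocabs = {f: sorted({r[f] for r in records}) for f in CATEGORICAL}
    header = NUMERIC + [f"{f}={v}" for f in CATEGORICAL for v in vocabs[f]]
    rows = []
    for r in records:
        row = [float(r[f]) for f in NUMERIC]
        for f in CATEGORICAL:
            row += [1.0 if r[f] == v else 0.0 for v in vocabs[f]]
        rows.append(row)
    return header, rows

header, matrix = encode(records)
print(header)
for row in matrix:
    print(row)
```

Every company becomes a row of numbers with the same columns, which is the form a scaling step and a distance-based algorithm expect.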

The Two Analytical Lenses

This analysis provides two distinct lenses for viewing the portfolio at that foundational moment:

  1. The Unnormalized View (“The World That Was”): This is the direct historical record. It shows the landscape as it existed, highlighting the companies that were the largest, most established, and best-funded before receiving their first AFWERX dollar. It is a map of the existing reality we chose from.

  2. The Normalized View (“The Potential We Saw”): This is the scout’s map. For this view, we adjusted the data to look at metrics like performance per employee or innovation per dollar of funding. This lens reveals latent potential, elevating the efficient, scrappy innovators who might otherwise be overshadowed by larger competitors.

By examining our past selections through these two lenses, we can analyze the nature of our initial decisions. Did we favor established players, or were we betting on companies with the highest latent potential? Understanding our historical selection strategy is the most powerful tool we have for refining our future investment decisions and maximizing the impact of the AFWERX program.
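The difference between the two lenses can be made concrete with a small sketch. The company names and figures below are invented, and using patent counts as the performance measure is an assumption made only for illustration.

```python
# Hypothetical pre-award records; all names and values are made up.
companies = [
    {"name": "Established Co", "employees": 400, "funding_usd": 20_000_000, "patents": 40},
    {"name": "Midsize Co", "employees": 60, "funding_usd": 4_000_000, "patents": 9},
    {"name": "Scrappy Co", "employees": 5, "funding_usd": 500_000, "patents": 4},
]

def unnormalized_view(company):
    """Raw totals: the unnormalized lens."""
    return {"patents": company["patents"]}

def normalized_view(company):
    """Per-resource rates: the normalized lens."""
    return {
        "patents_per_employee": company["patents"] / company["employees"],
        "patents_per_million_usd": company["patents"] / (company["funding_usd"] / 1_000_000),
    }

def rank(companies, view, metric):
    """Company names ordered from highest to lowest on the chosen metric."""
    ordered = sorted(companies, key=lambda c: view(c)[metric], reverse=True)
    return [c["name"] for c in ordered]

raw_ranking = rank(companies, unnormalized_view, "patents")
normalized_ranking = rank(companies, normalized_view, "patents_per_employee")

print("Unnormalized:", raw_ranking)
print("Normalized:  ", normalized_ranking)
```

In this toy data the ordering reverses: the largest company leads on raw totals, while the five-person company leads once output is divided by headcount.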

The Engine: What is Growing Neural Gas (GNG)?

If TDA provides the philosophical framework for our analysis (understanding the “shape” of data), then the Growing Neural Gas (GNG) algorithm is the engine that actually builds the map. GNG is a type of artificial neural network that is particularly well-suited for learning and visualizing the topological structure of a dataset.

Think of it as a flexible net thrown over the data points (our companies). The net starts small and simple, then iteratively “grows” and adapts:

  1. Nodes are placed among the data points. Each node acts as a prototype, or archetype, for the companies nearest to it.
  2. The net learns by repeatedly sampling a data point and moving the closest node, and to a lesser degree that node’s neighbors, toward it, so the nodes drift into the dense areas of the data.
  3. Connections are formed between the two nodes closest to each sampled point, and connections that go unused for too long are removed, creating the “shape” or topology of the portfolio.
  4. The net grows, adding new nodes where the accumulated representation error is highest, so that even small, unique clusters of companies can be captured.

The end result is a network graph where the nodes are archetypal companies and the connections represent their similarity. This graph is the TDA map we use to analyze the portfolio, allowing us to see the clusters, outliers, and empty spaces in our innovation landscape.
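The steps above can be sketched in a compact, standard-library implementation that follows the commonly published form of the algorithm. This is not the code used in the reviewed analysis: the parameter values, the function name `growing_neural_gas`, and the synthetic two-cluster data are all assumptions chosen for illustration.

```python
import math
import random

def growing_neural_gas(data, max_nodes=20, steps=3000, eps_winner=0.05,
                       eps_neighbor=0.006, max_age=50, insert_every=100,
                       alpha=0.5, decay=0.995, seed=0):
    """Minimal Growing Neural Gas; returns (node positions, edge ages)."""
    rng = random.Random(seed)
    # Start with two nodes placed on randomly chosen data points.
    nodes = {0: list(rng.choice(data)), 1: list(rng.choice(data))}
    error = {0: 0.0, 1: 0.0}
    edges = {}  # frozenset({a, b}) -> age
    next_id = 2

    def move(node, target, rate):
        nodes[node] = [w + rate * (t - w) for w, t in zip(nodes[node], target)]

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Winner s1 and runner-up s2 are the two nodes nearest the sample.
        s1, s2 = sorted(nodes, key=lambda n: math.dist(nodes[n], x))[:2]
        error[s1] += math.dist(nodes[s1], x) ** 2
        # Age the winner's edges and pull its neighbours slightly toward x.
        for edge in edges:
            if s1 in edge:
                edges[edge] += 1
                (other,) = edge - {s1}
                move(other, x, eps_neighbor)
        move(s1, x, eps_winner)
        # Connect the two nearest nodes, or refresh their existing edge.
        edges[frozenset((s1, s2))] = 0
        # Remove stale edges, then any nodes left without edges.
        edges = {e: age for e, age in edges.items() if age <= max_age}
        connected = set().union(*edges)
        for n in [n for n in nodes if n not in connected]:
            del nodes[n], error[n]
        # Periodically insert a node between the highest-error node and
        # its highest-error neighbour.
        if step % insert_every == 0 and len(nodes) < max_nodes:
            q = max(error, key=error.get)
            f = max((next(iter(e - {q})) for e in edges if q in e), key=error.get)
            r, next_id = next_id, next_id + 1
            nodes[r] = [(a + b) / 2 for a, b in zip(nodes[q], nodes[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = edges[frozenset((r, f))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[r] = error[q]
        # Decay all accumulated errors.
        for n in error:
            error[n] *= decay
    return nodes, edges

# Synthetic stand-in for scaled company features: two separated 2-D clusters.
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(100)]

nodes, edges = growing_neural_gas(data)
print(len(nodes), "nodes,", len(edges), "edges")
```

The returned `nodes` and `edges` together form the network graph described above. In practice the parameters (learning rates, maximum edge age, insertion interval) would need tuning for the dataset at hand.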

Visualizing the Workflow

To make this process more concrete, consider the following workflow, in which each step feeds the next:

  1. Raw Company Data (e.g., funding, team size, industry)
  2. Data Preparation
    • Numeric Conversion
    • Scaling / Normalization
  3. Growing Neural Gas Algorithm
  4. Topological Map (Network Graph)
  5. Analysis & Insights
    • Identify Clusters
    • Find Outliers
    • Discover Gaps

Further Insights from the Source Document

For a more foundational understanding of this methodology, readers are encouraged to consult the original source document, linked at the end of this section. It provides additional details and illustrative examples that are key to grasping the nuances of the TDA process. Specifically, the source document elaborates on:

  • The GNG Engine in Detail: The source provides a dedicated explanation of the Growing Neural Gas algorithm, complementing the summary in this document.
  • The Art of Feature Engineering: A key insight from the source is the concept of breaking down composite variables to create a richer analysis. For example, instead of using a single metric like a basketball player’s “total points,” one can create a more descriptive map by using its components: points from 2-pointers, 3-pointers, and free throws. This same logic can be applied to portfolio analysis.
  • Illustrative Analogies: The source uses clear, non-military examples—analyzing cars and basketball players—to explain the critical importance of variable selection and normalization. These analogies provide an intuitive grasp of the concepts before applying them to complex defense and company data.
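The feature-engineering point about composite variables can be shown with a short sketch. The two players and their per-game scoring values are hypothetical and are not taken from the source document.

```python
import math

# Hypothetical per-game scoring, made-up values:
# (points from 2-pointers, points from 3-pointers, points from free throws)
players = {
    "inside_scorer": (16.0, 0.0, 4.0),
    "perimeter_shooter": (4.0, 15.0, 1.0),
}

a = players["inside_scorer"]
b = players["perimeter_shooter"]

# Composite feature: a single "total points" number per player.
composite_gap = abs(sum(a) - sum(b))

# Decomposed features: distance between the component vectors.
component_gap = math.dist(a, b)

print("Difference in total points:", composite_gap)
print(f"Distance between component vectors: {component_gap:.2f}")
```

Both players score the same total, so the composite feature cannot tell them apart, while the decomposed features place them far apart on the map.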

The full text can be reviewed here: https://www.sheldon-insights.com/deliverables/nestt/clin003/afwerx/understanding_tda_gng/write_up

Deconstructing the Analysis: A Holistic, Data-Driven View

The collection of materials from the AFWERX presentation offers more than just data; it provides a masterclass in constructing a strategic narrative. To understand its full impact, one must view it not as a series of slides, but as a deliberate, multi-layered journey designed to lead leadership from high-level observation to granular, actionable insight. The full exhibit list can be reviewed here: https://www.sheldon-insights.com/deliverables/nestt/clin006/afwerx/afwerx_presentation/analysis_materials/

The narrative of the analysis unfolds in a logical, cascading sequence, with each layer building upon the last with concrete data:

  1. Establishing the Framework (The “Who” and “How”): The analysis first establishes its core lexicon. This is the foundational layer that makes the rest of the discussion possible.

    • The Two Views: It defines the two lenses for evaluation: Prior Performance (a company’s historical baseline) and Subsequent Performance (their growth after AFWERX engagement).
    • The Four Quadrants: It uses these views to segment the portfolio into four distinct, data-driven archetypes: Mature DAF/DoD Suppliers, New DoD Participants, Multiple Award Winners (MAWs), and Subcontractors. This framework moves the conversation from “companies” to “types of companies.”
  2. The 30,000-Foot View (The “What”): With the framework in place, the analysis begins with high-level summaries. The “SUMMARY TABLE” and “VISUAL BREAKDOWN” exhibits provide the essential executive overview. For instance, they establish the baseline reality that Mature DAF/DoD Suppliers enter the program with a median of $1.3M in prior obligations, while New DoD Participants start at nearly $0. This immediately quantifies the vast difference in experience between the quadrants.

  3. Introducing the Time Dimension (The “When”): The analysis then deepens by introducing the element of time. The “SUBSEQUENT VS PRIOR GROUP ANALYSIS” moves beyond a static snapshot to reveal the portfolio’s dynamics. It shows that in the two years following their AFWERX award, the New DoD Participants quadrant demonstrates explosive growth, achieving a median of $450k in new obligations. This directly contrasts with the Mature DAF/DoD Suppliers, whose subsequent obligations are a fraction of their prior work, illustrating a fundamentally different engagement model.

  4. The Diagnostic Deep Dive (The “Why”): The narrative then zooms in to diagnose the reasons behind these trends. The “KEY METRICS FOR…” exhibits reveal the underlying mechanics. We see that the growth for New DoD Participants is not just from small awards; their median subsequent award size is $150k, indicating they are successfully landing significant, meaningful contracts. This is a critical diagnostic insight when compared to the MAWs, whose median subsequent award size might be smaller, suggesting a different transition challenge.

  5. Grounding in Reality (The “Who, Specifically?”): Finally, after establishing the archetypes and their quantified dynamics, the analysis grounds itself in concrete reality with the “DEEPER DIVE, FPDS PHASE IIIS BY COMPANY” exhibit. It moves from the abstract median of “$450k in new obligations” to showing the specific companies, like “Company X with a $2.5M Phase III,” that make up that number. This provides tangible examples of success, making the data real and relatable, and showing what a successful transition looks like in practice.

By structuring the analysis in this holistic way—from a qualitative framework to a high-level quantitative overview, and then to time-based dynamics, diagnostic metrics, and specific examples—the presentation creates a powerful and persuasive narrative. It equips leaders not just with data points, but with a comprehensive, evidence-based model for understanding their portfolio.
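The per-quadrant medians cited in the summary exhibits are the kind of figure produced by a simple grouped aggregation. The sketch below shows the calculation only; the rows are made-up values and do not reproduce the figures in the exhibits, and the tuple layout is an assumption.

```python
import statistics
from collections import defaultdict

# Hypothetical rows with made-up USD values:
# (quadrant, prior obligations, subsequent obligations)
rows = [
    ("Mature DAF/DoD Suppliers", 1_000_000, 200_000),
    ("Mature DAF/DoD Suppliers", 2_000_000, 300_000),
    ("Mature DAF/DoD Suppliers", 3_000_000, 100_000),
    ("New DoD Participants", 0, 400_000),
    ("New DoD Participants", 0, 100_000),
    ("New DoD Participants", 50_000, 600_000),
]

def median_by_quadrant(rows):
    """Median prior and subsequent obligations for each quadrant."""
    prior, subsequent = defaultdict(list), defaultdict(list)
    for quadrant, prior_usd, subsequent_usd in rows:
        prior[quadrant].append(prior_usd)
        subsequent[quadrant].append(subsequent_usd)
    return {
        quadrant: {
            "median_prior": statistics.median(prior[quadrant]),
            "median_subsequent": statistics.median(subsequent[quadrant]),
        }
        for quadrant in prior
    }

summary = median_by_quadrant(rows)
for quadrant, stats in summary.items():
    print(quadrant, stats)
```

Medians are used rather than means because a single very large award would otherwise dominate a quadrant's summary.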

Key Questions This Analysis Answers

This methodology is not just an academic exercise; it is designed to provide answers to concrete strategic questions about the AFWERX portfolio. By creating a topological map of our pre-award companies, we can address the following:

  • Portfolio Clustering: Where are the natural “continents” or clusters of technology in our portfolio? Are we heavily invested in one area (e.g., data analytics) and lightly in another (e.g., advanced materials)?
  • Identifying Outliers: Which companies are true “islands” on the map? These outliers may represent genuinely novel technologies or unique business models that defy conventional categorization.
  • Discovering Gaps: What does the “empty space” on our map tell us? These gaps could represent critical technology areas or company types that we are currently missing, highlighting opportunities for future targeted outreach.
  • Understanding Selection Bias: Does our selection process favor companies from specific regions of the map? For example, does the “normalized” view show that we are successfully identifying high-potential companies, or are we primarily selecting established players?
  • Informing Future Strategy: How can the shape of our past portfolio inform the desired shape of our future portfolio? The map can be used as a tool to guide future solicitations and investment strategies to build a more resilient and diversified technology base for the Air Force.
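One simple way to read clusters and outliers off a map is to split the network graph into connected components. The sketch below assumes a toy graph with hypothetical node ids and an arbitrary size threshold (`MIN_CLUSTER_SIZE`); it is one possible reading of the map, not the procedure used in the reviewed analysis.

```python
from collections import defaultdict, deque

# Hypothetical map: node ids and undirected edges, for illustration only.
nodes = [0, 1, 2, 3, 4, 5, 6, 7]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (6, 7)]

MIN_CLUSTER_SIZE = 3  # assumed threshold separating clusters from outlier groups

def connected_components(nodes, edges):
    """Group nodes into connected components using breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbor in adjacency[current] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components

components = connected_components(nodes, edges)
clusters = [c for c in components if len(c) >= MIN_CLUSTER_SIZE]
outlier_groups = [c for c in components if len(c) < MIN_CLUSTER_SIZE]

print("Clusters:", clusters)
print("Outlier groups:", outlier_groups)
```

Here the two three-node components play the role of "continents" and the small two-node component plays the role of an "island". The absence of any edge between components is what appears as empty space on the map.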

Next Steps & Future Work

This baseline analysis of the pre-award portfolio is the first step in a larger journey of data-driven portfolio management. The foundational map we have created enables several critical future analyses:

  • Measuring Impact Over Time: By taking snapshots of the portfolio at regular intervals (e.g., annually), we can apply the same TDA methodology to see how the “shape” of our portfolio evolves. This allows us to measure the impact of AFWERX funding and mentorship, tracking how companies move across the map as they mature.
  • Comparative Analysis: We can compare the AFWERX portfolio map to other innovation portfolios (e.g., other government agencies, venture capital firms) to identify our unique strengths and potential blind spots.
  • Connecting to Performance Data: The next logical step is to overlay operational and performance data onto the map. By color-coding the nodes based on metrics like subsequent funding rounds, contract successes, or technology transitions, we can visually identify which regions of our “innovation space” are producing the most successful outcomes.
  • Predictive Modeling: In the long term, this topological map can serve as a foundational layer for predictive models. By analyzing the trajectories of successful companies across the map over time, we may be able to identify leading indicators of success for future applicants.
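Overlaying performance data onto the map can be sketched as assigning each company to its nearest node and averaging an outcome metric per node. The node positions, company features, and the choice of "subsequent award count" as the metric are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical map nodes: id -> position in scaled feature space.
node_positions = {"n0": (0.0, 0.0), "n1": (1.0, 1.0)}

# Hypothetical companies: (scaled features, subsequent award count).
companies = [
    ((0.1, -0.1), 0),
    ((0.2, 0.1), 2),
    ((0.9, 1.2), 5),
    ((1.1, 0.8), 3),
]

def nearest_node(point, node_positions):
    """Id of the node closest to the given point."""
    return min(node_positions, key=lambda n: math.dist(node_positions[n], point))

def overlay_metric(companies, node_positions):
    """Mean outcome of the companies assigned to each node."""
    grouped = defaultdict(list)
    for features, outcome in companies:
        grouped[nearest_node(features, node_positions)].append(outcome)
    return {node: fmean(values) for node, values in grouped.items()}

node_scores = overlay_metric(companies, node_positions)
print(node_scores)
```

The resulting per-node values are what a color scale on the map would encode. Repeating the assignment on later snapshots would show whether a company's nearest node changes over time.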

Conclusion: The Strategic Choice Ahead

This document has reviewed a powerful analytical lens: Topological Data Analysis. We have moved from the abstract theory to the concrete application, showing how this methodology can deconstruct the AFWERX portfolio into a landscape of archetypes, pathways, and quantifiable insights. The final question is not “Was this analysis interesting?” but “What is the strategic imperative of this capability?”

Our primary military competitors build their industrial power through centralized, top-down, five-year plans—an approach that is rigid and predictable. The United States’ core asymmetric advantage is our dynamic, decentralized ecosystem of innovators. This is our greatest strength, but also our most profound challenge. A network we cannot see is a network we cannot strategically cultivate. The critical challenge for AFWERX is to harness the power of our network without crushing it under a bureaucracy that mimics our adversaries’ weakness.

TDA presents a path forward, offering more than just data science; it is a command-and-control interface for a network-based strategy.

  • It serves as a strategic telescope, allowing us to see the invisible structures of our own industrial base and providing a common visual language for leaders, program managers, and data scientists to have a single, evidence-based conversation.
  • It enables AFWERX to move at the speed of relevance. By providing a near-real-time map of the portfolio, it allows our team to act as strategic cultivators—identifying high-potential clusters and providing targeted support—rather than simply managing contracts one by one.
  • It allows us to build a new narrative of inevitable, data-driven superiority. The story is no longer just “we are funding companies,” but “we are building and managing a dynamic, self-organizing industrial base, and we have the tools to see it, shape it, and accelerate it.”

The decision before AFWERX is therefore not a technical one, but a philosophical one. Do we wish to continue managing a list of individual contracts, or do we want to cultivate a strategic innovation ecosystem? If the answer is the latter, then the path forward is clear. We must continue to invest in the tools that allow us to see, understand, and shape the network that is our single greatest asymmetric advantage.


Appendix: TDA Examples & Further Reading

To further explore the concepts and applications of Topological Data Analysis, the following is a curated list of accessible articles, tutorials, and case studies.

Sports Analytics

  • Redefining Basketball Positions with TDA: 13 Positions in Basketball
    • Description: A classic and widely cited example of TDA in action. This analysis of NBA player statistics revealed 13 distinct player roles, moving far beyond the traditional 5 positions. It’s a perfect illustration of how TDA can uncover hidden structures in familiar data.
    • Link: https://www.wired.com/2012/04/analytics-basketball/
  • A Layman’s Introduction to TDA in Sports
    • Description: A less technical overview that uses the same basketball example to explain the core concepts of TDA in a very easy-to-understand manner.
    • Link: https://www.youtube.com/watch?v=k-A2V_igA_s

Finance

  • Detecting Early Warning Signs of Market Crashes
    • Description: This article provides a fascinating look at how the “shape” of stock market data changes in predictable ways before a major crash. It demonstrates how TDA can be used as a monitoring tool for systemic risk.
    • Link: https://science.upd.edu.ph/up-mathematicians-introduce-a-different-approach-in-detecting-potential-stock-market-crashes/
  • TDA for Financial Time Series Analysis
    • Description: A more technical tutorial that walks through the process of applying TDA to time-series data, like stock prices, to identify recurring patterns and anomalies.
    • Link: https://www.numberanalytics.com/blog/advanced-tda-time-series-analysis

General Tutorials & Python Libraries

  • Scikit-TDA: TDA for Machine Learning
    • Description: The official documentation for Scikit-TDA, a popular Python library for topological data analysis. It includes clear tutorials and examples for getting started with practical applications.
    • Link: https://scikit-tda.org/
  • GUDHI: A Comprehensive TDA Library
    • Description: The homepage for the GUDHI library, which offers a wide range of powerful TDA algorithms. Its Python tutorials are excellent for those looking to go deeper into the underlying mathematics.
    • Link: https://gudhi.inria.fr/python/latest/
