Simple Docs Extractor - Application Flow

This document provides a comprehensive overview of how the Simple Docs Extractor application works, designed for new developers who want to understand or contribute to the codebase.

Overview

Simple Docs Extractor is a documentation generation tool that extracts documentation from source code files and generates organized markdown documentation with index files. The application follows a modular architecture with clear separation of concerns.

High-Level Architecture

The application is structured around several key components:

Configuration Layer - Builder pattern for fluent configuration
File Processing Layer - File discovery and content extraction
Content Processing Layer - Documentation extraction and transformation
Generation Layer - Output file creation and formatting
Plugin System - Extensible extraction and formatting capabilities

Application Flow

1. Configuration Phase

The application starts with configuration using the Builder pattern:

```ts
const scraper = SimpleDocExtractor.create('./src')
  .target(target => {
    target
      .patterns('**/*.ts')
      .cwd('./src')
      .outDir('./docs')
      .createIndexFiles()
      .useDocumentationTemplate('./templates/doc.md');
  })
  .addRecommendedFormatters()
  .buildService();
```
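The Builder pattern in the example above works because each configuration method stores a value and returns `this`, so calls can be chained. The sketch below is a stripped-down illustration of that idea only. `MiniBuilder`, `MiniTargetBuilder`, and the `TargetConfig` shape are hypothetical names for this sketch, not the library's actual classes.

```typescript
// Hypothetical config shape produced by this sketch (not the real Target type).
type TargetConfig = {
  patterns: string[];
  outDir: string;
  createIndexFiles: boolean;
};

// Each setter records a value and returns `this`, which is what enables chaining.
class MiniTargetBuilder {
  private config: TargetConfig = { patterns: [], outDir: "./docs", createIndexFiles: false };

  patterns(...globs: string[]): this {
    this.config.patterns = globs;
    return this;
  }

  outDir(dir: string): this {
    this.config.outDir = dir;
    return this;
  }

  createIndexFiles(): this {
    this.config.createIndexFiles = true;
    return this;
  }

  build(): TargetConfig {
    // Return a copy so later builder calls cannot mutate an already-built config.
    return { ...this.config, patterns: this.config.patterns.slice() };
  }
}

class MiniBuilder {
  private targets: TargetConfig[] = [];

  // Mirrors the callback style of `.target(target => { ... })`.
  target(configure: (t: MiniTargetBuilder) => void): this {
    const t = new MiniTargetBuilder();
    configure(t);
    this.targets.push(t.build());
    return this;
  }

  build(): TargetConfig[] {
    return this.targets.slice();
  }
}

const configs = new MiniBuilder()
  .target(t => {
    t.patterns("**/*.ts").outDir("./docs").createIndexFiles();
  })
  .build();
```

Passing a callback to `target()` keeps each target's settings scoped to its own builder, so several targets can be configured in one chain without sharing state.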

Key Classes:

Builder - Main configuration builder
TargetBuilder - Configures individual processing targets
TemplateBuilder - Configures templates for different output types

2. File Discovery Phase

Once configured, the application discovers files to process:

```ts
// In SimpleDocExtractor.handleTarget()
const files = await this.getFiles(target);
```

Key Classes:

FileScanner - Uses glob patterns to find matching files
Target - Contains glob options and output configuration

Process:

FileScanner.collect() scans directories using glob patterns
Returns an array of absolute file paths matching the criteria
Files are processed sequentially, one at a time
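FileScanner.collect() relies on glob matching. The sketch below shows the core idea on an in-memory list of relative paths by translating a glob into a regular expression. It handles only `*`, `**`, and `?`, and `globToRegExp` and `collectMatching` are hypothetical helpers. The real scanner walks the file system and returns absolute paths.

```typescript
// Translate a simple glob into an anchored RegExp.
// Supported: "**/" (zero or more directories), "**", "*" (within one segment), "?".
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        source += "(?:.*/)?"; // "**/" may match nothing or several directories
        i += 2;
      } else {
        source += ".*"; // a bare "**" matches anything, including "/"
        i += 1;
      }
    } else if (c === "*") {
      source += "[^/]*"; // "*" never crosses a directory boundary
    } else if (c === "?") {
      source += "[^/]";
    } else if ("\\^$+.()|{}[]".indexOf(c) !== -1) {
      source += "\\" + c; // escape regex metacharacters
    } else {
      source += c;
    }
  }
  return new RegExp("^" + source + "$");
}

// Keep paths matching at least one pattern, sorted so processing order is deterministic.
function collectMatching(paths: string[], patterns: string[]): string[] {
  const regexes = patterns.map(globToRegExp);
  return paths.filter(p => regexes.some(r => r.test(p))).sort();
}

const found = collectMatching(
  ["utils/helper.ts", "utils/helper.test.js", "services/api.ts", "README.md", "index.ts"],
  ["**/*.ts"]
);
```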

3. File Processing Phase

For each discovered file, the application processes it through several stages:

3.1 Pre-Processing (`CodeFileProcessor.preProcess()`)

```ts
const processedResult = await codeFileProcessor.preProcess(file, target);
```

Steps:

Content Extraction - Uses DocumentContentExtractor with configured plugins
Template Injection - Injects extracted content into templates via ContentInjection
Formatting - Applies configured formatters to clean up content
Output Path Generation - Determines where the output file should be saved
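Conceptually, the four pre-processing steps form a pipeline in which each stage feeds the next. The sketch below models that flow with stage functions passed in as parameters. The `Stages` type, the stage implementations, and the error message are assumptions for illustration rather than the real CodeFileProcessor code. The result shape loosely follows the ProcessResult type described under Key Data Structures.

```typescript
// Simplified result union modeled on ProcessResult (locales omitted).
type PreProcessResult =
  | { content: string; outDir: string; fileName: string }
  | { error: string; noDocumentationFound?: boolean };

// Hypothetical stage functions; each corresponds to one step in the list above.
type Stages = {
  extract: (source: string) => string[];                               // Content Extraction
  inject: (extracted: string[]) => string;                             // Template Injection
  formatters: Array<(content: string) => string>;                      // Formatting
  outputPath: (file: string) => { outDir: string; fileName: string };  // Output Path Generation
};

function preProcess(file: string, source: string, stages: Stages): PreProcessResult {
  const extracted = stages.extract(source);
  if (extracted.length === 0) {
    // Reported rather than thrown, so the caller can count it and move on.
    return { error: `No documentation found in ${file}`, noDocumentationFound: true };
  }
  let content = stages.inject(extracted);
  for (const format of stages.formatters) {
    content = format(content);
  }
  const { outDir, fileName } = stages.outputPath(file);
  return { content, outDir, fileName };
}

// Toy stages for demonstration only.
const stages: Stages = {
  extract: src => (src.indexOf("@doc") !== -1 ? [src.replace("@doc", "").trim()] : []),
  inject: parts => `# Docs\n\n${parts.join("\n")}`,
  formatters: [c => c.trim()],
  outputPath: f => ({ outDir: "docs", fileName: `${f}.md` }),
};

const ok = preProcess("helper.ts", "@doc Adds two numbers.", stages);
const missing = preProcess("empty.ts", "const x = 1;", stages);
```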

3.2 Content Extraction (`DocumentContentExtractor`)

The extraction process uses a plugin-based system:

```ts
const extractedContentArray = await new DocumentContentExtractor(
  target.plugins ?? []
).extractFromFile(file);
```

Available Plugins:

TagExtractorPlugin - Extracts content from HTML/XML-like tags (<docs>content</docs>)
RegexExtractorPlugin - Uses regex patterns to extract content
CallbackExtractorPlugin - Custom callback-based extraction
CopyContentsPlugin - Copies entire file content
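To illustrate what a tag-based extractor such as TagExtractorPlugin does, the sketch below pulls the bodies and attributes out of `<docs ...>...</docs>` blocks with a regular expression. It is a hypothetical, simplified stand-in (no nested tags, double-quoted attributes only), not the plugin's actual code. `extractTags` and `TagMatch` are illustrative names; the output mirrors the `content` and `attributes` fields of ExtractedContent.

```typescript
type TagMatch = { content: string; attributes: Record<string, string> };

// Extract every <tag attr="value">...</tag> block. Nested tags of the same name are not supported.
function extractTags(source: string, tag: string): TagMatch[] {
  const blockRe = new RegExp(`<${tag}(\\s[^>]*)?>([\\s\\S]*?)</${tag}>`, "g");
  const attrRe = /([\w-]+)="([^"]*)"/g;
  const results: TagMatch[] = [];
  let block: RegExpExecArray | null;
  while ((block = blockRe.exec(source)) !== null) {
    const attributes: Record<string, string> = {};
    const rawAttrs = block[1] ?? ""; // group 1 is undefined when the tag has no attributes
    let attr: RegExpExecArray | null;
    attrRe.lastIndex = 0; // reset: the global regex is reused for each block
    while ((attr = attrRe.exec(rawAttrs)) !== null) {
      attributes[attr[1]] = attr[2];
    }
    results.push({ content: block[2].trim(), attributes });
  }
  return results;
}

const sample = `
/**
 * <docs class="example">Adds two numbers.</docs>
 */
function add(a: number, b: number) { return a + b; }
`;
const tags = extractTags(sample, "docs");
```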

Process:

Each plugin processes the file content
Plugins return ExtractedContent[] with content and metadata
Results are combined and cleaned (trimmed, empty lines removed)
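The combine-and-clean step can be sketched as running every plugin over the same source, concatenating their results in plugin order, and normalizing each entry. For brevity the plugins here are synchronous, whereas the real ExtractorPlugin interface returns Promises. The `SimpleExtractor` type and the exact cleanup rules (drop blank lines, trim, discard entries left empty) are assumptions based on the description above.

```typescript
type Extracted = { content: string; searchAndReplace: string };

// Hypothetical synchronous plugin signature (the real ExtractorPlugin is async).
type SimpleExtractor = (source: string) => Extracted[];

// Drop empty or whitespace-only lines, then trim the whole block.
function clean(text: string): string {
  return text
    .split("\n")
    .filter(line => line.trim().length > 0)
    .join("\n")
    .trim();
}

function extractAll(source: string, plugins: SimpleExtractor[]): Extracted[] {
  const combined: Extracted[] = [];
  for (const plugin of plugins) {
    for (const item of plugin(source)) {
      combined.push({ ...item, content: clean(item.content) });
    }
  }
  // Entries with nothing left after cleaning carry no documentation.
  return combined.filter(item => item.content.length > 0);
}

const summaryPlugin: SimpleExtractor = () => [
  { content: "\n  Adds two numbers.\n\n", searchAndReplace: "%summary%" },
];
const examplePlugin: SimpleExtractor = () => [
  { content: "add(1, 2)\n\n// 3", searchAndReplace: "%example%" },
  { content: "   \n", searchAndReplace: "%notes%" },
];

const all = extractAll("ignored in this sketch", [summaryPlugin, examplePlugin]);
```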

3.3 Template Processing (`ContentInjection`)

```ts
let injectedContent = contentInjection.mergeExtractedContentsIntoTemplateString(
  extractedContentArray
);
```

Process:

Reads template file content
Replaces placeholder strings with extracted content
Applies default text for missing content
Uses TemplateContentExtractionContentMerger for complex merging
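A minimal sketch of placeholder injection follows. Each `searchAndReplace` placeholder in the template is replaced with the matching content, multiple entries for the same placeholder are joined with `divideBy`, and `defaultText` fills a placeholder whose entries are all empty. The grouping and joining rules and the `%...%` placeholder syntax are assumptions for illustration. The real merging logic lives in ContentInjection and TemplateContentExtractionContentMerger, and `injectIntoTemplate` is a hypothetical name.

```typescript
type ExtractedContent = {
  content: string;
  attributes: Record<string, string>;
  searchAndReplace: string;
  divideBy?: string;
  defaultText?: string;
};

function injectIntoTemplate(template: string, items: ExtractedContent[]): string {
  // Group entries by placeholder; Map preserves first-seen order.
  const groups = new Map<string, ExtractedContent[]>();
  for (const item of items) {
    const list = groups.get(item.searchAndReplace);
    if (list) list.push(item);
    else groups.set(item.searchAndReplace, [item]);
  }

  let result = template;
  groups.forEach((group, placeholder) => {
    const nonEmpty = group.filter(g => g.content.trim().length > 0);
    const divider = group[0].divideBy ?? "\n"; // assumed default separator
    const replacement =
      nonEmpty.length > 0
        ? nonEmpty.map(g => g.content).join(divider)
        : (group[0].defaultText ?? "");
    // split/join replaces every occurrence without regex-escaping the placeholder.
    result = result.split(placeholder).join(replacement);
  });
  return result;
}

const template = "## Summary\n%summary%\n\n## Examples\n%example%\n\n## Notes\n%notes%";
const output = injectIntoTemplate(template, [
  { content: "Adds two numbers.", attributes: {}, searchAndReplace: "%summary%" },
  { content: "add(1, 2)", attributes: {}, searchAndReplace: "%example%", divideBy: "\n---\n" },
  { content: "add(3, 4)", attributes: {}, searchAndReplace: "%example%", divideBy: "\n---\n" },
  { content: "", attributes: {}, searchAndReplace: "%notes%", defaultText: "No notes." },
]);
```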

3.4 Formatting (`TFormatter`)

```ts
for (const formatter of this.config.formatters) {
  injectedContent = formatter({
    filePath: file,
    outFile: file,
    content: injectedContent,
  });
}
```

Available Formatters:

RemoveMultiLineCommentAsterisks - Removes the leading asterisks from multi-line comment lines
AddDoubleLinesFormatter - Inserts extra line breaks between content lines
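Both formatters fit the `TFormatter` shape used in the loop above: a function that receives `{ filePath, outFile, content }` and, by assumption here, returns the new content as a string. The two functions below are hypothetical re-implementations based only on the formatter names and descriptions in this list, so their exact rules (which lines count as comment decoration, how spacing is added) are assumptions.

```typescript
type FormatterArgs = { filePath: string; outFile: string; content: string };
type TFormatter = (args: FormatterArgs) => string;

// Strip "/**" and "/*" openers, "*/" closers, and leading " * " from typical JSDoc-style lines.
const removeCommentAsterisks: TFormatter = ({ content }) =>
  content
    .split("\n")
    .map(line =>
      line
        .replace(/^\s*\/\*\*?\s?/, "") // opening "/**" or "/*"
        .replace(/\s*\*\/\s*$/, "")    // closing "*/"
        .replace(/^\s*\*\s?/, "")      // leading " * " decoration
    )
    .join("\n")
    .trim();

// Drop blank lines, then separate the remaining lines with an empty line.
const addDoubleLines: TFormatter = ({ content }) =>
  content
    .split("\n")
    .filter(line => line.trim().length > 0)
    .join("\n\n");

// Formatters run in order, each receiving the previous formatter's output.
const formatters: TFormatter[] = [removeCommentAsterisks, addDoubleLines];
let formatted = "/**\n * Adds two numbers.\n * Returns their sum.\n */";
for (const formatter of formatters) {
  formatted = formatter({ filePath: "src/add.ts", outFile: "docs/add.ts.md", content: formatted });
}
```

Because each formatter receives the previous one's output, order matters: stripping comment decoration first means the spacing formatter never sees `/**` or `*/` lines.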

4. File Generation Phase

4.1 Documentation File Generation (`DocFileGenerator`)

```ts
new DocFileGenerator({
  templatePath: this.getDocFileGeneratorConfig(target).templatePath,
  outDir: transformedOutDir,
}).saveToMarkdownFile(injectedContent, file);
```

Process:

Creates the output directory structure
Appends a .md extension to the source file name (for example, helper.ts becomes helper.ts.md)
Writes the processed content to the markdown file
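The output-path logic can be sketched as pure string manipulation: make the source path relative to a base directory, re-root it under the output directory, and append `.md` so that `helper.ts` becomes `helper.ts.md`, matching the tree under Output Structure. `toMarkdownOutputPath` is a hypothetical helper. It assumes POSIX-style `/` separators and that the base directory is passed in explicitly, and unlike DocFileGenerator it does not create directories or write files.

```typescript
// Hypothetical helper: map a source file to its markdown output location.
function toMarkdownOutputPath(
  file: string,
  baseDir: string,
  outDir: string
): { dir: string; fileName: string } {
  // Remove a leading "./" and any trailing slashes.
  const normalize = (p: string) => p.replace(/^\.\//, "").replace(/\/+$/, "");
  const base = normalize(baseDir);
  const sourcePath = normalize(file);
  if (base !== "" && sourcePath.indexOf(base + "/") !== 0) {
    throw new Error(`${file} is not inside ${baseDir}`);
  }
  const relative = base === "" ? sourcePath : sourcePath.slice(base.length + 1);
  const slash = relative.lastIndexOf("/");
  const subDir = slash === -1 ? "" : relative.slice(0, slash);
  const fileName = relative.slice(slash + 1) + ".md"; // keep the original extension, append .md
  const root = normalize(outDir);
  return { dir: subDir ? `${root}/${subDir}` : root, fileName };
}

const nested = toMarkdownOutputPath("project/src/utils/helper.ts", "project", "./docs");
const topLevel = toMarkdownOutputPath("project/index.ts", "project", "docs");
```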

4.2 Index File Generation (`MarkdownIndexProcessor`)

After all files are processed, index files are generated:

```ts
await new MarkdownIndexProcessor({
  ...this.getIndexProcessorConfig(target),
  recursive: true,
}).handle(target.outDir);
```

Process:

Directory Scanning - DirectoryMarkdownScanner scans directories for markdown files
Content Generation - IndexFileGenerator creates index content with file listings
Template Application - Applies index templates with search/replace patterns
Recursive Processing - Handles subdirectories if configured

Index Types:

Regular Index - Lists files and folders in each directory
Flattened Index - Shows all files in a flat structure
Root Index - Special index for the root documentation directory
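The difference between a regular and a flattened index can be sketched from a list of generated markdown paths. A regular index for a directory lists only its direct files and immediate subfolders, while a flattened index lists every file beneath it. `buildIndex` and the markdown link format are hypothetical. The real IndexFileGenerator additionally applies index templates and search/replace patterns.

```typescript
// Build index content for `dir` from paths relative to the docs root (POSIX separators).
function buildIndex(allFiles: string[], dir: string, flatten: boolean): string {
  const prefix = dir === "" ? "" : dir + "/";
  const inside = allFiles
    .filter(f => f.indexOf(prefix) === 0 && !/(^|\/)index\.md$/.test(f)) // skip index files themselves
    .map(f => f.slice(prefix.length));

  if (flatten) {
    // Flattened: every file below `dir`, linked by its relative path.
    return inside.sort().map(f => `- [${f}](${f})`).join("\n");
  }

  // Regular: direct files plus one entry per immediate subfolder.
  const files: string[] = [];
  const folders: string[] = [];
  for (const rel of inside) {
    const slash = rel.indexOf("/");
    if (slash === -1) {
      files.push(rel);
    } else if (folders.indexOf(rel.slice(0, slash)) === -1) {
      folders.push(rel.slice(0, slash));
    }
  }
  const folderLines = folders.sort().map(d => `- [${d}/](${d}/index.md)`);
  const fileLines = files.sort().map(f => `- [${f}](${f})`);
  return folderLines.concat(fileLines).join("\n");
}

const generated = [
  "src/index.md",
  "src/utils/helper.ts.md",
  "src/utils/validator.ts.md",
  "src/services/api.ts.md",
];
const regular = buildIndex(generated, "src", false);
const flattened = buildIndex(generated, "src", true);
```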

5. Output Structure

The application generates a structured documentation output:

```text
docs/
├── index.md                  # Root index file
└── src/
    ├── index.md              # Directory index
    ├── utils/
    │   ├── index.md          # Subdirectory index
    │   ├── helper.ts.md      # Generated documentation
    │   └── validator.ts.md   # Generated documentation
    └── services/
        ├── index.md
        └── api.ts.md
```

Key Data Structures

ExtractedContent

```ts
type ExtractedContent = {
  content: string;                    // The extracted documentation content
  attributes: Record<string, string>; // Attributes from tags (e.g., class="example")
  searchAndReplace: string;           // Placeholder to replace in templates
  divideBy?: string;                  // Optional delimiter for content sections
  defaultText?: string;               // Default text when no content is found
};
```

ProcessResult

```ts
type ProcessResult = {
  content: string;                // Generated documentation content
  outDir: string;                 // Output directory path
  fileName: string;               // Output filename
  loggableFileName: string;       // File path for logging
  locales: Locales;               // Template variables
} | {
  error: string;                  // Error message
  noDocumentationFound?: boolean; // Whether documentation was missing
};
```
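Because ProcessResult is a union, callers must check which branch they received before reading its fields. A minimal sketch of that narrowing with TypeScript's `in` operator follows. `Locales` is declared here as a plain string map and `summarizeResult` is a hypothetical helper; both are assumptions for illustration.

```typescript
type Locales = Record<string, string>; // assumed shape for this sketch

type ProcessResult =
  | { content: string; outDir: string; fileName: string; loggableFileName: string; locales: Locales }
  | { error: string; noDocumentationFound?: boolean };

// Narrow on a property that only one branch of the union declares.
function summarizeResult(result: ProcessResult): string {
  if ("error" in result) {
    return result.noDocumentationFound ? `skipped: ${result.error}` : `failed: ${result.error}`;
  }
  // Here TypeScript knows `result` is the success branch.
  return `wrote ${result.outDir}/${result.fileName} (${result.content.length} chars)`;
}

const success: ProcessResult = {
  content: "# helper",
  outDir: "docs/src/utils",
  fileName: "helper.ts.md",
  loggableFileName: "src/utils/helper.ts",
  locales: {},
};
const skipped: ProcessResult = { error: "No documentation found", noDocumentationFound: true };
```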

Plugin System

The application uses a plugin-based architecture for extensibility:

ExtractorPlugin Interface

```ts
interface ExtractorPlugin<Config> {
  setConfig(config: Config): ExtractorPlugin<Config>;
  getConfig(): Config;
  extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult>;
}
```
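A custom extractor only needs to implement the three methods of this interface. The sketch below is a hypothetical regex-based plugin that collects every `@summary` line into one ExtractedContent entry. `ErrorResult` is not defined in this document, so it is declared here as `{ error: string }` as an assumption. The interface and ExtractedContent type are repeated so the block stands alone, and `SummaryLinePlugin` and `SummaryConfig` are illustrative names.

```typescript
type ExtractedContent = {
  content: string;
  attributes: Record<string, string>;
  searchAndReplace: string;
  divideBy?: string;
  defaultText?: string;
};
type ErrorResult = { error: string }; // assumed shape

interface ExtractorPlugin<Config> {
  setConfig(config: Config): ExtractorPlugin<Config>;
  getConfig(): Config;
  extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult>;
}

type SummaryConfig = { searchAndReplace: string; defaultText?: string };

// Hypothetical plugin: collects the text after each "@summary" marker.
class SummaryLinePlugin implements ExtractorPlugin<SummaryConfig> {
  private config: SummaryConfig = { searchAndReplace: "%summary%" };

  setConfig(config: SummaryConfig): ExtractorPlugin<SummaryConfig> {
    this.config = config;
    return this; // returning the plugin keeps configuration chainable
  }

  getConfig(): SummaryConfig {
    return this.config;
  }

  async extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult> {
    const lines: string[] = [];
    const re = /@summary\s+(.+)/g; // "." stops at the end of the line
    let match: RegExpExecArray | null;
    while ((match = re.exec(str)) !== null) {
      lines.push(match[1].trim());
    }
    if (lines.length === 0) {
      return { error: "No @summary found" };
    }
    return [{
      content: lines.join("\n"),
      attributes: {},
      searchAndReplace: this.config.searchAndReplace,
      defaultText: this.config.defaultText,
    }];
  }
}

const plugin = new SummaryLinePlugin().setConfig({ searchAndReplace: "%summary%" });
```

Returning an error object instead of throwing lets the caller decide whether a missing `@summary` should stop processing or simply be recorded.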

Creating Custom Plugins

Extractor Plugins - Extract content from source files
Formatter Plugins - Transform extracted content
Template Plugins - Process template content

Error Handling

The application handles the following error cases:

File Not Found - Logs the error and continues with the next file
No Documentation Found - Counts files that yielded no documentation
Plugin Errors - Distinguishes between throwable and non-throwable errors
Template Errors - Validates that template files exist
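The per-file behavior in this list (log, continue, count missing documentation) can be sketched as a sequential loop. `processAll`, the simplified `FileResult` union, and the `RunSummary` counters are hypothetical. The real SimpleDocExtractor logic may differ, for example in how throwable plugin errors are allowed to abort a run.

```typescript
type FileResult = { content: string } | { error: string; noDocumentationFound?: boolean };

type RunSummary = { written: number; missingDocs: number; errors: string[] };

// Process files one at a time; a failure in one file never stops the others.
function processAll(files: string[], processFile: (file: string) => FileResult): RunSummary {
  const summary: RunSummary = { written: 0, missingDocs: 0, errors: [] };
  for (const file of files) {
    let result: FileResult;
    try {
      result = processFile(file);
    } catch (err) {
      // A thrown error (e.g. an unreadable file) is converted into an error result.
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    if ("error" in result) {
      if (result.noDocumentationFound) summary.missingDocs++;
      else summary.errors.push(`${file}: ${result.error}`);
      continue;
    }
    summary.written++;
  }
  return summary;
}

const report = processAll(["a.ts", "b.ts", "missing.ts"], file => {
  if (file === "missing.ts") throw new Error("ENOENT");
  if (file === "b.ts") return { error: "no docs", noDocumentationFound: true };
  return { content: "# a" };
});
```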

Configuration Options

Target Configuration

File Patterns - Glob patterns for file discovery
Output Directory - Where to save generated documentation
Templates - Custom templates for different output types
Plugins - Extraction and formatting plugins
Index Creation - Whether to create index files

Template Configuration

Documentation Templates - Templates for individual files
Index Templates - Templates for directory listings
Root Index Templates - Templates for root documentation

Performance Considerations

Sequential Processing - Files are processed one at a time
Memory Management - Content is processed in chunks
File System Operations - Minimal file I/O with caching
Plugin Efficiency - Plugins can be optimized for specific use cases

Extension Points

The application provides several extension points:

Custom Extractors - Implement ExtractorPlugin for new extraction methods
Custom Formatters - Implement TFormatter for content transformation
Custom Templates - Create custom markdown templates
Custom Generators - Extend generation capabilities

Best Practices

Use the Builder Pattern - Leverage the fluent configuration API
Plugin Composition - Combine multiple plugins for complex extraction
Template Design - Create reusable templates with clear placeholders
Error Handling - Implement proper error handling in custom plugins
Testing - Test plugins and configurations thoroughly

Common Use Cases

JSDoc Extraction - Extract JSDoc comments from JavaScript/TypeScript
Tag-based Documentation - Use custom tags for documentation
Multi-language Support - Different extractors for different languages
API Documentation - Generate API docs from source code
Code Examples - Extract and format code examples
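For the JSDoc use case, a regex-based extraction (the approach RegexExtractorPlugin is described as taking) might look like the sketch below: match each `/** ... */` block, strip the leading asterisks, and keep the text. The pattern and the `extractJsDocBlocks` helper are illustrative assumptions; the plugin's actual configuration is not shown in this document.

```typescript
// Match every /** ... */ block; the non-greedy body keeps adjacent blocks separate.
function extractJsDocBlocks(source: string): string[] {
  const blockRe = /\/\*\*([\s\S]*?)\*\//g;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = blockRe.exec(source)) !== null) {
    const text = match[1]
      .split("\n")
      .map(line => line.replace(/^\s*\*\s?/, "")) // drop the leading " * " decoration
      .join("\n")
      .trim();
    if (text.length > 0) blocks.push(text);
  }
  return blocks;
}

const code = [
  "/**",
  " * Adds two numbers.",
  " * @param a first operand",
  " */",
  "export function add(a: number, b: number) { return a + b; }",
  "/* not JSDoc */",
  "/** Subtracts b from a. */",
].join("\n");

const docs = extractJsDocBlocks(code);
```

Plain `/* ... */` comments are skipped because the pattern requires the second asterisk that marks a JSDoc block.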

This flow documentation should help new developers understand the application architecture and contribute effectively to the codebase.
