Simple Docs Extractor - Application Flow
This document provides a comprehensive overview of how the Simple Docs Extractor application works, designed for new developers who want to understand or contribute to the codebase.
Overview
Simple Docs Extractor is a documentation generation tool that extracts documentation from source code files and generates organized markdown documentation with index files. The application follows a modular architecture with clear separation of concerns.
High-Level Architecture
The application is structured around several key components:
- Configuration Layer - Builder pattern for fluent configuration
- File Processing Layer - File discovery and content extraction
- Content Processing Layer - Documentation extraction and transformation
- Generation Layer - Output file creation and formatting
- Plugin System - Extensible extraction and formatting capabilities
Application Flow
1. Configuration Phase
The application starts with configuration using the Builder pattern:
const scraper = SimpleDocExtractor.create('./src')
  .target(target => {
    target
      .patterns('**/*.ts')
      .cwd('./src')
      .outDir('./docs')
      .createIndexFiles()
      .useDocumentationTemplate('./templates/doc.md');
  })
  .addRecommendedFormatters()
  .buildService();
Key Classes:
- Builder - Main configuration builder
- TargetBuilder - Configures individual processing targets
- TemplateBuilder - Configures templates for different output types
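The fluent style works because each setter stores a value and returns the builder itself. The sketch below is a hypothetical, cut-down TargetBuilderSketch (not the library's real TargetBuilder) that records the same options as the example above and returns a plain config object from build().

```typescript
// Hypothetical, cut-down stand-in for TargetBuilder: each setter records a
// value and returns `this`, which is what makes the calls chainable.
interface TargetConfigSketch {
  patterns: string[];
  cwd: string;
  outDir: string;
  createIndexFiles: boolean;
  documentationTemplate?: string;
}

class TargetBuilderSketch {
  private config: TargetConfigSketch = {
    patterns: [],
    cwd: ".",
    outDir: "./docs",
    createIndexFiles: false,
  };

  patterns(...globs: string[]): this {
    this.config.patterns.push(...globs);
    return this;
  }

  cwd(dir: string): this {
    this.config.cwd = dir;
    return this;
  }

  outDir(dir: string): this {
    this.config.outDir = dir;
    return this;
  }

  createIndexFiles(enabled = true): this {
    this.config.createIndexFiles = enabled;
    return this;
  }

  useDocumentationTemplate(templatePath: string): this {
    this.config.documentationTemplate = templatePath;
    return this;
  }

  // Returns a copy so later builder calls cannot mutate an already-built config.
  build(): TargetConfigSketch {
    return { ...this.config, patterns: [...this.config.patterns] };
  }
}

// Usage mirroring the configuration example above.
const exampleTarget = new TargetBuilderSketch()
  .patterns("**/*.ts")
  .cwd("./src")
  .outDir("./docs")
  .createIndexFiles()
  .useDocumentationTemplate("./templates/doc.md")
  .build();
```

Returning a copy from build() keeps the finished configuration independent of the builder, so one builder can be reused without surprising earlier results.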
2. File Discovery Phase
Once configured, the application discovers files to process:
// In SimpleDocExtractor.handleTarget()
const files = await this.getFiles(target);
Key Classes:
- FileScanner - Uses glob patterns to find matching files
- Target - Contains glob options and output configuration
Process:
- FileScanner.collect() scans directories using glob patterns
- Returns an array of absolute file paths matching the criteria
- Files are processed sequentially
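Glob matching can be pictured as translating the pattern into a regular expression and filtering candidate paths with it. This is a simplified sketch, assuming relative paths produced by a directory walk; the real FileScanner relies on a glob library and returns absolute paths, and the helper names globToRegExp and collectMatching are hypothetical. Only **, * and ? are supported.

```typescript
// Hypothetical, simplified glob matcher: "**/" matches zero or more directories,
// "**" matches anything, "*" matches anything except "/", "?" matches one non-"/" char.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "*" && pattern[i + 1] === "*") {
      if (pattern[i + 2] === "/") {
        source += "(?:.*/)?"; // "**/" may also match no directory at all
        i += 2;
      } else {
        source += ".*";
        i += 1;
      }
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regex metacharacters so "." in "*.ts" is matched literally.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Filters relative paths and sorts them so the processing order is deterministic.
function collectMatching(paths: string[], pattern: string): string[] {
  const re = globToRegExp(pattern);
  return paths.filter((p) => re.test(p)).sort();
}
```

With "**/*.ts", both a root-level index.ts and utils/helper.ts match, while utils/helper.test.js and an already generated api.ts.md do not.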
3. File Processing Phase
For each discovered file, the application processes it through several stages:
3.1 Pre-Processing (CodeFileProcessor.preProcess())
const processedResult = await codeFileProcessor.preProcess(file, target);
Steps:
- Content Extraction - Uses DocumentContentExtractor with configured plugins
- Template Injection - Injects extracted content into templates via ContentInjection
- Formatting - Applies configured formatters to clean up content
- Output Path Generation - Determines where the output file should be saved
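The four steps can be read as a small async pipeline. In this sketch the step functions are injected, so PreProcessSteps, preProcessSketch and the outcome shape are hypothetical names rather than CodeFileProcessor's real signature; the only behaviour assumed beyond the list is that an empty extraction is reported as missing documentation.

```typescript
// Minimal shape of an extracted item (a subset of the ExtractedContent type documented below).
type Extracted = { content: string; searchAndReplace: string };

// Hypothetical dependency bundle: each field stands for one of the steps listed above.
interface PreProcessSteps {
  extract: (source: string) => Promise<Extracted[]>;
  inject: (items: Extracted[]) => string;
  formatters: Array<(content: string) => string>;
  outputPath: (file: string) => string;
}

type PreProcessOutcome =
  | { outPath: string; content: string }
  | { error: string; noDocumentationFound: boolean };

async function preProcessSketch(
  file: string,
  source: string,
  steps: PreProcessSteps,
): Promise<PreProcessOutcome> {
  // 1. Content extraction
  const items = await steps.extract(source);
  if (items.length === 0) {
    return { error: `No documentation found in ${file}`, noDocumentationFound: true };
  }
  // 2. Template injection
  let content = steps.inject(items);
  // 3. Formatting, applied in configuration order
  for (const format of steps.formatters) {
    content = format(content);
  }
  // 4. Output path generation
  return { outPath: steps.outputPath(file), content };
}
```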
3.2 Content Extraction (DocumentContentExtractor)
The extraction process uses a plugin-based system:
const extractedContentArray = await new DocumentContentExtractor(
  target.plugins ?? []
).extractFromFile(file);
Available Plugins:
- TagExtractorPlugin - Extracts content from HTML/XML-like tags (<docs>content</docs>)
- RegexExtractorPlugin - Uses regex patterns to extract content
- CallbackExtractorPlugin - Custom callback-based extraction
- CopyContentsPlugin - Copies entire file content
Process:
- Each plugin processes the file content
- Plugins return ExtractedContent[] with content and metadata
- Results are combined and cleaned (trimmed, empty lines removed)
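As an illustration of tag-based extraction followed by the combine-and-clean step, the sketch below pulls <docs> blocks (with their attributes) out of a string and then merges the results of several plugins. extractTag and combineResults are hypothetical helpers, and reading "cleaned" as trimming each entry and dropping entries that end up empty is an assumption.

```typescript
type ExtractedContent = {
  content: string;
  attributes: Record<string, string>;
  searchAndReplace: string;
};

// Hypothetical tag extractor in the spirit of TagExtractorPlugin: finds every
// <tag attr="...">...</tag> block. Nested tags of the same name are not supported,
// and `tag` must be a plain name (it is inserted into the regex unescaped).
function extractTag(source: string, tag: string, placeholder: string): ExtractedContent[] {
  const blockRe = new RegExp(`<${tag}([^>]*)>([\\s\\S]*?)</${tag}>`, "g");
  const attrRe = /([\w-]+)="([^"]*)"/g;
  const results: ExtractedContent[] = [];
  let block: RegExpExecArray | null;
  while ((block = blockRe.exec(source)) !== null) {
    const attributes: Record<string, string> = {};
    let attr: RegExpExecArray | null;
    attrRe.lastIndex = 0; // reuse the global regex safely for each block
    while ((attr = attrRe.exec(block[1])) !== null) {
      attributes[attr[1]] = attr[2];
    }
    results.push({ content: block[2], attributes, searchAndReplace: placeholder });
  }
  return results;
}

// Combines the output of several plugins, trims each entry and drops
// entries that are empty after trimming.
function combineResults(perPlugin: ExtractedContent[][]): ExtractedContent[] {
  return perPlugin
    .reduce((all, list) => all.concat(list), [] as ExtractedContent[])
    .map((item) => ({ ...item, content: item.content.trim() }))
    .filter((item) => item.content.length > 0);
}
```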
3.3 Template Processing (ContentInjection)
injectedContent = contentInjection.mergeExtractedContentsIntoTemplateString(
  extractedContentArray
);
Process:
- Reads the template file content
- Replaces placeholder strings with extracted content
- Applies default text for missing content
- Uses TemplateContentExtractionContentMerger for complex merging
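A minimal sketch of the placeholder replacement, assuming each extracted item carries its own placeholder string and optional default text (as in the ExtractedContent type further down). mergeIntoTemplate and the %content%-style placeholders are illustrative, not the real ContentInjection API, and the more complex merging done by TemplateContentExtractionContentMerger is not modelled.

```typescript
type InjectableContent = {
  content: string;
  searchAndReplace: string; // placeholder text; the "%name%" style is an assumption
  defaultText?: string;
};

// Hypothetical merge step: every occurrence of each placeholder is replaced with
// the extracted content, or with defaultText when nothing was extracted.
function mergeIntoTemplate(template: string, items: InjectableContent[]): string {
  let result = template;
  for (const item of items) {
    const replacement = item.content.trim() !== "" ? item.content : (item.defaultText ?? "");
    // split/join replaces all occurrences without treating the placeholder as a regex
    result = result.split(item.searchAndReplace).join(replacement);
  }
  return result;
}
```

Because split/join replaces every occurrence, a placeholder used twice in the template is filled twice; if two items share a placeholder, only the first one takes effect in this sketch.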
3.4 Formatting (TFormatter)
for (const formatter of this.config.formatters) {
  injectedContent = formatter({
    filePath: file,
    outFile: file,
    content: injectedContent,
  });
}
Available Formatters:
- RemoveMultiLineCommentAsterisks - Removes the leading asterisks from multi-line comment lines
- AddDoubleLinesFormatter - Adds spacing between content lines
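Formatters are plain functions from the current content to new content, so they compose by running them in order. The two functions below are hypothetical re-implementations of what the bundled formatters' names suggest; the real RemoveMultiLineCommentAsterisks and AddDoubleLinesFormatter may behave differently, and applyFormatters is an illustrative helper.

```typescript
// Shape of a formatter as called in the loop above.
type FormatterInput = { filePath: string; outFile: string; content: string };
type Formatter = (input: FormatterInput) => string;

// Hypothetical: drops "/**" and "*/" delimiter lines and strips the leading "* ".
const removeCommentAsterisks: Formatter = ({ content }) =>
  content
    .split("\n")
    .filter((line) => !/^\s*\/\*\*?\s*$/.test(line) && !/^\s*\*\/\s*$/.test(line))
    .map((line) => line.replace(/^\s*\* ?/, ""))
    .join("\n");

// Hypothetical: removes blank lines, then separates every remaining line with
// an empty line so markdown renders each one as its own paragraph.
const addDoubleLines: Formatter = ({ content }) =>
  content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .join("\n\n");

// Runs formatters in order, feeding each one the previous output.
function applyFormatters(formatters: Formatter[], base: FormatterInput): string {
  let content = base.content;
  for (const formatter of formatters) {
    content = formatter({ ...base, content });
  }
  return content;
}
```

Order matters: stripping the comment markers first leaves only real content lines for the spacing formatter to separate.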
4. File Generation Phase
4.1 Documentation File Generation (DocFileGenerator)
new DocFileGenerator({
  templatePath: this.getDocFileGeneratorConfig(target).templatePath,
  outDir: transformedOutDir,
}).saveToMarkdownFile(injectedContent, file);
Process:
- Creates the output directory structure
- Appends a .md extension to the file name (helper.ts becomes helper.ts.md, as in the output structure in section 5)
- Writes the processed content to the markdown file
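Output path generation can be sketched with Node's path.posix helpers. baseDir (the directory whose layout is mirrored under outDir) and the markdownOutputPath name are assumptions; the .md suffix is appended so that helper.ts becomes helper.ts.md, matching the output structure shown in section 5.

```typescript
import * as path from "node:path";

// Hypothetical helper: mirrors the source tree under outDir and appends ".md".
// baseDir is an assumed parameter, not a documented DocFileGenerator option.
function markdownOutputPath(baseDir: string, file: string, outDir: string) {
  const relative = path.posix.relative(baseDir, file);
  const outPath = path.posix.join(outDir, `${relative}.md`);
  return {
    outPath,
    outDir: path.posix.dirname(outPath), // directory to create before writing
    fileName: path.posix.basename(outPath),
  };
}
```

The returned outDir and fileName mirror the fields of the ProcessResult type described below, which is why the sketch returns them separately rather than only the full path.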
4.2 Index File Generation (MarkdownIndexProcessor)
After all files are processed, index files are generated:
await new MarkdownIndexProcessor({
  ...this.getIndexProcessorConfig(target),
  recursive: true,
}).handle(target.outDir);
Process:
- Directory Scanning - DirectoryMarkdownScanner scans directories for markdown files
- Content Generation - IndexFileGenerator creates index content with file listings
- Template Application - Applies index templates with search/replace patterns
- Recursive Processing - Handles subdirectories if configured
Index Types:
- Regular Index - Lists files and folders in each directory
- Flattened Index - Shows all files in a flat structure
- Root Index - Special index for the root documentation directory
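A regular index is essentially a rendered listing of one directory. This hypothetical buildIndexMarkdown shows one possible layout (folders first, then markdown files, with index.md itself excluded); the real IndexFileGenerator applies templates and search/replace patterns, so its actual output will differ.

```typescript
type IndexEntry = { name: string; isDirectory: boolean };

// Hypothetical "regular index": folders link to their own index.md,
// files link directly; both groups are sorted by name.
function buildIndexMarkdown(title: string, entries: IndexEntry[]): string {
  const byName = (a: IndexEntry, b: IndexEntry) => a.name.localeCompare(b.name);
  const folders = entries.filter((e) => e.isDirectory).sort(byName);
  const files = entries
    .filter((e) => !e.isDirectory && e.name.endsWith(".md") && e.name !== "index.md")
    .sort(byName);

  const lines: string[] = [`# ${title}`, ""];
  if (folders.length > 0) {
    lines.push("## Folders", "", ...folders.map((f) => `- [${f.name}](${f.name}/index.md)`), "");
  }
  if (files.length > 0) {
    lines.push("## Files", "", ...files.map((f) => `- [${f.name}](${f.name})`), "");
  }
  return lines.join("\n");
}
```

A flattened index could reuse the same rendering after collecting files from every subdirectory and linking them by their relative paths.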
5. Output Structure
The application generates a structured documentation output:
docs/
├── index.md                  # Root index file
└── src/
    ├── index.md              # Directory index
    ├── utils/
    │   ├── index.md          # Subdirectory index
    │   ├── helper.ts.md      # Generated documentation
    │   └── validator.ts.md   # Generated documentation
    └── services/
        ├── index.md
        └── api.ts.md
Key Data Structures
ExtractedContent
type ExtractedContent = {
  content: string;                     // The extracted documentation content
  attributes: Record<string, string>;  // Attributes from tags (e.g., class="example")
  searchAndReplace: string;            // Placeholder to replace in templates
  divideBy?: string;                   // Optional delimiter for content sections
  defaultText?: string;                // Default text when no content found
};
ProcessResult
type ProcessResult = {
  content: string;            // Generated documentation content
  outDir: string;             // Output directory path
  fileName: string;           // Output filename
  loggableFileName: string;   // File path for logging
  locales: Locales;           // Template variables
} | {
  error: string;                   // Error message
  noDocumentationFound?: boolean;  // Whether documentation was missing
};
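Because ProcessResult is a union of a success shape and an error shape, callers can narrow it with an in check before touching either set of fields. Locales is not defined in this document, so it is assumed here to be a string map, and describeResult is a hypothetical helper.

```typescript
type Locales = Record<string, string>; // assumed shape; not defined in this document

type ProcessResult =
  | { content: string; outDir: string; fileName: string; loggableFileName: string; locales: Locales }
  | { error: string; noDocumentationFound?: boolean };

// The `in` check narrows the union, so each branch only sees its own fields.
function describeResult(result: ProcessResult): string {
  if ("error" in result) {
    return result.noDocumentationFound
      ? `skipped (no documentation): ${result.error}`
      : `failed: ${result.error}`;
  }
  return `wrote ${result.outDir}/${result.fileName}`;
}
```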
Plugin System
The application uses a plugin-based architecture for extensibility:
ExtractorPlugin Interface
interface ExtractorPlugin<Config> {
  setConfig(config: Config): ExtractorPlugin<Config>;
  getConfig(): Config;
  extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult>;
}
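A custom extractor only has to implement these three methods. The hypothetical LineCommentPlugin below collects runs of /// line comments; the ErrorResult shape ({ error, throwable }) is an assumption based on the throwable vs non-throwable errors mentioned under Error Handling, not a documented type.

```typescript
type ExtractedContent = {
  content: string;
  attributes: Record<string, string>;
  searchAndReplace: string;
  divideBy?: string;
  defaultText?: string;
};
// Assumed shape: ErrorResult is not defined in this document.
type ErrorResult = { error: string; throwable?: boolean };

interface ExtractorPlugin<Config> {
  setConfig(config: Config): ExtractorPlugin<Config>;
  getConfig(): Config;
  extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult>;
}

type LineCommentConfig = { prefix: string; searchAndReplace: string };

// Hypothetical custom plugin: each run of consecutive lines starting with the
// prefix (e.g. "///") becomes one ExtractedContent entry.
class LineCommentPlugin implements ExtractorPlugin<LineCommentConfig> {
  private config: LineCommentConfig = { prefix: "///", searchAndReplace: "%content%" };

  setConfig(config: LineCommentConfig): this {
    this.config = config;
    return this;
  }

  getConfig(): LineCommentConfig {
    return this.config;
  }

  async extractFromString(str: string): Promise<ExtractedContent[] | ErrorResult> {
    const { prefix, searchAndReplace } = this.config;
    const runs: string[][] = [];
    let current: string[] | null = null;
    for (const line of str.split("\n")) {
      const trimmed = line.trim();
      if (trimmed.startsWith(prefix)) {
        if (current === null) {
          current = [];
          runs.push(current);
        }
        current.push(trimmed.slice(prefix.length).trim());
      } else {
        current = null; // a non-comment line ends the current run
      }
    }
    if (runs.length === 0) {
      // Non-throwable: the caller can record "no documentation" and move on.
      return { error: `No "${prefix}" comments found`, throwable: false };
    }
    return runs.map((run) => ({ content: run.join("\n"), attributes: {}, searchAndReplace }));
  }
}
```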
Creating Custom Plugins
- Extractor Plugins - Extract content from source files
- Formatter Plugins - Transform extracted content
- Template Plugins - Process template content
Error Handling
The application handles errors gracefully:
- File Not Found - Logs error and continues with next file
- No Documentation Found - Tracks missing documentation count
- Plugin Errors - Supports throwable vs non-throwable errors
- Template Errors - Validates template files exist
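The "log and continue" policy amounts to a sequential loop that catches per-file failures. The sketch below is a hypothetical processAll helper with an injected processOne callback and logger; it counts missing documentation separately from real failures, as the list above describes, but its names and outcome shape are illustrative only.

```typescript
type FileOutcome = { ok: true } | { ok: false; noDocumentationFound: boolean; message: string };

type RunSummary = { processed: number; missingDocs: number; failed: string[] };

// Hypothetical sketch: files run one at a time; a thrown error is logged and
// recorded, while missing documentation is only counted.
async function processAll(
  files: string[],
  processOne: (file: string) => Promise<FileOutcome>,
  log: (msg: string) => void = console.error,
): Promise<RunSummary> {
  const summary: RunSummary = { processed: 0, missingDocs: 0, failed: [] };
  for (const file of files) {
    try {
      const outcome = await processOne(file);
      if (outcome.ok) {
        summary.processed++;
      } else if (outcome.noDocumentationFound) {
        summary.missingDocs++;
      } else {
        log(`${file}: ${outcome.message}`);
        summary.failed.push(file);
      }
    } catch (err) {
      // e.g. the file disappeared between discovery and reading
      log(`${file}: ${err instanceof Error ? err.message : String(err)}`);
      summary.failed.push(file);
    }
  }
  return summary;
}
```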
Configuration Options
Target Configuration
- File Patterns - Glob patterns for file discovery
- Output Directory - Where to save generated documentation
- Templates - Custom templates for different output types
- Plugins - Extraction and formatting plugins
- Index Creation - Whether to create index files
Template Configuration
- Documentation Templates - Templates for individual files
- Index Templates - Templates for directory listings
- Root Index Templates - Templates for root documentation
Performance Considerations
- Sequential Processing - Files are processed one at a time
- Memory Management - Content is processed in chunks
- File System Operations - Minimal file I/O with caching
- Plugin Efficiency - Plugins can be optimized for specific use cases
Extension Points
The application provides several extension points:
- Custom Extractors - Implement ExtractorPlugin for new extraction methods
- Custom Formatters - Implement TFormatter for content transformation
- Custom Templates - Create custom markdown templates
- Custom Generators - Extend generation capabilities
Best Practices
- Use Builder Pattern - Leverage fluent configuration API
- Plugin Composition - Combine multiple plugins for complex extraction
- Template Design - Create reusable templates with clear placeholders
- Error Handling - Implement proper error handling in custom plugins
- Testing - Test plugins and configurations thoroughly
Common Use Cases
- JSDoc Extraction - Extract JSDoc comments from JavaScript/TypeScript
- Tag-based Documentation - Use custom tags for documentation
- Multi-language Support - Different extractors for different languages
- API Documentation - Generate API docs from source code
- Code Examples - Extract and format code examples
This flow documentation should help new developers understand the application architecture and contribute effectively to the codebase.