Excel to Markdown Conversion Technique using @knowcode/convert-to-markdown
Created: 2025-07-28
Purpose: Document the methodology for converting Excel spreadsheets to AI-ready Markdown and JSON formats
Use Case: Converting complex financial models and data sheets for AI analysis
Overview
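The @knowcode/convert-to-markdown package is designed for converting documents to AI-ready formats. It aims to preserve complex Excel structures, including formulas, charts, and pivot tables, while producing clean, structured output suitable for language models. Verify this on your own files: as noted under Troubleshooting, formulas may convert to their calculated values and advanced formatting is not preserved.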
Installation
Global Installation (Recommended for CLI usage)
npm install -g @knowcode/convert-to-markdown
Local Project Installation
npm install @knowcode/convert-to-markdown
Pre-Conversion Setup
Create Output Directory Structure
mkdir -p docs/spreadsheet-md
mkdir -p docs/spreadsheet-json
Testing Protocol (Always Start Here)
1. Select Representative Test File
Choose a file that represents the complexity and structure of your dataset:
- Medium complexity (multiple sheets, tables, formulas)
- Important reference data
- Known data structure for easy verification
2. Single File Test Commands
# Convert to Markdown
convert-to-markdown excel-to-markdown "path/to/test-file.xlsx" -o "docs/spreadsheet-md/test-output.md"
# Convert to JSON for comparison
convert-to-markdown excel-to-json "path/to/test-file.xlsx" -o "docs/spreadsheet-json/test-output.json" --pretty
3. Quality Verification Checklist
Original Excel File Analysis:
- Count number of sheets/tabs
- Note key tables and their dimensions
- Identify important formulas and calculations
- Check for charts, pivot tables, or complex formatting
- Document critical data points for verification
Converted Output Verification:
- All sheets are represented in output
- Table structures are preserved
- Data values match exactly (spot check 10-20 cells)
- No missing rows or columns
- Formulas are preserved (where supported)
- Special formatting is noted or preserved
Format Comparison:
- Markdown: Human-readable, good for AI analysis
- JSON: Structured data, good for programmatic processing
- Decide which format better serves your specific data needs
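Parts of the checklist above (sheet coverage, row and column counts) can be checked mechanically. The following is a minimal sketch, assuming the JSON output exposes a `sheets` array whose entries have `name` and `data` fields, as in the Programmatic Usage example in this guide. Whether each row is an array or a keyed object is an assumption, so the sketch handles both. `summarizeSheets` and `findMismatches` are hypothetical helpers, not part of the package.

```javascript
// Hypothetical helper: summarize converted output for comparison with the workbook.
// Assumes result.sheets is an array of { name, data }, where data is an array of rows.
function summarizeSheets(result) {
  return result.sheets.map((sheet) => {
    const rows = Array.isArray(sheet.data) ? sheet.data : [];
    // Rows may be arrays or keyed objects; take the widest row as the column count.
    const columns = rows.reduce((max, row) => {
      const width = Array.isArray(row) ? row.length : Object.keys(row || {}).length;
      return Math.max(max, width);
    }, 0);
    return { name: sheet.name, rows: rows.length, columns };
  });
}

// Compare the summary with dimensions noted by hand from the original workbook.
// expected maps sheet name -> { rows, columns }.
function findMismatches(summary, expected) {
  const problems = [];
  for (const [name, dims] of Object.entries(expected)) {
    const found = summary.find((s) => s.name === name);
    if (!found) {
      problems.push(`missing sheet: ${name}`);
    } else if (found.rows !== dims.rows || found.columns !== dims.columns) {
      problems.push(
        `${name}: expected ${dims.rows}x${dims.columns}, got ${found.rows}x${found.columns}`
      );
    }
  }
  return problems;
}
```

An empty array from `findMismatches` only confirms dimensions; the 10-20 cell spot check still has to be done against the values themselves.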
Command Templates
Basic Conversion
# Excel to Markdown
convert-to-markdown excel-to-markdown "input-file.xlsx" -o "output-file.md"
# Excel to JSON
convert-to-markdown excel-to-json "input-file.xlsx" -o "output-file.json" --pretty
Advanced Options
# Filter specific sheets by prefix
convert-to-markdown excel-to-json "data.xlsx" --sheet-prefix "Financial" -o "financial-data.json"
# Batch processing (if supported)
convert-to-markdown batch "*.xlsx" --output-dir ./output/
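Since the `batch` subcommand above is marked "if supported", a fallback is to generate one single-file command per workbook. The sketch below assumes only the `excel-to-markdown` and `excel-to-json` subcommands and the `-o` and `--pretty` flags shown in this guide. `buildConversionJobs` is a hypothetical helper and `sheets/budget.xlsx` is a made-up file name.

```javascript
const path = require('node:path');

// Hypothetical helper: derive one Markdown job and one JSON job per input workbook.
// Uses only the subcommands and flags shown earlier in this guide.
function buildConversionJobs(
  inputFiles,
  mdDir = 'docs/spreadsheet-md',
  jsonDir = 'docs/spreadsheet-json'
) {
  return inputFiles.flatMap((file) => {
    // Strip the directory and extension so outputs are named after the workbook.
    const base = path.basename(file, path.extname(file));
    return [
      {
        command: 'convert-to-markdown',
        args: ['excel-to-markdown', file, '-o', path.join(mdDir, `${base}.md`)],
      },
      {
        command: 'convert-to-markdown',
        args: ['excel-to-json', file, '-o', path.join(jsonDir, `${base}.json`), '--pretty'],
      },
    ];
  });
}

// Print the commands for review before running anything.
for (const job of buildConversionJobs(['sheets/budget.xlsx'])) {
  console.log(job.command, job.args.join(' '));
}
```

To execute a job rather than print it, each one can be passed to `child_process.execFileSync(job.command, job.args)`, which avoids shell quoting problems with file names that contain spaces.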
Programmatic Usage
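const ConvertToMarkdown = require('@knowcode/convert-to-markdown');
// await is not valid at the top level of a CommonJS module, so wrap the calls in an async function
async function main() {
  // Convert to JSON
  const result = await ConvertToMarkdown.excelToJson('data.xlsx');
  console.log(result.sheets);
  // Access specific sheet data
  result.sheets.forEach(sheet => {
    console.log(`Sheet: ${sheet.name}`);
    console.log(`Rows: ${sheet.data.length}`);
  });
}
// Report conversion errors and signal failure to the calling shell
main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});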
Quality Assurance Best Practices
Data Integrity Checks
- Row/Column Counts: Verify dimensions match original
- Data Types: Ensure numbers remain numbers, dates are preserved
- Special Characters: Check handling of currency symbols, percentages
- Empty Cells: Verify empty cells are handled correctly
- Formula Results: Check if calculated values are preserved
Critical Elements to Verify
- Headers and Labels: Table headers should be clearly identified
- Financial Data: Currency formatting and calculations
- Dates: Date formats should be preserved or standardized
- Percentages: Percentage values should maintain meaning
- Large Numbers: No truncation or scientific notation issues
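Some of the checks above (numbers that became text, scientific notation, precision loss in large numbers) can be scanned for automatically. This is a rough sketch, assuming each row in the converted JSON is an array of cell values; that shape is an assumption, and `findSuspectCells` is a hypothetical helper.

```javascript
// Hypothetical helper: flag cells whose numeric type or precision may have been lost.
// Assumes each row is an array of cell values (an assumption about the JSON output).
function findSuspectCells(rows) {
  const plainNumber = /^-?\d+(\.\d+)?$/;
  const scientific = /^-?\d+(\.\d+)?e[+-]?\d+$/i;
  const suspects = [];
  rows.forEach((row, r) => {
    row.forEach((value, c) => {
      let reason = null;
      if (typeof value === 'string') {
        const text = value.trim();
        if (scientific.test(text)) reason = 'scientific notation stored as text';
        else if (plainNumber.test(text)) reason = 'number stored as text';
      } else if (Number.isInteger(value) && !Number.isSafeInteger(value)) {
        // Integers beyond 2^53 - 1 cannot be represented exactly as JS numbers.
        reason = 'integer too large to represent exactly';
      }
      if (reason) suspects.push({ row: r, column: c, value, reason });
    });
  });
  return suspects;
}
```

Row and column indexes are zero-based. A flagged cell is a prompt for manual review, not proof of an error: account codes and postal codes are legitimately stored as digit strings.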
Troubleshooting
Common Issues
File Size Limit: Input files larger than 50MB are not supported
- Solution: Split large files or use sheet filtering
Character Limit: Output is limited to roughly 50,000 characters
- Solution: Process sheets individually or filter data
Complex Formatting Lost: Advanced Excel formatting not preserved
- Expected: Focus on data integrity rather than visual formatting
Formula Conversion: Formulas may convert to their calculated values
- Check: Verify if you need formulas or just their results
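The two size limits above can be checked before and after a conversion. A minimal sketch, assuming 50MB means 50 × 1024 × 1024 bytes and treating the approximate 50,000-character limit as a warning threshold at 95% of that figure. Both interpretations are assumptions, and `checkLimits` and `checkFiles` are hypothetical helpers.

```javascript
const fs = require('node:fs');

// Limits as stated in this guide; the exact definitions are assumptions.
const MAX_INPUT_BYTES = 50 * 1024 * 1024;
const MAX_OUTPUT_CHARS = 50000;

// Pure check so it can be used with sizes obtained from any source.
function checkLimits(inputBytes, outputChars) {
  const warnings = [];
  if (inputBytes > MAX_INPUT_BYTES) {
    warnings.push('input exceeds 50MB: split the file or use sheet filtering');
  }
  // The output limit is approximate, so warn when within 5% of it.
  if (outputChars >= MAX_OUTPUT_CHARS * 0.95) {
    warnings.push('output is near the character limit and may be truncated: process sheets individually');
  }
  return warnings;
}

// Convenience wrapper that reads the sizes from disk.
function checkFiles(inputPath, outputPath) {
  const inputBytes = fs.statSync(inputPath).size;
  // String length counts UTF-16 code units, which is close enough for a warning.
  const outputChars = fs.readFileSync(outputPath, 'utf8').length;
  return checkLimits(inputBytes, outputChars);
}
```

Output that ends mid-table at close to 50,000 characters is the typical sign that the second limit was hit.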
Error Resolution
# If conversion fails, try:
# 1. Check file permissions
ls -la "path/to/file.xlsx"
# 2. Verify file is not corrupted
file "path/to/file.xlsx"
# 3. Test with smaller subset
convert-to-markdown excel-to-json "file.xlsx" --sheet-prefix "Sheet1" -o "test.json"
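Step 2 above can be made more specific by checking the file signature. An .xlsx file is a ZIP container, so it starts with the bytes `50 4B 03 04`. Legacy .xls files and password-protected workbooks are OLE compound files, which start with `D0 CF 11 E0 A1 B1 1A E1`. The sketch below classifies a file by those bytes; `sniffWorkbook` and `sniffFile` are hypothetical helpers, and how the converter handles each case is not covered here.

```javascript
const fs = require('node:fs');

const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);
const OLE_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

// Classify a workbook by its leading bytes.
function sniffWorkbook(header) {
  if (header.length >= 4 && header.subarray(0, 4).equals(ZIP_MAGIC)) return 'zip';
  if (header.length >= 8 && header.subarray(0, 8).equals(OLE_MAGIC)) return 'ole';
  return 'unknown';
}

// Read only the first 8 bytes of the file rather than loading it whole.
function sniffFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, 8, 0);
    return sniffWorkbook(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

A result of `'zip'` means the container looks like a normal .xlsx file. `'ole'` suggests a legacy or encrypted workbook saved with an .xlsx extension, and `'unknown'` suggests a truncated or mislabeled file.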
Use Case Guidelines
When to Use Markdown Output
- AI analysis and processing
- Documentation integration
- Human-readable data review
- Version control friendly format
When to Use JSON Output
- Programmatic data processing
- API integrations
- Structured data analysis
- Database imports
Batch Processing Strategy
- Test First: Always test with a single file
- Consistent Naming: Use systematic output naming
- Error Handling: Plan for conversion failures
- Verification: Spot-check batch results
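The error-handling and spot-check points above can be sketched as a small batch runner. It assumes the single-file conversion is supplied as a synchronous function (for example, a wrapper around the CLI), so that one failure does not stop the rest of the batch. `runBatch` and `pickSpotChecks` are hypothetical helpers.

```javascript
// Hypothetical batch runner: convertOne performs one conversion and throws on failure.
function runBatch(files, convertOne) {
  const report = { succeeded: [], failed: [] };
  for (const file of files) {
    try {
      convertOne(file);
      report.succeeded.push(file);
    } catch (err) {
      // Record the failure and continue with the remaining files.
      const message = err instanceof Error ? err.message : String(err);
      report.failed.push({ file, message });
    }
  }
  return report;
}

// Pick evenly spaced files from the batch for manual verification.
function pickSpotChecks(files, count = 3) {
  if (files.length <= count) return [...files];
  const step = files.length / count;
  return Array.from({ length: count }, (_, i) => files[Math.floor(i * step)]);
}
```

Typical use is `pickSpotChecks(report.succeeded)` after a run, then applying the Quality Verification Checklist to the selected outputs.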
Project Integration
Directory Structure
project/
├── sheets/                             # Original Excel files
├── docs/
│   ├── spreadsheet-md/                 # Markdown conversions
│   └── spreadsheet-json/               # JSON conversions
└── prompts/
    └── excel-conversion-technique.md   # This guide
Version Control Considerations
- Include: Converted Markdown/JSON files for tracking changes
- Exclude: Original Excel files if they contain sensitive data
- Document: Conversion timestamps and source file versions
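One way to document conversion timestamps and source file versions, as suggested above, is a manifest entry per conversion that records a hash of the source workbook. This is an illustrative sketch: the manifest fields and the `buildManifestEntry` helper are hypothetical, and SHA-256 is used as the version fingerprint by assumption.

```javascript
const crypto = require('node:crypto');

// Hypothetical manifest entry: which source produced which output, and when.
// sourceBytes is the workbook content as a Buffer or string.
function buildManifestEntry(sourceName, sourceBytes, outputName, convertedAt = new Date()) {
  return {
    source: sourceName,
    // The hash identifies the exact source version without committing the file itself.
    sourceSha256: crypto.createHash('sha256').update(sourceBytes).digest('hex'),
    output: outputName,
    convertedAt: convertedAt.toISOString(),
  };
}
```

For a real file, `sourceBytes` can come from `fs.readFileSync(path)`. The entries can be appended to a JSON file committed alongside the converted output, which keeps the record even when the original Excel files are excluded from version control.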
Success Metrics
Conversion Quality
- Data Completeness: 100% of critical data preserved
- Structure Integrity: Table relationships maintained
- Accuracy: No data corruption or loss
- Usability: Output suitable for intended AI/analytical purpose
Process Efficiency
- Repeatability: Process can be documented and repeated
- Automation: Commands can be scripted for batch processing
- Quality Assurance: Verification steps are clearly defined
- Knowledge Transfer: Others can follow the process
Lessons Learned Template
After each conversion project, document:
- File Types: What Excel structures worked best/worst
- Data Patterns: Which data types converted most reliably
- Output Preferences: When to choose MD vs JSON
- Quality Issues: What required manual verification
- Process Improvements: How to streamline future conversions
This technique was developed for the WRU Catering Tender Analysis project (July 2025) and refined through practical application with complex financial models.