Last updated: Aug 1, 2025, 02:00 PM UTC

Excel to Markdown Conversion Technique using @knowcode/convert-to-markdown

Created: 2025-07-28
Purpose: Document the methodology for converting Excel spreadsheets to AI-ready Markdown and JSON formats
Use Case: Converting complex financial models and data sheets for AI analysis

Overview

The @knowcode/convert-to-markdown package is designed for converting documents to AI-ready formats. For Excel files it aims to preserve workbook structure (sheets, tables and, where supported, formulas) while producing clean, structured output suitable for language models. Charts, pivot tables, and advanced formatting should be verified in the output rather than assumed to survive; see Troubleshooting.

Installation

Global Installation (Recommended for CLI usage)

npm install -g @knowcode/convert-to-markdown

Local Project Installation

npm install @knowcode/convert-to-markdown

Pre-Conversion Setup

Create Output Directory Structure

mkdir -p docs/spreadsheet-md
mkdir -p docs/spreadsheet-json

Testing Protocol (Always Start Here)

1. Select Representative Test File

Choose a file that represents the complexity and structure of your dataset:

  • Medium complexity (multiple sheets, tables, formulas)
  • Important reference data
  • Known data structure for easy verification

2. Single File Test Commands

# Convert to Markdown
convert-to-markdown excel-to-markdown "path/to/test-file.xlsx" -o "docs/spreadsheet-md/test-output.md"

# Convert to JSON for comparison
convert-to-markdown excel-to-json "path/to/test-file.xlsx" -o "docs/spreadsheet-json/test-output.json" --pretty

3. Quality Verification Checklist

Original Excel File Analysis:

  • Count number of sheets/tabs
  • Note key tables and their dimensions
  • Identify important formulas and calculations
  • Check for charts, pivot tables, or complex formatting
  • Document critical data points for verification

Converted Output Verification:

  • All sheets are represented in output
  • Table structures are preserved
  • Data values match exactly (spot check 10-20 cells)
  • No missing rows or columns
  • Formula preservation (where supported)
  • Special formatting noted or preserved
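
The spot check in the list above can be scripted once the JSON output exists. The following is a minimal sketch, not part of the package: spotCheck and the sample data are hypothetical, and it assumes the result has the { sheets: [{ name, data }] } shape used under Programmatic Usage, with each entry of data being an array of cell values. If the real output uses a different row shape, the lookup needs adjusting.

```javascript
// Spot-check converted JSON against values read by hand from the original workbook.
// ASSUMPTION: result.sheets is an array of { name, data }, and data is an array
// of rows, each row an array of cell values. This shape is not confirmed by the
// package documentation; adapt the lookup if the output differs.
function spotCheck(result, expectations) {
  const failures = [];
  for (const { sheet, row, col, expected } of expectations) {
    const found = result.sheets.find((s) => s.name === sheet);
    if (!found) {
      failures.push({ sheet, row, col, expected, actual: undefined, reason: 'sheet missing' });
      continue;
    }
    const rowData = found.data[row];
    const actual = Array.isArray(rowData) ? rowData[col] : undefined;
    if (actual !== expected) {
      failures.push({ sheet, row, col, expected, actual, reason: 'value mismatch' });
    }
  }
  return failures;
}

// Hypothetical sample data, for illustration only.
const sampleResult = {
  sheets: [{ name: 'Summary', data: [['Item', 'Cost'], ['Catering', 1250.5]] }],
};

const failures = spotCheck(sampleResult, [
  { sheet: 'Summary', row: 1, col: 1, expected: 1250.5 },
  { sheet: 'Summary', row: 1, col: 0, expected: 'Catering' },
  { sheet: 'Costs', row: 0, col: 0, expected: 'Item' }, // sheet absent from the sample
]);
console.log(failures);
```

Each expectation is a cell you looked up manually in Excel (zero-based row and column). An empty failures array means every sampled cell matched exactly.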

Format Comparison:

  • Markdown: Human-readable, good for AI analysis
  • JSON: Structured data, good for programmatic processing
  • Which format better preserves your specific data needs?

Command Templates

Basic Conversion

# Excel to Markdown
convert-to-markdown excel-to-markdown "input-file.xlsx" -o "output-file.md"

# Excel to JSON
convert-to-markdown excel-to-json "input-file.xlsx" -o "output-file.json" --pretty

Advanced Options

# Filter specific sheets by prefix
convert-to-markdown excel-to-json "data.xlsx" --sheet-prefix "Financial" -o "financial-data.json"

# Batch processing (if supported)
convert-to-markdown batch "*.xlsx" --output-dir ./output/

Programmatic Usage

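const ConvertToMarkdown = require('@knowcode/convert-to-markdown');

// CommonJS modules have no top-level await, so run the conversion inside an async function.
async function main() {
    // Convert to JSON
    const result = await ConvertToMarkdown.excelToJson('data.xlsx');
    console.log(result.sheets);

    // Access specific sheet data
    result.sheets.forEach(sheet => {
        console.log(`Sheet: ${sheet.name}`);
        console.log(`Rows: ${sheet.data.length}`);
    });
}

// Report conversion errors instead of leaving an unhandled rejection.
main().catch(err => {
    console.error('Conversion failed:', err.message);
    process.exitCode = 1;
});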

Quality Assurance Best Practices

Data Integrity Checks

  1. Row/Column Counts: Verify dimensions match original
  2. Data Types: Ensure numbers remain numbers, dates are preserved
  3. Special Characters: Check handling of currency symbols, percentages
  4. Empty Cells: Verify empty cells are handled correctly
  5. Formula Results: Check if calculated values are preserved
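
The row/column count check can be sketched as a small comparison against dimensions noted from the original workbook. This is an illustrative helper, not package functionality: sheetDimensions, compareDimensions, and the sample data are hypothetical, and the code assumes the same { sheets: [{ name, data }] } shape as the Programmatic Usage example, with rows as arrays.

```javascript
// ASSUMPTION: result.sheets is an array of { name, data } with data as an array
// of row arrays. Verify this against real output before relying on the counts.
function sheetDimensions(result) {
  return result.sheets.map((sheet) => ({
    name: sheet.name,
    rows: sheet.data.length,
    // Use the widest row so ragged rows do not hide a dropped column.
    cols: sheet.data.reduce(
      (max, row) => Math.max(max, Array.isArray(row) ? row.length : 0),
      0
    ),
  }));
}

// expected: { [sheetName]: { rows, cols } } as recorded from the original Excel file.
function compareDimensions(actual, expected) {
  const problems = [];
  for (const [name, dims] of Object.entries(expected)) {
    const got = actual.find((d) => d.name === name);
    if (!got) {
      problems.push(`${name}: missing from output`);
    } else if (got.rows !== dims.rows || got.cols !== dims.cols) {
      problems.push(`${name}: expected ${dims.rows}x${dims.cols}, got ${got.rows}x${got.cols}`);
    }
  }
  return problems;
}

// Hypothetical sample: the converted sheet has one column fewer than recorded.
const converted = {
  sheets: [{ name: 'Budget', data: [['Item', 'Cost'], ['Staff', 100], ['Food', 200]] }],
};
const dimensionProblems = compareDimensions(sheetDimensions(converted), {
  Budget: { rows: 3, cols: 3 },
});
console.log(dimensionProblems);
```

Whether header rows count toward the row total depends on how the converter emits them, so record the expected dimensions using the same convention you see in a test conversion.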

Critical Elements to Verify

  • Headers and Labels: Table headers should be clearly identified
  • Financial Data: Currency formatting and calculations
  • Dates: Date formats should be preserved or standardized
  • Percentages: Percentage values should maintain meaning
  • Large Numbers: No truncation or scientific notation issues
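
For the data type and large-number checks, a simple heuristic scan can point out cells worth inspecting by hand. The sketch below is an assumption-laden illustration: flagSuspectCells is a hypothetical helper, the input is assumed to be an array of row arrays, and the two regular expressions only cover plain and comma-grouped decimals and E-notation.

```javascript
// Flag string cells that look like numbers, which may indicate a lost data type
// or a large value rendered in scientific notation.
// ASSUMPTION: rows is an array of arrays of cell values (hypothetical shape).
function flagSuspectCells(rows) {
  const numericText = /^-?\d[\d,]*(\.\d+)?$/;        // e.g. "1,250.50" stored as text
  const scientific = /^-?\d+(\.\d+)?[eE][+-]?\d+$/;  // e.g. "1.23E+12"
  const flagged = [];
  rows.forEach((row, r) => {
    row.forEach((cell, c) => {
      if (typeof cell !== 'string') return; // real numbers are fine as they are
      const text = cell.trim();
      if (scientific.test(text)) {
        flagged.push({ row: r, col: c, value: cell, issue: 'scientific notation' });
      } else if (numericText.test(text)) {
        flagged.push({ row: r, col: c, value: cell, issue: 'number stored as text' });
      }
    });
  });
  return flagged;
}

// Hypothetical sample rows, for illustration only.
const sampleRows = [
  ['Item', 'Amount'],
  ['Contract', '1.23E+12'],
  ['Fee', '1,250.50'],
  ['Tax', 20],
];
const suspects = flagSuspectCells(sampleRows);
console.log(suspects);
```

This is a heuristic, so expect false positives: a label such as "2025" in a header row is flagged too. Treat the result as a list of cells to compare against the original file, not as a list of confirmed errors.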

Troubleshooting

Common Issues

File Size Limits: 50MB maximum file size

  • Solution: Split large files or use sheet filtering

Character Limit: ~50,000 character output limit

  • Solution: Process sheets individually or filter data

Complex Formatting Lost: Advanced Excel formatting not preserved

  • Expected: Focus on data integrity rather than visual formatting

Formula Conversion: Formulas may convert to their calculated values

  • Check: Verify if you need formulas or just their results
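
The two size limits listed above can be checked before and after a conversion. This is a minimal sketch under stated assumptions: checkLimits and both constants are hypothetical names, "50MB" is interpreted here as 50 * 1024 * 1024 bytes, and the 95% warning margin is an arbitrary choice because the character limit is only approximate.

```javascript
// ASSUMPTION: "50MB" means 50 * 1024 * 1024 bytes; the guide does not say
// whether the tool counts in MB or MiB.
const MAX_INPUT_BYTES = 50 * 1024 * 1024;
// "~50,000 character output limit" is approximate, so warn slightly below it.
const MAX_OUTPUT_CHARS = 50000;

function checkLimits(inputBytes, outputChars) {
  const warnings = [];
  if (inputBytes > MAX_INPUT_BYTES) {
    warnings.push('input exceeds 50MB: split the file or use sheet filtering');
  }
  if (outputChars >= MAX_OUTPUT_CHARS * 0.95) {
    warnings.push('output is at or near ~50,000 characters: it may be truncated');
  }
  return warnings;
}

// With real files, the inputs would come from the file system, for example:
//   const fs = require('fs');
//   const inputBytes = fs.statSync('path/to/file.xlsx').size;
//   const outputChars = fs.readFileSync('docs/spreadsheet-md/test-output.md', 'utf8').length;
console.log(checkLimits(60 * 1024 * 1024, 1000));
console.log(checkLimits(1024, 49900));
```

An output length sitting right at the limit is a hint to compare the last sheet in the output against the original, since truncation would most likely show up at the end.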

Error Resolution

# If conversion fails, try:
# 1. Check file permissions
ls -la "path/to/file.xlsx"

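# 2. Verify file is not corrupted (.xlsx is a ZIP container)
file "path/to/file.xlsx"       # reports the detected file type
unzip -t "path/to/file.xlsx"   # tests the integrity of the archive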

# 3. Test with smaller subset
convert-to-markdown excel-to-json "file.xlsx" --sheet-prefix "Sheet1" -o "test.json"

Use Case Guidelines

When to Use Markdown Output

  • AI analysis and processing
  • Documentation integration
  • Human-readable data review
  • Version control friendly format

When to Use JSON Output

  • Programmatic data processing
  • API integrations
  • Structured data analysis
  • Database imports

Batch Processing Strategy

  1. Test First: Always test with a single file
  2. Consistent Naming: Use systematic output naming
  3. Error Handling: Plan for conversion failures
  4. Verification: Spot-check batch results
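
The strategy above can be sketched as a small Node.js loop that applies the single-file command to each workbook, derives output names systematically, and records failures instead of stopping. This is one possible approach, not the package's own batch feature: convertBatch and runCli are hypothetical names, and the default runner assumes the convert-to-markdown CLI is installed globally and accepts the excel-to-markdown form shown under Basic Conversion.

```javascript
const { execFileSync } = require('child_process');
const path = require('path');

// Default runner: calls the CLI with the arguments shown under "Basic Conversion".
// ASSUMPTION: convert-to-markdown is on PATH (global install).
function runCli(input, output) {
  execFileSync('convert-to-markdown', ['excel-to-markdown', input, '-o', output], {
    stdio: 'inherit',
  });
}

// Convert each file, naming outputs after the source workbook, and collect
// failures so one bad file does not abort the whole batch.
function convertBatch(files, outDir, run = runCli) {
  const report = { converted: [], failed: [] };
  for (const input of files) {
    const baseName = path.basename(input, path.extname(input));
    const output = path.join(outDir, `${baseName}.md`);
    try {
      run(input, output);
      report.converted.push({ input, output });
    } catch (err) {
      report.failed.push({ input, error: err.message });
    }
  }
  return report;
}

// Example call (requires the CLI and real files):
//   const report = convertBatch(['sheets/budget.xlsx'], 'docs/spreadsheet-md');
//   console.log(report.failed);
```

The runner is passed in as a parameter so the loop can be exercised with a stub before pointing it at real files. The converted list doubles as the sample to draw from when spot-checking batch results.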

Project Integration

Directory Structure

project/
├── sheets/                     # Original Excel files
├── docs/
│   ├── spreadsheet-md/        # Markdown conversions
│   └── spreadsheet-json/      # JSON conversions
└── prompts/
    └── excel-conversion-technique.md  # This guide

Version Control Considerations

  • Include: Converted markdown/JSON files for tracking changes
  • Exclude: Original Excel files if they contain sensitive data
  • Document: Conversion timestamps and source file versions
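
One way to document conversion timestamps and source file versions is a small manifest entry per workbook, committed alongside the converted files. The sketch below is an illustration only: manifestEntry and its field names are hypothetical, and a SHA-256 hash of the file contents is used here as a stand-in for "source file version".

```javascript
const crypto = require('crypto');

// Build a manifest record tying converted outputs to the exact source bytes.
// ASSUMPTION: a content hash is an acceptable way to identify the source
// version; the field names are illustrative, not a required format.
function manifestEntry(sourceName, sourceBytes, outputs, convertedAt = new Date()) {
  return {
    source: sourceName,
    sha256: crypto.createHash('sha256').update(sourceBytes).digest('hex'),
    convertedAt: convertedAt.toISOString(),
    outputs,
  };
}

// Hypothetical usage; with a real file, pass fs.readFileSync('sheets/budget.xlsx').
const entry = manifestEntry(
  'sheets/budget.xlsx',
  Buffer.from('placeholder workbook bytes'),
  ['docs/spreadsheet-md/budget.md', 'docs/spreadsheet-json/budget.json'],
  new Date('2025-07-28T00:00:00Z')
);
console.log(JSON.stringify(entry, null, 2));
```

Because only the hash is recorded, the manifest can be committed even when the original Excel file is excluded for containing sensitive data.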

Success Metrics

Conversion Quality

  • Data Completeness: 100% of critical data preserved
  • Structure Integrity: Table relationships maintained
  • Accuracy: No data corruption or loss
  • Usability: Output suitable for intended AI/analytical purpose

Process Efficiency

  • Repeatability: Process can be documented and repeated
  • Automation: Commands can be scripted for batch processing
  • Quality Assurance: Verification steps are clearly defined
  • Knowledge Transfer: Others can follow the process

Lessons Learned Template

After each conversion project, document:

  • File Types: What Excel structures worked best/worst
  • Data Patterns: Which data types converted most reliably
  • Output Preferences: When to choose MD vs JSON
  • Quality Issues: What required manual verification
  • Process Improvements: How to streamline future conversions

This technique was developed for the WRU Catering Tender Analysis project (July 2025) and refined through practical application with complex financial models.