imgfetch CLI Tool - Implementation Plan

Generated: 2025-07-26 18:46 UTC
Status: Complete
Verified:

Overview

This implementation plan provides a step-by-step guide to building the imgfetch command-line tool based on the technical analysis. The plan focuses on practical development steps, MVP features, and iterative enhancement.

Phase 1: Core Foundation (Week 1)

1.1 Project Setup

# Project structure
imgfetch/
├── imgfetch/
│   ├── __init__.py
│   ├── cli.py              # Click CLI interface
│   ├── core/
│   │   ├── __init__.py
│   │   ├── downloader.py   # Main download orchestrator
│   │   ├── strategies/     # Strategy implementations
│   │   │   ├── __init__.py
│   │   │   ├── base.py     # Abstract base strategy
│   │   │   ├── direct.py   # Direct download
│   │   │   └── browser.py  # Browser automation
│   │   ├── cache.py        # Cache management
│   │   ├── config.py       # Configuration handling
│   │   └── utils.py        # Utility functions
│   └── exceptions.py       # Custom exceptions
├── tests/
├── setup.py
├── requirements.txt
├── README.md
└── .github/workflows/      # CI/CD

1.2 Basic Dependencies

# requirements.txt
click>=8.1.0
httpx>=0.24.0
beautifulsoup4>=4.12.0
Pillow>=10.0.0
diskcache>=5.6.0
pydantic>=2.0.0
rich>=13.0.0  # For better CLI output
python-dotenv>=1.0.0

1.3 MVP CLI Interface

# imgfetch/cli.py
import click
from rich.console import Console
from .core.downloader import ImageDownloader

console = Console()

@click.command()
@click.argument('url')
@click.option('--output', '-o', help='Output file path')
@click.option('--timeout', default=30, help='Download timeout in seconds')
@click.option('--quiet', '-q', is_flag=True, help='Suppress output')
def main(url, output, timeout, quiet):
    """Download images from any URL."""
    downloader = ImageDownloader(timeout=timeout, quiet=quiet)
    
    try:
        if quiet:
            result = downloader.download(url, output)
        else:
            # Show the spinner only when output is not suppressed
            with console.status("[bold green]Downloading image..."):
                result = downloader.download(url, output)

            console.print(f"[green]✓[/green] Downloaded to: {result['file_path']}")
            console.print(f"  Size: {result['file_size']:,} bytes")
            console.print(f"  Time: {result['download_time']:.2f}s")

    except Exception as e:
        console.print(f"[red]✗[/red] Download failed: {e}")
        # A non-zero exit status signals the failure to calling scripts
        raise SystemExit(1)

if __name__ == '__main__':
    main()

Phase 2: Strategy Implementation (Week 2)

2.1 Base Strategy Interface

# imgfetch/core/strategies/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional

class DownloadStrategy(ABC):
    """Abstract base class for download strategies."""
    
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.timeout = config.get('timeout', 30)
    
    @abstractmethod
    async def can_handle(self, url: str) -> bool:
        """Check if this strategy can handle the URL."""
        pass
    
    @abstractmethod
    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        """Download the image and return raw bytes."""
        pass
    
    @property
    @abstractmethod
    def name(self) -> str:
        """Strategy name for logging."""
        pass
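
A minimal sketch of how a concrete strategy could satisfy this interface. `EchoStrategy` and `IncompleteStrategy` are hypothetical stand-ins, not part of the plan. The sketch shows that a plain class attribute is enough to override the abstract `name` property, and that a subclass missing an abstract member cannot be instantiated.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class DownloadStrategy(ABC):
    """Same interface as above, repeated so the sketch runs on its own."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.timeout = config.get('timeout', 30)

    @abstractmethod
    async def can_handle(self, url: str) -> bool: ...

    @abstractmethod
    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes: ...

    @property
    @abstractmethod
    def name(self) -> str: ...


class EchoStrategy(DownloadStrategy):
    """Hypothetical strategy that returns the URL as bytes (no network)."""

    name = "echo"  # a class attribute overrides the abstract property

    async def can_handle(self, url: str) -> bool:
        return url.startswith("echo://")

    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        return url.encode()


class IncompleteStrategy(DownloadStrategy):
    """Lacks download() and name, so instantiating it raises TypeError."""

    async def can_handle(self, url: str) -> bool:
        return True


async def demo() -> bytes:
    strategy = EchoStrategy({'timeout': 5})
    assert await strategy.can_handle("echo://hello")
    return await strategy.download("echo://hello")
```

Because the abstract members are enforced at instantiation time, a half-finished strategy fails as soon as the orchestrator builds its strategy list rather than in the middle of a download.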

2.2 Direct Download Strategy

# imgfetch/core/strategies/direct.py
import httpx
from typing import Dict, Optional
from .base import DownloadStrategy

class DirectDownloadStrategy(DownloadStrategy):
    """Direct HTTP/HTTPS download strategy."""
    
    name = "direct"
    
    async def can_handle(self, url: str) -> bool:
        # Check if URL points directly to an image
        image_extensions = ('.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp')
        return any(url.lower().endswith(ext) for ext in image_extensions)
    
    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        headers = dict(headers or {})  # copy so the caller's dict is not mutated
        headers.setdefault('User-Agent', 'imgfetch/1.0')
        
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.get(url, headers=headers, follow_redirects=True)
            response.raise_for_status()
            
            # Verify content type
            content_type = response.headers.get('content-type', '')
            if not content_type.startswith('image/'):
                raise ValueError(f"Response is not an image: {content_type}")
            
            return response.content
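
The suffix test in `can_handle` looks at the whole URL, so an address such as `https://example.com/photo.jpg?size=large` is rejected while `https://example.com/page?file=x.png` is accepted. One possible refinement, sketched here with the standard library only, is to test the URL path alone. The helper name `looks_like_image_url` is an assumption for illustration.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp')


def looks_like_image_url(url: str) -> bool:
    """Return True if the URL path (not the query or fragment) ends in an image extension."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)  # str.endswith accepts a tuple of suffixes
```

The check is still only a hint: the Content-Type verification in `download` remains the authoritative test that the response is an image.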

2.3 Strategy Orchestrator

# imgfetch/core/downloader.py
import asyncio
from pathlib import Path
from typing import List, Optional, Dict, Any
from .strategies.direct import DirectDownloadStrategy
# from .strategies.browser import BrowserStrategy  # Add in Phase 3 (module does not exist yet)
from .cache import CacheManager

class ImageDownloader:
    """Main orchestrator for image downloads."""
    
    def __init__(self, timeout: int = 30, quiet: bool = False):
        self.timeout = timeout
        self.quiet = quiet
        self.cache = CacheManager()
        
        # Initialize strategies in priority order
        config = {'timeout': timeout}
        self.strategies = [
            DirectDownloadStrategy(config),
            # BrowserStrategy(config),  # Add in Phase 3
        ]
    
    def download(self, url: str, output: Optional[str] = None) -> Dict[str, Any]:
        """Synchronous wrapper for async download."""
        return asyncio.run(self._download_async(url, output))
    
    async def _download_async(self, url: str, output: Optional[str] = None) -> Dict[str, Any]:
        """Async download with strategy selection."""
        # Check cache first
        cached = await self.cache.get(url)
        if cached:
            return self._save_image(cached['data'], output or cached['filename'])
        
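        # Try each strategy in priority order
        last_error: Optional[Exception] = None
        for strategy in self.strategies:
            if not await strategy.can_handle(url):
                continue
            try:
                data = await strategy.download(url)
                result = self._save_image(data, output or self._generate_filename(url))

                # Cache successful download
                await self.cache.set(url, {
                    'data': data,
                    'filename': result['file_path'].name
                })

                return result
            except Exception as e:
                # Remember the failure and fall through to the next strategy
                last_error = e

        if last_error is not None:
            # At least one strategy accepted the URL but failed; keep the cause
            raise RuntimeError(f"All strategies failed for URL: {url}") from last_error
        raise ValueError(f"No strategy could handle URL: {url}")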
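
The orchestrator calls `CacheManager`, `_save_image`, and `_generate_filename` without defining them. The following is a minimal sketch of what they might look like, written as standalone pieces so it runs on its own. Everything here is an assumption: the in-memory cache stands in for whatever `cache.py` ends up using (the plan lists `diskcache`), the `.img` fallback suffix is arbitrary, and `started_at` is a hypothetical parameter for measuring elapsed time.

```python
import hashlib
import time
from pathlib import Path, PurePosixPath
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


class CacheManager:
    """In-memory stand-in with the async get/set shape the orchestrator calls."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    async def get(self, url: str) -> Optional[Dict[str, Any]]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._entries[url]  # expired: drop the entry and report a miss
            return None
        return value

    async def set(self, url: str, value: Dict[str, Any]) -> None:
        self._entries[url] = (time.monotonic() + self.ttl, value)


def generate_filename(url: str) -> str:
    """Use the last path segment, or a short URL hash when the path has none."""
    name = PurePosixPath(urlparse(url).path).name
    if name:
        return name
    return hashlib.sha256(url.encode()).hexdigest()[:16] + ".img"


def save_image(data: bytes, output: str, started_at: float) -> Dict[str, Any]:
    """Write the bytes to disk and return the fields the CLI prints."""
    path = Path(output)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return {
        'file_path': path,  # a Path, since the orchestrator reads .name from it
        'file_size': len(data),
        'download_time': time.perf_counter() - started_at,
    }
```

Returning `file_path` as a `Path` matches the orchestrator's use of `result['file_path'].name`, and the three keys match what the CLI in section 1.3 formats.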

Phase 3: Browser Automation (Week 3)

3.1 Additional Dependencies

# Add to requirements.txt
playwright>=1.40.0
playwright-stealth>=1.0.6

3.2 Browser Strategy Implementation

# imgfetch/core/strategies/browser.py
from typing import Dict, Optional
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async
from .base import DownloadStrategy

class BrowserStrategy(DownloadStrategy):
    """Browser automation strategy for complex sites."""
    
    name = "browser"
    
    async def can_handle(self, url: str) -> bool:
        # Use for non-direct image URLs (same extension list as DirectDownloadStrategy)
        return not url.lower().endswith(('.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'))
    
    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            try:
                context = await browser.new_context(
                    viewport={'width': 1920, 'height': 1080},
                    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                )
                page = await context.new_page()

                # Apply stealth techniques (stealth_async patches a page, not a context)
                await stealth_async(page)

                # Navigate to page; Playwright timeouts are in milliseconds
                await page.goto(url, wait_until='networkidle', timeout=self.timeout * 1000)
                await page.wait_for_timeout(2000)  # Wait for dynamic content

                # Find largest image on page
                image_url = await self._find_best_image(page)
                if not image_url:
                    raise ValueError("No suitable image found on page")

                # Download through the browser context so its cookies are reused
                response = await page.request.get(image_url)

                # body() is a coroutine; read it before the browser is closed
                return await response.body()
            finally:
                # Close the browser even when navigation or the download fails
                await browser.close()
    
    async def _find_best_image(self, page):
        """Find the best image on the page."""
        images = await page.evaluate('''
            () => {
                const imgs = Array.from(document.querySelectorAll('img'));
                return imgs
                    .map(img => ({
                        src: img.src,
                        width: img.naturalWidth,
                        height: img.naturalHeight,
                        area: img.naturalWidth * img.naturalHeight
                    }))
                    .filter(img => img.src && img.area > 10000)
                    .sort((a, b) => b.area - a.area);
            }
        ''')
        
        return images[0]['src'] if images else None
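
The selection heuristic in the in-page script (discard images with an area of 10,000 pixels or less, then take the largest) can be restated in Python so it is unit-testable without a browser. This is a sketch under that reading of the script; `pick_best_image` is a hypothetical helper, and the input is assumed to be a list of dictionaries shaped like the objects the script builds.

```python
from typing import Dict, List, Optional


def pick_best_image(images: List[Dict], min_area: int = 10000) -> Optional[str]:
    """Return the src of the largest image whose area exceeds min_area, else None."""
    candidates = [
        img for img in images
        if img.get('src') and img['width'] * img['height'] > min_area
    ]
    if not candidates:
        return None
    # max() keeps the first of several equally large images, like a stable sort would
    best = max(candidates, key=lambda img: img['width'] * img['height'])
    return best['src']


# Usage: icons and tracking pixels fall below the threshold, the hero image wins
sample = [
    {'src': 'https://example.com/icon.png', 'width': 32, 'height': 32},
    {'src': 'https://example.com/hero.jpg', 'width': 1200, 'height': 800},
    {'src': 'https://example.com/thumb.jpg', 'width': 150, 'height': 150},
]
best_src = pick_best_image(sample)
```

Area is a coarse signal: a large banner advertisement can outrank the intended photo, so the threshold and ranking are likely to need tuning per site.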

3.3 Installation Script

# scripts/install_playwright.py
import subprocess
import sys

def install_playwright():
    """Install Playwright browsers."""
    try:
        subprocess.run([sys.executable, "-m", "playwright", "install", "chromium"], check=True)
        print("✓ Playwright browsers installed successfully")
    except subprocess.CalledProcessError as e:
        print(f"✗ Failed to install Playwright browsers: {e}")
        sys.exit(1)

if __name__ == "__main__":
    install_playwright()

Phase 4: Advanced Features (Week 4)

4.1 Authentication Support

# imgfetch/core/auth.py
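try:
    import tomllib as tomli  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli  # backport for Python 3.8-3.10; must be added to requirements.txt
from pathlib import Path
from typing import Dict, Any, Optional

class AuthManager:
    """Manage authentication configurations."""
    
    def __init__(self, config_path: Optional[Path] = None):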
        self.config_path = config_path or Path.home() / '.imgfetch' / 'auth.toml'
        self.config = self._load_config()
    
    def _load_config(self) -> Dict[str, Any]:
        if not self.config_path.exists():
            return {}
        
        with open(self.config_path, 'rb') as f:
            return tomli.load(f)
    
    def get_auth_for_url(self, url: str) -> Dict[str, Any]:
        """Get authentication config for a specific URL."""
        from urllib.parse import urlparse
        domain = urlparse(url).netloc
        
        return self.config.get('sites', {}).get(domain, {})
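
To make the lookup concrete, here is a sketch of the structure `get_auth_for_url` expects after parsing, together with a variant of the lookup. The file contents and the `headers` key are assumptions for illustration, since the plan does not specify what a site entry holds. The variant uses `hostname` instead of `netloc`, which is a deliberate change: `netloc` includes any port or userinfo, so `protected.site:8443` would not match a `protected.site` entry.

```python
from typing import Any, Dict
from urllib.parse import urlparse

# Parsed form of a hypothetical ~/.imgfetch/auth.toml such as:
#
#   [sites."protected.site"]
#   headers = { Authorization = "Bearer <token>" }
#
# The keys under each site ("headers" here) are assumptions for illustration.
EXAMPLE_CONFIG: Dict[str, Any] = {
    'sites': {
        'protected.site': {'headers': {'Authorization': 'Bearer <token>'}},
    }
}


def auth_for_url(config: Dict[str, Any], url: str) -> Dict[str, Any]:
    """Look up per-site settings by hostname, ignoring any port or userinfo."""
    host = urlparse(url).hostname or ''
    return config.get('sites', {}).get(host, {})
```

An unknown host yields an empty dictionary, so callers can merge the result into request settings without a separate existence check.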

4.2 Rate Limiting

# imgfetch/core/ratelimit.py
import asyncio
import time
from collections import defaultdict
from typing import Dict

class RateLimiter:
    """Token bucket rate limiter."""
    
    def __init__(self):
        self.buckets: Dict[str, Dict] = defaultdict(lambda: {
            'tokens': 1.0,
            'last_update': time.time(),
            'rate': 1.0  # tokens per second
        })
    
    async def acquire(self, domain: str, tokens: float = 1.0):
        """Acquire tokens, waiting if necessary."""
        bucket = self.buckets[domain]
        now = time.time()
        
        # Refill tokens
        elapsed = now - bucket['last_update']
        bucket['tokens'] = min(
            bucket['rate'],  # Max tokens = rate
            bucket['tokens'] + elapsed * bucket['rate']
        )
        bucket['last_update'] = now
        
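        # Wait if not enough tokens
        if bucket['tokens'] < tokens:
            wait_time = (tokens - bucket['tokens']) / bucket['rate']
            await asyncio.sleep(wait_time)
            # The tokens earned while sleeping are spent on this request, so
            # advance the refill clock to avoid counting that time twice
            bucket['tokens'] = 0
            bucket['last_update'] = time.time()
        else:
            bucket['tokens'] -= tokens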
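
The refill and wait arithmetic can be checked in isolation, without sleeping. The two pure functions below are a sketch of the same token-bucket calculation; their names and the explicit `capacity` parameter are assumptions (the class above uses the rate itself as the maximum).

```python
from typing import Tuple


def refill(tokens: float, elapsed: float, rate: float, capacity: float) -> float:
    """Add elapsed * rate tokens, capped at the bucket capacity."""
    return min(capacity, tokens + elapsed * rate)


def take(tokens: float, needed: float, rate: float) -> Tuple[float, float]:
    """Return (tokens_left, seconds_to_wait) for a request costing `needed` tokens."""
    if tokens >= needed:
        return tokens - needed, 0.0
    return 0.0, (needed - tokens) / rate


# Worked example at 2 tokens per second with a capacity of 2:
after_refill = refill(tokens=0.5, elapsed=0.25, rate=2.0, capacity=2.0)  # 0.5 + 0.25 * 2 = 1.0
first = take(after_refill, needed=1.0, rate=2.0)   # enough tokens: (0.0, 0.0)
second = take(first[0], needed=1.0, rate=2.0)      # bucket empty: wait 1.0 / 2.0 = 0.5 s
idle = refill(tokens=0.0, elapsed=10.0, rate=2.0, capacity=2.0)  # capped at 2.0
```

Keeping the arithmetic separate from the clock and the sleep makes it straightforward to test burst and idle behaviour deterministically.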

4.3 JSON Output Support

# imgfetch/cli.py (updated)
import json

# Bind the flag to as_json so the parameter does not shadow the json module
@click.option('--json', 'as_json', is_flag=True, help='Output results as JSON')
def main(url, output, timeout, quiet, as_json):
    """Download images from any URL."""
    downloader = ImageDownloader(timeout=timeout, quiet=quiet or as_json)
    
    try:
        result = downloader.download(url, output)
        
        if as_json:
            # default=str converts non-JSON values such as Path to strings
            click.echo(json.dumps(result, indent=2, default=str))
        elif not quiet:
            # ... existing output code ...
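
One detail worth checking for the JSON path: the orchestrator reads `.name` from `result['file_path']`, which implies a `pathlib.Path`, and `json.dumps` rejects `Path` objects by default. A minimal sketch of a serializer that tolerates this, assuming the three result keys used by the CLI; `result_to_json` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def result_to_json(result: Dict[str, Any]) -> str:
    """Serialize a download result; non-JSON types such as Path become strings."""
    return json.dumps(result, indent=2, default=str)


# Usage with the result shape the CLI prints
example = {
    'file_path': Path('local.jpg'),
    'file_size': 2048,
    'download_time': 0.5,
}
payload = result_to_json(example)
```

With the path emitted as a plain string, a consumer such as `jq '.file_path'` receives a value it can use directly.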

Phase 5: Testing and Quality (Week 5)

5.1 Test Structure

# tests/test_strategies.py
# Requires pytest-asyncio (asyncio marker) and pytest-httpx (httpx_mock fixture)
import pytest
from imgfetch.core.strategies.direct import DirectDownloadStrategy

@pytest.mark.asyncio
async def test_direct_download_success(httpx_mock):
    # Mock successful image download
    httpx_mock.add_response(
        url="https://example.com/image.jpg",
        content=b"fake_image_data",
        headers={"content-type": "image/jpeg"}
    )
    
    strategy = DirectDownloadStrategy({'timeout': 30})
    assert await strategy.can_handle("https://example.com/image.jpg")
    
    data = await strategy.download("https://example.com/image.jpg")
    assert data == b"fake_image_data"

@pytest.mark.asyncio
async def test_direct_download_not_image(httpx_mock):
    # Mock HTML response instead of image
    httpx_mock.add_response(
        url="https://example.com/page.html",
        content=b"<html>...</html>",
        headers={"content-type": "text/html"}
    )
    
    strategy = DirectDownloadStrategy({'timeout': 30})
    
    with pytest.raises(ValueError, match="not an image"):
        await strategy.download("https://example.com/page.html")

5.2 Integration Tests

# tests/test_integration.py
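import pytest
from pathlib import Path
from click.testing import CliRunner
from imgfetch.cli import main

def test_cli_direct_download():
    # Needs network access. The URL must end in an image extension,
    # otherwise DirectDownloadStrategy.can_handle() rejects it.
    runner = CliRunner()
    with runner.isolated_filesystem():
        result = runner.invoke(main, [
            'https://via.placeholder.com/150.png',
            '--output', 'test.png'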
        ])
        
        assert result.exit_code == 0
        assert Path('test.png').exists()

5.3 CI/CD Pipeline

# .github/workflows/test.yml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11']
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install -r requirements-dev.txt
        pip install -e .
        playwright install chromium
    
    - name: Run tests
      run: |
        pytest --cov=imgfetch --cov-report=xml
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3

Phase 6: Distribution (Week 6)

6.1 Package Configuration

# setup.py
from setuptools import setup, find_packages

setup(
    name="imgfetch",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "click>=8.1.0",
        "httpx>=0.24.0",
        "beautifulsoup4>=4.12.0",
        "Pillow>=10.0.0",
        "playwright>=1.40.0",
        # ... other dependencies
    ],
    entry_points={
        "console_scripts": [
            "imgfetch=imgfetch.cli:main",
        ],
    },
    python_requires=">=3.8",
    author="Your Name",
    author_email="your.email@example.com",
    description="Reliable image retrieval CLI for Claude Code",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    url="https://github.com/yourusername/imgfetch",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
)

6.2 Docker Support

# Dockerfile
FROM python:3.9-slim

# Keep Playwright's browsers in a shared location so the non-root user can run them
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright

# Install Python package
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .

# Install Chromium and the system libraries it needs (runs apt-get, so do this as root)
RUN playwright install --with-deps chromium \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -m -u 1000 imgfetch
USER imgfetch

ENTRYPOINT ["imgfetch"]

6.3 Documentation

# README.md

# imgfetch

A reliable command-line tool for retrieving images from any source, designed for use with Claude Code and other automation tools.

## Features

- 🚀 Multiple download strategies (direct, browser automation, API)
- 🔄 Automatic fallback mechanisms
- 🔐 Authentication support
- ⚡ Intelligent caching
- 🛡️ Rate limiting and ethical safeguards
- 📊 JSON output for programmatic use

## Installation

pip install imgfetch

For browser automation support:

playwright install chromium

## Quick Start

# Simple download
imgfetch https://example.com/image.jpg -o local.jpg

# From a webpage
imgfetch https://example.com/gallery/photo1 --strategy browser

# With authentication
imgfetch https://protected.site/image.png --auth-config ~/.imgfetch/auth.toml

# JSON output for scripts
imgfetch https://example.com/img.jpg --json | jq '.file_path'

## Configuration

Create ~/.imgfetch/config.toml:

[general]
cache_ttl = 3600
timeout = 30

[rate_limits]
default = 1.0
"api.example.com" = 10.0

## License

MIT


## Implementation Timeline Summary

| Week | Phase | Key Deliverables |
|------|-------|------------------|
| 1 | Core Foundation | Project setup, dependencies, basic CLI |
| 2 | Strategy Implementation | Strategy interface, direct download strategy, orchestrator |
| 3 | Browser Automation | Playwright integration, stealth mode |
| 4 | Advanced Features | Auth, rate limiting, JSON output |
| 5 | Testing & Quality | Unit tests, integration tests, CI/CD |
| 6 | Distribution | PyPI package, Docker image, documentation |

## Success Metrics

- ✅ 90%+ success rate on common image sources
- ✅ <5s average download time for direct URLs
- ✅ Handles JavaScript-rendered content
- ✅ Respects rate limits and robots.txt
- ✅ Clear error messages for Claude Code
- ✅ Comprehensive test coverage (>80%)
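
One way the robots.txt target could be checked before a download, as a minimal sketch using the standard library's `urllib.robotparser`. The helper name `is_allowed` and the `imgfetch` user-agent token are assumptions, and fetching the robots.txt file itself is left out so the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "imgfetch") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Usage with an inline robots.txt that blocks one directory for all agents
ROBOTS = "User-agent: *\nDisallow: /private/\n"
blocked = is_allowed(ROBOTS, "https://example.com/private/a.png")
permitted = is_allowed(ROBOTS, "https://example.com/public/a.png")
```

In the tool, a check like this would most naturally sit in the orchestrator before any strategy runs, with the parsed rules cached per domain alongside the rate-limit buckets.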

This implementation plan provides a practical roadmap for building the imgfetch tool, with clear phases, code examples, and deliverables for each week of development.