# imgfetch CLI Tool - Implementation Plan
Generated: 2025-07-26 18:46 UTC
Status: Complete
## Overview
This implementation plan provides a step-by-step guide to building the imgfetch command-line tool based on the technical analysis. The plan focuses on practical development steps, MVP features, and iterative enhancement.
## Phase 1: Core Foundation (Week 1)
### 1.1 Project Setup
# Project structure
imgfetch/
├── imgfetch/
│   ├── __init__.py
│   ├── cli.py                # Click CLI interface
│   ├── core/
│   │   ├── __init__.py
│   │   ├── downloader.py     # Main download orchestrator
│   │   ├── strategies/       # Strategy implementations
│   │   │   ├── __init__.py
│   │   │   ├── base.py       # Abstract base strategy
│   │   │   ├── direct.py     # Direct download
│   │   │   └── browser.py    # Browser automation
│   │   ├── cache.py          # Cache management
│   │   ├── config.py         # Configuration handling
│   │   └── utils.py          # Utility functions
│   └── exceptions.py         # Custom exceptions
├── tests/
├── setup.py
├── requirements.txt
├── README.md
└── .github/workflows/        # CI/CD
### 1.2 Basic Dependencies
# requirements.txt
click>=8.1.0
httpx>=0.24.0
beautifulsoup4>=4.12.0
Pillow>=10.0.0
diskcache>=5.6.0
pydantic>=2.0.0
rich>=13.0.0 # For better CLI output
python-dotenv>=1.0.0
### 1.3 MVP CLI Interface
# imgfetch/cli.py
import click
from rich.console import Console

from .core.downloader import ImageDownloader

console = Console()


@click.command()
@click.argument('url')
@click.option('--output', '-o', help='Output file path')
@click.option('--timeout', default=30, help='Download timeout in seconds')
@click.option('--quiet', '-q', is_flag=True, help='Suppress output')
def main(url, output, timeout, quiet):
    """Download images from any URL."""
    downloader = ImageDownloader(timeout=timeout, quiet=quiet)
    try:
        with console.status("[bold green]Downloading image..."):
            result = downloader.download(url, output)
        if not quiet:
            console.print(f"[green]✓[/green] Downloaded to: {result['file_path']}")
            console.print(f"  Size: {result['file_size']:,} bytes")
            console.print(f"  Time: {result['download_time']:.2f}s")
    except Exception as e:
        console.print(f"[red]✗[/red] Download failed: {e}")
        # click.exceptions.Exit ends the command with a non-zero exit status
        raise click.exceptions.Exit(1)


if __name__ == '__main__':
    main()
## Phase 2: Strategy Implementation (Week 2)
### 2.1 Base Strategy Interface
# imgfetch/core/strategies/base.py
from abc import ABC, abstractmethod
from typing import Dict, Any, Optional


class DownloadStrategy(ABC):
    """Abstract base class for download strategies."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.timeout = config.get('timeout', 30)

    @abstractmethod
    async def can_handle(self, url: str) -> bool:
        """Check if this strategy can handle the URL."""
        pass

    @abstractmethod
    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        """Download the image and return raw bytes."""
        pass

    @property
    @abstractmethod
    def name(self) -> str:
        """Strategy name for logging."""
        pass
### 2.2 Direct Download Strategy
# imgfetch/core/strategies/direct.py
from typing import Dict, Optional
from urllib.parse import urlparse

import httpx

from .base import DownloadStrategy


class DirectDownloadStrategy(DownloadStrategy):
    """Direct HTTP/HTTPS download strategy."""

    name = "direct"

    async def can_handle(self, url: str) -> bool:
        # Check whether the URL path (ignoring any query string) ends in an image extension
        image_extensions = ('.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp')
        return urlparse(url).path.lower().endswith(image_extensions)

    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        # Copy the headers so the caller's dict is not mutated
        headers = dict(headers or {})
        headers.setdefault('User-Agent', 'imgfetch/1.0')
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.get(url, headers=headers, follow_redirects=True)
            response.raise_for_status()
            # Verify content type
            content_type = response.headers.get('content-type', '')
            if not content_type.startswith('image/'):
                raise ValueError(f"Response is not an image: {content_type}")
            return response.content
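An extension check is only reliable when it looks at the URL path rather than the whole string, because query strings and fragments come after the extension. The following is a minimal sketch of such a check, assuming a hypothetical helper name `looks_like_image_url` (not part of the plan) and the same extension list as the strategy above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Same extensions as the direct strategy; the helper name is illustrative only.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}


def looks_like_image_url(url: str) -> bool:
    """Return True if the URL path ends in a known image extension."""
    path = urlparse(url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() in IMAGE_EXTENSIONS


print(looks_like_image_url("https://example.com/photo.JPG?size=large"))  # True
print(looks_like_image_url("https://example.com/gallery/photo1"))        # False
```

A URL without an extension can still serve an image, so a check like this is best treated as a routing hint; the content-type verification in `download` remains the real test.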
### 2.3 Strategy Orchestrator
# imgfetch/core/downloader.py
import asyncio
from typing import Any, Dict, Optional

from .cache import CacheManager
from .strategies import DirectDownloadStrategy  # BrowserStrategy is imported in Phase 3


class ImageDownloader:
    """Main orchestrator for image downloads."""

    def __init__(self, timeout: int = 30, quiet: bool = False):
        self.timeout = timeout
        self.quiet = quiet
        self.cache = CacheManager()
        # Initialize strategies in priority order
        config = {'timeout': timeout}
        self.strategies = [
            DirectDownloadStrategy(config),
            # BrowserStrategy(config),  # Add in Phase 3
        ]

    def download(self, url: str, output: Optional[str] = None) -> Dict[str, Any]:
        """Synchronous wrapper for async download."""
        return asyncio.run(self._download_async(url, output))

    # Note: _save_image and _generate_filename are called below but not defined in this plan
    async def _download_async(self, url: str, output: Optional[str] = None) -> Dict[str, Any]:
        """Async download with strategy selection."""
        # Check cache first
        cached = await self.cache.get(url)
        if cached:
            return self._save_image(cached['data'], output or cached['filename'])
        # Try each strategy, remembering the last failure for the error message
        last_error: Optional[Exception] = None
        for strategy in self.strategies:
            if await strategy.can_handle(url):
                try:
                    data = await strategy.download(url)
                    result = self._save_image(data, output or self._generate_filename(url))
                    # Cache successful download
                    await self.cache.set(url, {
                        'data': data,
                        'filename': result['file_path'].name
                    })
                    return result
                except Exception as e:
                    # Record the error and try the next strategy
                    last_error = e
                    continue
        if last_error is not None:
            raise ValueError(f"All strategies failed for URL: {url}") from last_error
        raise ValueError(f"No strategy could handle URL: {url}")
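The orchestrator calls `CacheManager.get` and `set`, `_save_image`, and `_generate_filename`, none of which this plan defines. The following is a rough sketch of what they might look like, not the plan's implementation. `InMemoryCache`, `generate_filename`, and `save_image` are hypothetical names, the in-memory store merely stands in for the `diskcache` dependency listed in 1.2, and the returned keys mirror what the CLI in 1.3 prints.

```python
import hashlib
import time
from pathlib import Path, PurePosixPath
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


class InMemoryCache:
    """Hypothetical stand-in for CacheManager: async get/set with a TTL."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    async def get(self, key: str) -> Optional[Dict[str, Any]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._entries[key]  # entry has expired
            return None
        return value

    async def set(self, key: str, value: Dict[str, Any]) -> None:
        self._entries[key] = (time.monotonic(), value)


def generate_filename(url: str) -> str:
    """Use the last path segment if it has an extension, else a hash-based name."""
    name = PurePosixPath(urlparse(url).path).name
    if name and PurePosixPath(name).suffix:
        return name
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"image_{digest}.bin"


def save_image(data: bytes, file_path: str, started: Optional[float] = None) -> Dict[str, Any]:
    """Write the bytes to disk and return the keys the CLI reports."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    elapsed = time.monotonic() - started if started is not None else 0.0
    return {"file_path": path, "file_size": len(data), "download_time": elapsed}
```

Returning `file_path` as a `Path` matches the orchestrator's use of `result['file_path'].name`, at the cost of needing a string conversion before JSON output.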
## Phase 3: Browser Automation (Week 3)
### 3.1 Additional Dependencies
# Add to requirements.txt
playwright>=1.40.0
playwright-stealth>=1.0.6
### 3.2 Browser Strategy Implementation
# imgfetch/core/strategies/browser.py
from typing import Dict, Optional

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

from .base import DownloadStrategy


class BrowserStrategy(DownloadStrategy):
    """Browser automation strategy for complex sites."""

    name = "browser"

    async def can_handle(self, url: str) -> bool:
        # Use for non-direct image URLs
        return not url.lower().endswith(('.jpg', '.jpeg', '.png', '.gif'))

    async def download(self, url: str, headers: Optional[Dict] = None) -> bytes:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            try:
                context = await browser.new_context(
                    viewport={'width': 1920, 'height': 1080},
                    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                )
                page = await context.new_page()
                # Apply stealth techniques (playwright-stealth 1.x documents stealth_async(page))
                await stealth_async(page)
                # Navigate to page
                await page.goto(url, wait_until='networkidle')
                await page.wait_for_timeout(2000)  # Wait for dynamic content
                # Find largest image on page
                image_url = await self._find_best_image(page)
                if not image_url:
                    raise ValueError("No suitable image found on page")
                # Download through browser context; read the body before the browser closes
                response = await page.request.get(image_url)
                return await response.body()
            finally:
                # Close the browser on both success and failure
                await browser.close()

    async def _find_best_image(self, page):
        """Find the best image on the page."""
        images = await page.evaluate('''
            () => {
                const imgs = Array.from(document.querySelectorAll('img'));
                return imgs
                    .map(img => ({
                        src: img.src,
                        width: img.naturalWidth,
                        height: img.naturalHeight,
                        area: img.naturalWidth * img.naturalHeight
                    }))
                    .filter(img => img.src && img.area > 10000)
                    .sort((a, b) => b.area - a.area);
            }
        ''')
        return images[0]['src'] if images else None
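The in-page JavaScript keeps only images whose area exceeds 10,000 pixels and picks the largest one. The same rule can be mirrored in plain Python so it can be unit tested without a browser. In this sketch `pick_best_image` is a hypothetical helper, and the candidate dictionaries assume only the `src`, `width`, and `height` keys returned by the script.

```python
from typing import Dict, List, Optional


def pick_best_image(candidates: List[Dict], min_area: int = 10000) -> Optional[str]:
    """Return the src of the largest image whose area exceeds min_area, or None."""
    usable = [
        c for c in candidates
        if c.get("src") and c["width"] * c["height"] > min_area
    ]
    if not usable:
        return None
    return max(usable, key=lambda c: c["width"] * c["height"])["src"]


candidates = [
    {"src": "icon.png", "width": 50, "height": 50},     # too small
    {"src": "hero.jpg", "width": 800, "height": 600},   # largest usable image
    {"src": "", "width": 2000, "height": 2000},         # no src, ignored
    {"src": "thumb.jpg", "width": 400, "height": 300},
]
print(pick_best_image(candidates))  # hero.jpg
```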
### 3.3 Installation Script
# scripts/install_playwright.py
import subprocess
import sys


def install_playwright():
    """Install Playwright browsers."""
    try:
        subprocess.run([sys.executable, "-m", "playwright", "install", "chromium"], check=True)
        print("✓ Playwright browsers installed successfully")
    except subprocess.CalledProcessError as e:
        print(f"✗ Failed to install Playwright browsers: {e}")
        sys.exit(1)


if __name__ == "__main__":
    install_playwright()
## Phase 4: Advanced Features (Week 4)
### 4.1 Authentication Support
# imgfetch/core/auth.py
# Note: tomli is not listed in requirements.txt; on Python 3.11+ the
# standard-library tomllib module provides the same load() call
from pathlib import Path
from typing import Any, Dict, Optional
from urllib.parse import urlparse

import tomli


class AuthManager:
    """Manage authentication configurations."""

    def __init__(self, config_path: Optional[Path] = None):
        self.config_path = config_path or Path.home() / '.imgfetch' / 'auth.toml'
        self.config = self._load_config()

    def _load_config(self) -> Dict[str, Any]:
        if not self.config_path.exists():
            return {}
        # TOML parsers expect a binary file handle
        with open(self.config_path, 'rb') as f:
            return tomli.load(f)

    def get_auth_for_url(self, url: str) -> Dict[str, Any]:
        """Get authentication config for a specific URL."""
        domain = urlparse(url).netloc
        return self.config.get('sites', {}).get(domain, {})
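`urlparse(url).netloc` keeps the port, any credentials, and the original letter case, so a config key such as `protected.site` would not match `https://Protected.site:8443/...`. One possible refinement is to look up by `hostname` instead. The sketch below assumes a hypothetical `auth_for_url` function and a made-up `token` field; the plan does not specify the schema of `auth.toml`.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def auth_for_url(config: Dict[str, Any], url: str) -> Dict[str, Any]:
    """Look up per-site auth by hostname (lowercased, without port or credentials)."""
    host = urlparse(url).hostname or ""
    return config.get("sites", {}).get(host, {})


# Illustrative config only; the "token" key is an assumption.
config = {"sites": {"protected.site": {"token": "example-token"}}}

url = "https://Protected.site:8443/image.png"
print(urlparse(url).netloc)       # Protected.site:8443
print(auth_for_url(config, url))  # {'token': 'example-token'}
```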
### 4.2 Rate Limiting
# imgfetch/core/ratelimit.py
import asyncio
import time
from collections import defaultdict
from typing import Dict


class RateLimiter:
    """Token bucket rate limiter."""

    def __init__(self):
        # time.monotonic() is unaffected by system clock adjustments
        self.buckets: Dict[str, Dict] = defaultdict(lambda: {
            'tokens': 1.0,
            'last_update': time.monotonic(),
            'rate': 1.0  # tokens per second
        })

    async def acquire(self, domain: str, tokens: float = 1.0):
        """Acquire tokens, waiting if necessary."""
        bucket = self.buckets[domain]
        now = time.monotonic()
        # Refill tokens
        elapsed = now - bucket['last_update']
        bucket['tokens'] = min(
            bucket['rate'],  # Max tokens = rate
            bucket['tokens'] + elapsed * bucket['rate']
        )
        bucket['last_update'] = now
        # Wait if not enough tokens
        if bucket['tokens'] < tokens:
            wait_time = (tokens - bucket['tokens']) / bucket['rate']
            await asyncio.sleep(wait_time)
            bucket['tokens'] = 0
            # The wait already paid for these tokens, so restart the refill clock
            bucket['last_update'] = time.monotonic()
        else:
            bucket['tokens'] -= tokens
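A token bucket can also hand out reservations: each caller subtracts its tokens immediately and then sleeps off any resulting debt, which keeps concurrent callers from all passing the balance check at once. The sketch below shows one way to do that under stated assumptions. `TokenBucket`, `DomainRateLimiter`, the injectable `clock`, and the burst capacity of `max(rate, 1.0)` are illustrative choices rather than part of the plan, and the rate table uses a `default` key plus per-domain overrides.

```python
import asyncio
import time
from typing import Callable, Dict


class TokenBucket:
    """Reservation-style token bucket; the balance may go negative."""

    def __init__(self, rate: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate                 # tokens added per second
        self.capacity = max(rate, 1.0)   # assumed burst size
        self.tokens = self.capacity
        self.clock = clock
        self.updated = clock()

    def reserve(self, tokens: float = 1.0) -> float:
        """Take tokens now and return how many seconds the caller should wait."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        self.tokens -= tokens            # a negative balance is the debt to wait off
        return max(0.0, -self.tokens / self.rate)


class DomainRateLimiter:
    """One bucket per domain, with rates taken from a config-style mapping."""

    def __init__(self, rates: Dict[str, float], clock: Callable[[], float] = time.monotonic):
        self.rates = rates
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    async def acquire(self, domain: str) -> float:
        bucket = self.buckets.get(domain)
        if bucket is None:
            rate = self.rates.get(domain, self.rates.get("default", 1.0))
            bucket = self.buckets[domain] = TokenBucket(rate, self.clock)
        wait = bucket.reserve()          # synchronous, so no interleaving here
        if wait > 0:
            await asyncio.sleep(wait)
        return wait


async def demo() -> None:
    limiter = DomainRateLimiter({"default": 1.0, "api.example.com": 10.0})
    for _ in range(3):
        waited = await limiter.acquire("api.example.com")
        print(f"waited {waited:.2f}s")


asyncio.run(demo())
```

Because `reserve` contains no `await`, two coroutines cannot read the same balance before either one updates it, and passing a fake `clock` makes the arithmetic testable without sleeping.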
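### 4.3 JSON Output Support
# imgfetch/cli.py (updated)
import json as json_lib


# Bind the flag to as_json so the parameter does not shadow the json module
@click.option('--json', 'as_json', is_flag=True, help='Output results as JSON')
def main(url, output, timeout, quiet, as_json):
    """Download images from any URL."""
    downloader = ImageDownloader(timeout=timeout, quiet=quiet or as_json)
    try:
        result = downloader.download(url, output)
        if as_json:
            # default=str converts values json cannot encode, such as a Path in file_path
            click.echo(json_lib.dumps(result, indent=2, default=str))
        elif not quiet:
            # ... existing output code ...
            pass
    except Exception as e:
        console.print(f"[red]✗[/red] Download failed: {e}")
        raise click.exceptions.Exit(1)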
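`json.dumps` raises `TypeError` on `pathlib.Path` values, and the orchestrator's result appears to carry `file_path` as a `Path`, since it reads `.name` from it. The following is a small sketch of one workaround, assuming a hypothetical `to_json` helper and illustrative result values.

```python
import json
from pathlib import Path

# Illustrative result; the keys match what the CLI prints in 1.3.
result = {
    "file_path": Path("downloads") / "image.jpg",
    "file_size": 20480,
    "download_time": 0.42,
}


def to_json(result: dict) -> str:
    """Serialize a download result, converting unsupported values with str()."""
    return json.dumps(result, indent=2, default=str)


try:
    json.dumps(result)
except TypeError as exc:
    print(f"plain json.dumps fails: {exc}")

print(to_json(result))
```

Converting the path inside the orchestrator, before the result is returned, would be an alternative that keeps the CLI layer free of serialization details.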
## Phase 5: Testing and Quality (Week 5)
### 5.1 Test Structure
# tests/test_strategies.py
# Requires the pytest-asyncio plugin (asyncio marker) and pytest-httpx (httpx_mock fixture)
import pytest

from imgfetch.core.strategies import DirectDownloadStrategy


@pytest.mark.asyncio
async def test_direct_download_success(httpx_mock):
    # Mock successful image download
    httpx_mock.add_response(
        url="https://example.com/image.jpg",
        content=b"fake_image_data",
        headers={"content-type": "image/jpeg"}
    )
    strategy = DirectDownloadStrategy({'timeout': 30})
    assert await strategy.can_handle("https://example.com/image.jpg")
    data = await strategy.download("https://example.com/image.jpg")
    assert data == b"fake_image_data"


@pytest.mark.asyncio
async def test_direct_download_not_image(httpx_mock):
    # Mock HTML response instead of image
    httpx_mock.add_response(
        url="https://example.com/page.html",
        content=b"<html>...</html>",
        headers={"content-type": "text/html"}
    )
    strategy = DirectDownloadStrategy({'timeout': 30})
    with pytest.raises(ValueError, match="not an image"):
        await strategy.download("https://example.com/page.html")
### 5.2 Integration Tests
# tests/test_integration.py
# Note: this test fetches a live URL, so it needs network access
from pathlib import Path

from click.testing import CliRunner

from imgfetch.cli import main


def test_cli_direct_download():
    runner = CliRunner()
    with runner.isolated_filesystem():
        result = runner.invoke(main, [
            'https://via.placeholder.com/150',
            '--output', 'test.png'
        ])
        assert result.exit_code == 0
        assert Path('test.png').exists()
### 5.3 CI/CD Pipeline
# .github/workflows/test.yml
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quoted so YAML does not read 3.10 as the number 3.1
        python-version: ['3.8', '3.9', '3.10', '3.11']
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-dev.txt
          playwright install chromium
      - name: Run tests
        run: |
          pytest --cov=imgfetch --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
## Phase 6: Distribution (Week 6)
### 6.1 Package Configuration
# setup.py
from pathlib import Path

from setuptools import setup, find_packages

setup(
    name="imgfetch",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "click>=8.1.0",
        "httpx>=0.24.0",
        "beautifulsoup4>=4.12.0",
        "Pillow>=10.0.0",
        "playwright>=1.40.0",
        # ... other dependencies
    ],
    entry_points={
        "console_scripts": [
            "imgfetch=imgfetch.cli:main",
        ],
    },
    python_requires=">=3.8",
    author="Your Name",
    author_email="your.email@example.com",
    description="Reliable image retrieval CLI for Claude Code",
    # Read with an explicit encoding and without leaving the file handle open
    long_description=Path("README.md").read_text(encoding="utf-8"),
    long_description_content_type="text/markdown",
    url="https://github.com/yourusername/imgfetch",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
)
### 6.2 Docker Support
# Dockerfile
FROM python:3.9-slim
# Keep Playwright browsers in a shared path so the non-root user can launch them
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
# Install Python package
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -e .
# Install Chromium together with the system libraries it needs
# (the strategy launches Playwright's bundled Chromium, so a separate Chrome install is not required)
RUN playwright install --with-deps chromium \
    && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN useradd -m -u 1000 imgfetch
USER imgfetch
ENTRYPOINT ["imgfetch"]
### 6.3 Documentation
# README.md
# imgfetch
A reliable command-line tool for retrieving images from any source, designed for use with Claude Code and other automation tools.
## Features
- Multiple download strategies (direct, browser automation, API)
- Automatic fallback mechanisms
- Authentication support
- Intelligent caching
- Rate limiting and ethical safeguards
- JSON output for programmatic use
## Installation
```bash
pip install imgfetch
```
For browser automation support:
```bash
playwright install chromium
```
## Quick Start
```bash
# Simple download
imgfetch https://example.com/image.jpg -o local.jpg
# From a webpage
imgfetch https://example.com/gallery/photo1 --strategy browser
# With authentication
imgfetch https://protected.site/image.png --auth-config ~/.imgfetch/auth.toml
# JSON output for scripts
imgfetch https://example.com/img.jpg --json | jq '.file_path'
```
## Configuration
Create `~/.imgfetch/config.toml`:
```toml
[general]
cache_ttl = 3600
timeout = 30

[rate_limits]
default = 1.0
"api.example.com" = 10.0
```
## License
MIT
## Implementation Timeline Summary
| Week | Phase | Key Deliverables |
|------|-------|------------------|
| 1 | Core Foundation | Basic CLI, direct download strategy |
| 2 | Strategy Implementation | Strategy pattern, orchestrator |
| 3 | Browser Automation | Playwright integration, stealth mode |
| 4 | Advanced Features | Auth, rate limiting, JSON output |
| 5 | Testing & Quality | Unit tests, integration tests, CI/CD |
| 6 | Distribution | PyPI package, Docker image, documentation |
## Success Metrics
- 90%+ success rate on common image sources
- <5s average download time for direct URLs
- Handles JavaScript-rendered content
- Respects rate limits and robots.txt
- Clear error messages for Claude Code
- Comprehensive test coverage (>80%)
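
The robots.txt criterion can be checked with the standard library's `urllib.robotparser`. The sketch below parses an in-memory robots.txt so that it runs offline; `allowed_by_robots` is a hypothetical helper, and the `imgfetch` user agent string is an assumption rather than something the plan fixes.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything is allowed except paths under /private/.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(ROBOTS, "imgfetch", "https://example.com/images/a.jpg"))   # True
print(allowed_by_robots(ROBOTS, "imgfetch", "https://example.com/private/a.jpg"))  # False
```

In the tool itself the robots.txt text would first have to be fetched from the target site, ideally through the same cache and rate limiter as the image requests.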
This implementation plan provides a practical roadmap for building the imgfetch tool, with clear phases, code examples, and deliverables for each week of development.