A Python tool for downloading and transcribing videos from multiple social media platforms. Uses yt-dlp for downloads and parakeet-mlx for transcription on Apple Silicon.
Features
- Multi-Platform Support: YouTube, TikTok, Facebook, Instagram, Reddit, Twitch, Vimeo, X/Twitter
- Entire Playlist/Channel Support: Transcribe entire YouTube playlists or channels with automatic folder organization
- Fast Transcription: Speeds up audio 2x-3x (with pitch preservation) before transcribing to cut processing time
- Direct YouTube Transcripts: Extracts existing subtitles when available (skips re-transcription)
- LLM Enhancement: Optional AI-powered transcript formatting via OpenRouter
- CLI Interface: Simple transcriber command for all operations
Quick Start
# Install
pip install -e .
# Transcribe a single video
transcriber run "https://www.youtube.com/watch?v=VIDEO_ID"
# Transcribe an entire playlist (all videos, auto-organized)
transcriber run "https://www.youtube.com/playlist?list=PLAYLIST_ID"
# Transcribe a channel (all videos)
transcriber run "https://www.youtube.com/@channelname"
# With LLM enhancement
transcriber run "https://www.youtube.com/watch?v=VIDEO_ID" --enhance
# Process multiple URLs from a file
transcriber run --file urls.txt
# Combine transcript files from channels/playlists
transcriber transcripts -d output
# Combine .mdx files for LLM processing
transcriber combine /path/to/folder
Installation
Prerequisites
- Python 3.8+
- FFmpeg (for audio processing)
- uv (recommended) or pip
Setup
# Clone and install
git clone https://github.com/thesethrose/social-media-transcriber.git
cd social-media-transcriber
# Using UV (recommended)
uv venv
source .venv/bin/activate
uv pip install -e .
# Or using pip
pip install -e .
Environment Configuration
Copy .env.example to .env and configure:
cp .env.example .env
Key settings:
- OPENROUTER_API_KEY: For LLM transcript enhancement (optional)
- TRANSCRIPT_OUTPUT_DIR: Default output directory
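As an illustration of how these two settings might be consumed, here is a minimal sketch, not the project's actual settings.py. The `Settings` dataclass, the `load_settings` helper, and the "output" fallback directory are assumptions made for this example; in practice python-dotenv's `load_dotenv()` would typically copy the values from .env into the environment first.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented settings."""
    openrouter_api_key: Optional[str]  # None means no key was configured
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    # Treat an empty value the same as an unset one.
    api_key = env.get("OPENROUTER_API_KEY") or None
    # "output" as the fallback directory is an assumption for this sketch.
    output_dir = Path(env.get("TRANSCRIPT_OUTPUT_DIR") or "output")
    return Settings(openrouter_api_key=api_key, output_dir=output_dir)
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.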
Documentation
See the docs directory for detailed guides:
- Adding New Providers - Extending platform support
- Audio Speed Optimization - Performance tuning
- Playlist Folder Naming - Output organization
- Testing Guide - Running tests
Usage
Basic Transcription
transcriber run "https://www.youtube.com/watch?v=VIDEO_ID"
Options
| Option | Description |
|---|---|
| -f, --file FILE | File containing URLs (one per line) |
| -o, --output-dir DIR | Output directory for transcripts |
| --speed SPEED | Audio speed multiplier (1.0=normal, default: 2.0) |
| --enhance | Use LLM to format/enhance transcript |
| -w, --max-workers N | Concurrent workers (default: 4) |
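To make the --speed multiplier concrete, the sketch below shows one common way to change audio tempo without shifting pitch: FFmpeg's atempo filter. This is an assumption about the general technique, not a description of how this tool applies the multiplier internally. The helper names `atempo_chain` and `ffmpeg_speed_command` are hypothetical. Multipliers above 2.0 are split into chained steps, since a single atempo step is best kept within the 0.5 to 2.0 range.

```python
from typing import List


def atempo_chain(speed: float) -> str:
    """Return an FFmpeg audio filter string for a tempo change of `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    factors: List[float] = []
    remaining = speed
    # Split the multiplier into steps that each stay within [0.5, 2.0].
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)


def ffmpeg_speed_command(src: str, dst: str, speed: float = 2.0) -> List[str]:
    """Build (but do not run) an FFmpeg command that re-encodes audio at `speed`."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-filter:a", atempo_chain(speed), dst]
```

For example, `atempo_chain(3.0)` yields a two-step chain (2.0 followed by 1.5), and the resulting argument list can be handed to `subprocess.run` once FFmpeg is installed.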
Combining Transcripts
# Combine all channel/playlist folders in output/
transcriber transcripts
# Combine specific channel or playlist
transcriber transcripts -c "Channel Name"
# Combine .mdx files for LLM processing
transcriber combine /path/to/folder
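Conceptually, combining .mdx files amounts to concatenating them into one document. The sketch below is a rough approximation under stated assumptions, not the behavior of the combine command itself: the `combine_mdx` name, the alphabetical ordering, the per-file heading, and the `---` separator are all choices made for this example.

```python
from pathlib import Path


def combine_mdx(folder: Path) -> str:
    """Concatenate every .mdx file directly inside `folder`, sorted by name."""
    parts = []
    for path in sorted(folder.glob("*.mdx")):
        text = path.read_text(encoding="utf-8").strip()
        # Label each section with its source file so the sections stay distinguishable.
        parts.append(f"# {path.stem}\n\n{text}")
    return "\n\n---\n\n".join(parts) + "\n"
```

Sorting by file name keeps the output deterministic across runs, which helps when the combined file is fed to an LLM repeatedly.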
Architecture
social_media_transcriber/
├── cli.py                       # Click CLI commands
├── config/
│   └── settings.py              # Configuration management
├── core/
│   ├── transcriber.py           # Audio transcription logic
│   ├── downloader.py            # Provider discovery/routing
│   └── providers/               # Platform-specific implementations
│       ├── base.py              # Base provider class
│       └── youtube_provider.py  # YouTube with transcript extraction
└── utils/
    ├── file_utils.py            # File operations
    └── processing.py            # Core URL processing
Adding a new platform is as simple as adding a provider file - no other changes needed.
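One way such a drop-in design can work is for each provider subclass to register itself, so the router never needs a hand-maintained list. The sketch below illustrates that idea only. The real interface in providers/base.py and the discovery logic in downloader.py are not shown here, so `BaseProvider`, `can_handle`, `VimeoProvider`, and `find_provider` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Type
from urllib.parse import urlparse


class BaseProvider(ABC):
    """Hypothetical stand-in for the base class in providers/base.py."""

    registry: List[Type["BaseProvider"]] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself, so no central list needs editing.
        BaseProvider.registry.append(cls)

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Return True if this provider recognises the URL."""


class VimeoProvider(BaseProvider):
    """Example provider; the host check is an assumption for this sketch."""

    def can_handle(self, url: str) -> bool:
        host = urlparse(url).hostname or ""
        return host == "vimeo.com" or host.endswith(".vimeo.com")


def find_provider(url: str) -> Optional[BaseProvider]:
    """Return the first registered provider that accepts the URL."""
    for provider_cls in BaseProvider.registry:
        provider = provider_cls()
        if provider.can_handle(url):
            return provider
    return None
```

With this pattern, defining a new subclass in its own file is enough for `find_provider` to pick it up, provided the module is imported somewhere at startup.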
Contributing
- Fork the repository
- Create a feature branch
- Add your provider in social_media_transcriber/core/providers/
- Run tests: pytest
- Submit a pull request
See Adding New Providers for details.
Dependencies
- yt-dlp - Video downloading
- parakeet-mlx - MLX-based transcription (Apple Silicon)
- click - CLI framework
- rich - Terminal output
- python-dotenv - Environment configuration
See requirements.txt for the full list.
License
MIT License - see LICENSE file for details.
Acknowledgments
- yt-dlp for excellent video downloading
- parakeet-mlx for fast on-device transcription