Troubleshooting Guide¶

GDPR Pseudonymizer - Error reference and solutions

Installation Issues¶

`poetry: command not found`¶

Cause: Poetry is not in your system PATH.

Solution: 1. Check installation location: - Windows: %APPDATA%\Python\Scripts - macOS/Linux: ~/.local/bin 2. Add to PATH: - Windows (PowerShell): $env:PATH += ";$env:APPDATA\Python\Scripts" - macOS/Linux: export PATH="$HOME/.local/bin:$PATH" (add to ~/.bashrc or ~/.zshrc) 3. Restart your terminal 4. Alternative: use python -m poetry instead of poetry

Python version not supported¶

Error: The currently activated Python version X.Y.Z is not supported

Solution: 1. Install Python 3.10, 3.11, or 3.12 2. Configure Poetry to use correct version:

poetry env use python3.11
poetry install

3. Verify with poetry env info (look for "Python: 3.10.x" or "3.11.x" or "3.12.x")

Note: Python 3.9 is no longer supported (EOL October 2025). Python 3.13+ is not yet tested.

spaCy model download fails¶

Possible causes: Network issues, insufficient disk space (~1GB needed), firewall blocking.

Solutions:

Check disk space:

# macOS/Linux
df -h
# Windows PowerShell
Get-PSDrive C

Manual installation:

poetry run python -m spacy download fr_core_news_lg

Retry with verbose output:

poetry run python -m spacy download fr_core_news_lg --verbose

Behind corporate firewall: Contact IT for proxy configuration or download the model package manually from https://github.com/explosion/spacy-models

`poetry install` fails with dependency conflicts¶

Solution: 1. Verify Python version (must be 3.10-3.12) 2. Clear virtual environment and reinstall:

poetry env remove python
poetry install

3. Update Poetry:

poetry self update

`gdpr-pseudo: command not found`¶

Cause: The CLI requires the poetry run prefix during development.

Solution: Always use poetry run:

# CORRECT
poetry run gdpr-pseudo --help

# INCORRECT
gdpr-pseudo --help

Alternative: Activate Poetry shell:

poetry shell
gdpr-pseudo --help
exit

Passphrase Issues¶

`Passphrase must be at least 12 characters`¶

Cause: Security requirement -- passphrases must be 12+ characters.

Solution: 1. Use a passphrase with at least 12 characters 2. Or set via environment variable:

# macOS/Linux
export GDPR_PSEUDO_PASSPHRASE="your-secure-passphrase-here"

# Windows PowerShell
$env:GDPR_PSEUDO_PASSPHRASE = "your-secure-passphrase-here"

`Incorrect passphrase`¶

Error: Incorrect passphrase. Please check your passphrase and try again.

Solution: - Verify you are using the exact passphrase used when creating the database - Check for trailing spaces or invisible characters - If using environment variable, verify: echo $GDPR_PSEUDO_PASSPHRASE (Linux/macOS) or echo $env:GDPR_PSEUDO_PASSPHRASE (PowerShell)

Forgot passphrase¶

Consequence: The mapping database cannot be decrypted. Existing mappings are permanently inaccessible and pseudonymization cannot be reversed.

Recovery: Create a new database (previous mappings are lost):

poetry run gdpr-pseudo init --force

Prevention: Store passphrases in a secure password manager.

`Passphrase in config file is forbidden`¶

Cause: A passphrase field was found in .gdpr-pseudo.yaml. Plaintext credential storage is blocked for security.

Solution: Remove the passphrase field from your config file. Use one of: - Environment variable: GDPR_PSEUDO_PASSPHRASE (recommended for automation) - Interactive prompt (most secure -- default behavior) - CLI flag: --passphrase (not recommended -- visible in shell history)

Database Errors¶

`Database file not found: mappings.db`¶

Cause: No database has been created yet.

Solution: Initialize a database:

poetry run gdpr-pseudo init

`Database may be corrupted`¶

Solution: 1. If you have a backup, restore it 2. Try exporting data: poetry run gdpr-pseudo export backup.json 3. Create a new database: poetry run gdpr-pseudo init --force

Inconsistent pseudonyms across documents¶

Cause: Using different database files for related documents.

Solution: Always specify the same database:

poetry run gdpr-pseudo process doc1.txt --db shared.db
poetry run gdpr-pseudo process doc2.txt --db shared.db

NLP Processing Errors¶

No entities detected¶

Possible causes: - Document does not contain recognizable French text - File encoding is not UTF-8 - File format not supported

Solutions: 1. Ensure text is in French with proper encoding (UTF-8 with accents: é, è, à) 2. Verify the document contains names, locations, or organizations 3. Verify file is a supported format (.txt, .md, .pdf, .docx, .xlsx, .csv) 4. For PDF/DOCX, ensure optional extras are installed: pip install gdpr-pseudonymizer[formats]. For Excel, install pip install gdpr-pseudonymizer[excel]. 5. Test with a known-good sample:

echo "Marie Dubois travaille a Paris pour Acme SA." > test.txt
poetry run gdpr-pseudo process test.txt

`Invalid theme 'xyz'`¶

Solution: Use one of the valid themes: neutral, star_wars, lotr

poetry run gdpr-pseudo process doc.txt --theme neutral

Validation UI Issues¶

Validation UI not responding¶

Cause: Terminal compatibility issue with keyboard input capture.

Solutions: 1. Use a standard terminal (PowerShell, Terminal.app, bash) 2. Avoid running in IDE integrated terminals (VS Code, PyCharm -- may have input issues) 3. Try poetry shell then run the command directly 4. On Windows, try Windows Terminal instead of legacy cmd.exe

Keyboard shortcuts not working¶

Solution: Press H or ? during validation to see the full help overlay with all available shortcuts. Some shortcuts (batch operations like Shift+A, Shift+R) are hidden by default.

Platform-Specific Issues¶

Windows: spaCy access violations¶

Symptom: Crash or access violation errors when running spaCy on Windows.

Solutions: 1. Use WSL (recommended): Install Windows Subsystem for Linux and run the tool there 2. Limit threads: Set OMP_NUM_THREADS=1 in environment:

$env:OMP_NUM_THREADS = 1

3. Update dependencies: - Update Windows to latest version - Update Visual C++ Redistributable

Note: This is a known spaCy issue on Windows (spaCy #12659). CI skips spaCy-dependent tests on Windows for this reason.

macOS: Xcode Command Line Tools required¶

Error: Build errors during poetry install

Solution:

xcode-select --install

macOS: Apple Silicon (M1/M2/M3)¶

Python 3.10+ has native ARM support. If using Homebrew:

brew install python@3.11

Linux: Missing build tools¶

Error: Compilation errors during poetry install

Solution:

Ubuntu/Debian:

sudo apt install python3-dev build-essential

Fedora:

sudo dnf install python3-devel gcc

Permission denied errors¶

Solutions: - macOS/Linux: Check file permissions with ls -la, ensure write access - Windows: Run PowerShell as Administrator for installation steps - Ensure write access to the project directory and output paths

Performance Issues¶

Processing slows or crashes on large batches¶

Solutions: - Process files in smaller batches - Reduce parallel workers: --workers 1 - Close other applications to free memory (spaCy model uses ~1.5GB per worker) - Monitor memory usage -- the tool requires up to 8GB RAM with 4 workers

spaCy model loading is slow¶

Cause: The fr_core_news_lg model is ~571MB and takes a few seconds to load on first use.

Mitigation: The model is cached in memory after first load. Subsequent documents in a batch session process faster.

PDF/DOCX Processing Issues¶

`PDF support requires 'pdfplumber'`¶

Cause: The optional PDF dependency is not installed.

Solution:

pip install gdpr-pseudonymizer[pdf]
# Or install both PDF and DOCX support:
pip install gdpr-pseudonymizer[formats]

`DOCX support requires 'python-docx'`¶

Cause: The optional DOCX dependency is not installed.

Solution:

pip install gdpr-pseudonymizer[docx]
# Or install both PDF and DOCX support:
pip install gdpr-pseudonymizer[formats]

"This PDF appears to be scanned/image-based"¶

Cause: The PDF contains images instead of text (scanned document). pdfplumber can only extract embedded text, not perform OCR.

Solutions: 1. Use an OCR tool (e.g., Adobe Acrobat, Tesseract) to convert the scanned PDF to a text-based PDF first 2. Export the scanned PDF to .txt manually, then process the text file 3. If partial text was extracted, review the output -- it may still contain useful content

Note: OCR support is not planned for the current version.

PDF is password-protected¶

Error: PDF is password-protected: <file>. Please provide an unprotected PDF.

Solution: Remove the password protection from the PDF before processing. Most PDF editors (Adobe Acrobat, Preview on macOS) can remove passwords if you know the original password.

Output file has `.txt` extension for PDF/DOCX input¶

This is expected behavior. PDF/DOCX inputs produce plaintext .txt output because the tool extracts text content only. Format-preserving output (PDF-to-PDF, DOCX-to-DOCX) is planned for v1.2+.

When to File Bug Reports¶

File a bug report on GitHub if you encounter:

Crashes that are not explained by the troubleshooting entries above
Incorrect pseudonymization (entity replaced with wrong type of pseudonym)
Data loss (database corruption, missing mappings)
Security concerns (passphrase exposure, encryption issues)

How to report: - GitHub Issues: https://github.com/LioChanDaYo/RGPDpseudonymizer/issues - Include: OS version, Python version, full error message, steps to reproduce - Do NOT include sensitive data (real names, document content) in bug reports

Installation Guide - Platform-specific setup
CLI Reference - Complete command documentation
FAQ - Common questions
Tutorial - Step-by-step usage guides

Troubleshooting Guide¶

Installation Issues¶

poetry: command not found¶

Python version not supported¶

spaCy model download fails¶

poetry install fails with dependency conflicts¶

gdpr-pseudo: command not found¶

Passphrase Issues¶

Passphrase must be at least 12 characters¶

Incorrect passphrase¶

Forgot passphrase¶

Passphrase in config file is forbidden¶

Database Errors¶

Database file not found: mappings.db¶

Database may be corrupted¶

Inconsistent pseudonyms across documents¶

NLP Processing Errors¶

No entities detected¶

Invalid theme 'xyz'¶

Validation UI Issues¶

Validation UI not responding¶

Keyboard shortcuts not working¶

Platform-Specific Issues¶

Windows: spaCy access violations¶

macOS: Xcode Command Line Tools required¶

macOS: Apple Silicon (M1/M2/M3)¶

Linux: Missing build tools¶

Permission denied errors¶

Performance Issues¶

Processing slows or crashes on large batches¶

spaCy model loading is slow¶

PDF/DOCX Processing Issues¶

PDF support requires 'pdfplumber'¶

DOCX support requires 'python-docx'¶

"This PDF appears to be scanned/image-based"¶

PDF is password-protected¶

Output file has .txt extension for PDF/DOCX input¶

When to File Bug Reports¶

Related Documentation¶

`poetry: command not found`¶

`poetry install` fails with dependency conflicts¶

`gdpr-pseudo: command not found`¶

`Passphrase must be at least 12 characters`¶

`Incorrect passphrase`¶

`Passphrase in config file is forbidden`¶

`Database file not found: mappings.db`¶

`Database may be corrupted`¶

`Invalid theme 'xyz'`¶

`PDF support requires 'pdfplumber'`¶

`DOCX support requires 'python-docx'`¶

Output file has `.txt` extension for PDF/DOCX input¶