Troubleshooting Guide

Common issues and solutions when using SciLEx.

Installation Issues

Python Version Error

Problem:

Error: Python 3.13+ required

Solution:

# Check your Python version
python --version

# Install Python 3.13+ from python.org
# Or use pyenv
pyenv install 3.13
pyenv local 3.13

Module Not Found

Problem:

ModuleNotFoundError: No module named 'pandas'

Solution:

# Reinstall dependencies
uv sync
# Or with pip
pip install -r requirements.txt

uv Command Not Found

Problem:

bash: uv: command not found

Solution:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or use pip instead
pip install -r requirements.txt

API Issues

API Key Invalid

Problem:

Error: IEEE API key validation failed

Solution:

  1. Check your API key is correct in scilex/api.config.yml

  2. Verify the key name matches the expected snake_case form (e.g., sem_scholar, not semantic_scholar)

  3. Remove any extra spaces or quotes

  4. Verify key is active on API provider’s dashboard

  5. Check if key has expired

Rate Limit Errors (429)

Problem:

HTTP 429: Too Many Requests

Solution:

# Lower rate limits in scilex/api.config.yml
rate_limits:
  SemanticScholar: 0.5  # Reduce from 1.0
  IEEE: 5.0  # Reduce from 10.0

Connection Timeout

Problem:

requests.exceptions.Timeout: Request timed out

Solution:

# Check internet connection
ping api.semanticscholar.org

# Try again - timeouts can be transient
# Or check if the API is down

SSL Certificate Error

Problem:

SSL: CERTIFICATE_VERIFY_FAILED

Solution (macOS):

pip install --upgrade certifi

Collection Issues

No Results Found

Problem: Collection returns 0 papers

Solutions:

  1. Check keywords are not too specific

  2. Try broader search terms

  3. Try different years

  4. Try different APIs

  5. Search only in titles first:

    fields: ["title"]
    

Too Many Results

Problem: Collection returns millions of papers

Solutions:

  1. Use dual keyword groups:

    keywords:
      - ["specific term"]
      - ["another specific term"]
    
  2. Limit results:

    max_results_per_api: 1000
    
  3. Reduce year range:

    years: [2024]  # Just current year
    

Collection Stuck

Problem: Collection appears frozen

Solution:

  1. Check if it’s actually making progress (logs update slowly for large queries)

  2. Enable debug logging:

    LOG_LEVEL=DEBUG uv run python src/run_collecte.py
    
  3. Check API rate limits aren’t too low

  4. Try with fewer APIs first

Missing Output Directory

Problem:

FileNotFoundError: output/ directory not found

Solution:

# Create output directory
mkdir -p output

Aggregation Issues

No Papers After Filtering

Problem: Aggregation filters out all papers

Solutions:

  1. Check dual keyword logic:

    # If only one group is needed:
    keywords:
      - ["your", "keywords"]
      - []  # Keep second group empty
    
  2. Disable strict filters:

    quality_filters:
      apply_citation_filter: false
      enable_itemtype_filter: false
    
  3. Check the aggregation report for which filter removed papers

Memory Error

Problem:

MemoryError: Unable to allocate array

Solution:

# Use parallel mode with batching (default)
uv run python src/aggregate_collect.py

# Or reduce the year range and re-collect

Slow Aggregation

Problem: Aggregation takes too long

Solution:

# Ensure parallel mode is used (default)
uv run python src/aggregate_collect.py

# Skip citations if not needed - edit src/scilex.config.yml:
# aggregate_get_citations: false

Zotero Issues

Upload Failed

Problem: Papers don’t appear in Zotero

Solution:

  1. Check API key in scilex/api.config.yml

  2. Verify user_mode is set correctly ("user" or "group")

  3. Check that the target Zotero collection exists

  4. Check Zotero storage quota

Duplicate Papers

Problem: Same papers uploaded multiple times

Solution:

# The system checks URLs to avoid duplicates.
# If papers have different URLs they will be treated as distinct.
# You can manually deduplicate in Zotero using its built-in merge tool.

Configuration Issues

Invalid YAML

Problem:

yaml.scanner.ScannerError: mapping values are not allowed here

Solution:

  1. Check indentation (use spaces, not tabs)

  2. Check special characters are quoted:

    keywords: ["term: with colon"]  # Use quotes
    
  3. Validate YAML online

Config Not Found

Problem:

FileNotFoundError: scilex.config.yml not found

Solution:

# Check that src/scilex.config.yml exists
ls src/scilex.config.yml

# Edit to match your needs
nano src/scilex.config.yml

Data Quality Issues

Missing Abstracts

Problem: Many papers have “NA” for abstracts

Explanation: Some APIs don’t provide abstracts (e.g., DBLP by policy, ORKG by design). This is expected.

Solution: Use APIs with better abstract coverage:

  • SemanticScholar (95%)

  • IEEE (100%)

  • Arxiv (100%)

Low Citation Counts

Problem: Most papers show 0 citations

Explanation: OpenCitations has limited coverage for recent papers and preprints.

Solution: This is normal. Citation data is best-effort only.

Incorrect Paper Metadata

Problem: Author names or titles incorrect

Solution: This comes from the source API. Report issues to the API provider.

Debugging

Enable Debug Logging

LOG_LEVEL=DEBUG uv run python src/run_collecte.py

Check Logs

Look for errors in console output or check the collection directory for state files.

Test Individual APIs

uv run python "src/API tests/SemanticScholarAPI.py"
uv run python "src/API tests/OpenAlexAPI.py"

Verify Configuration

# Check YAML syntax
uv run python -c "import yaml; yaml.safe_load(open('src/scilex.config.yml'))"

Getting Help

If none of these solutions work:

  1. Check console output for error messages

  2. Enable DEBUG logging

  3. Try with a minimal configuration

  4. Check if the issue is API-specific

Common Error Messages

“Circuit breaker OPEN”

Meaning: API has failed multiple times and is being skipped

Action: This is normal. The system will retry later. If persistent, check API status.

“Waiting for rate limit”

Meaning: Respecting API rate limits

Action: This is normal. Be patient or adjust rate limits.

“Query already completed, skipping”

Meaning: Results already exist for this query

Action: This is normal (idempotent behavior). Delete the output directory to re-collect.