Troubleshooting Guide
Common issues and solutions when using SciLEx.
Installation Issues
Python Version Error
Problem:
Error: Python 3.13+ required
Solution:
# Check your Python version
python --version
# Install Python 3.13+ from python.org
# Or use pyenv
pyenv install 3.13
pyenv local 3.13
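When several interpreters are installed, `python --version` may report a different interpreter from the one that actually runs SciLEx. The same check can be made from inside Python. This is a minimal sketch; the function name `meets_minimum` is illustrative and not part of SciLEx, and the 3.13 minimum is taken from the error message above.

```python
import sys

# Minimum version, taken from the error message above.
REQUIRED = (3, 13)


def meets_minimum(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough")
    else:
        print(f"Python {current} is too old; 3.13 or newer is required")
```

Run it with the same launcher you use for SciLEx (for example through `uv run python`) so that it reports on the right interpreter.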
Module Not Found
Problem:
ModuleNotFoundError: No module named 'pandas'
Solution:
# Reinstall dependencies
uv sync
# Or with pip
pip install -r requirements.txt
uv Command Not Found
Problem:
bash: uv: command not found
Solution:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or use pip instead
pip install -r requirements.txt
API Issues
API Key Invalid
Problem:
Error: IEEE API key validation failed
Solution:
Check your API key is correct in scilex/api.config.yml
Verify the key name matches the expected snake_case form (e.g., sem_scholar, not semantic_scholar)
Remove any extra spaces or quotes
Verify key is active on API provider’s dashboard
Check if key has expired
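The "extra spaces or quotes" check in the list above can be automated once the key value has been read from the config file. The sketch below is an assumption-laden illustration: `key_problems` is a hypothetical helper, not a SciLEx function, and it only inspects the formatting of a string. It cannot tell whether the provider accepts the key.

```python
def key_problems(value):
    """Return a list of formatting problems found in an API key string."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    stripped = value.strip()
    # A key that still carries quote characters after loading was probably
    # quoted twice in the config file.
    if len(stripped) >= 2 and stripped[0] == stripped[-1] and stripped[0] in "'\"":
        problems.append("wrapped in quotes")
    if not stripped.strip("'\""):
        problems.append("empty")
    return problems


if __name__ == "__main__":
    for candidate in ["abc123", " abc123 ", "'abc123'", ""]:
        print(repr(candidate), key_problems(candidate))
```

An empty list means the value looks clean. Any other result points to a formatting issue worth fixing before checking the provider's dashboard.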
Rate Limit Errors (429)
Problem:
HTTP 429: Too Many Requests
Solution:
# Lower rate limits in scilex/api.config.yml
rate_limits:
  SemanticScholar: 0.5 # Reduce from 1.0
  IEEE: 5.0 # Reduce from 10.0
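To see why a lower value helps, it can be useful to think of each rate as a minimum gap between requests. The sketch below assumes the values are requests per second, which this guide does not state. `MinIntervalLimiter` is a hypothetical class written for illustration, not SciLEx's limiter.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most `rate` of them happen per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.min_interval = 1.0 / rate  # e.g. 0.5 req/s -> 2.0 s between calls
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Under that assumption, halving a rate from 1.0 to 0.5 doubles the gap between requests from one second to two. This slows collection but makes a 429 response less likely.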
Connection Timeout
Problem:
requests.exceptions.Timeout: Request timed out
Solution:
# Check internet connection
ping api.semanticscholar.org
# Try again - timeouts can be transient
# Or check if the API is down
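Because timeouts are often transient, retrying with an increasing delay is a common way to ride them out. The following is a generic sketch, not SciLEx's retry logic. `retry_on_timeout` and its parameters are hypothetical, and it catches Python's built-in `TimeoutError`. Code that uses the requests library would need to catch `requests.exceptions.Timeout` instead, since that class does not derive from the built-in one.

```python
import time


def retry_on_timeout(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` and retry with exponential backoff when it times out."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # Out of retries: let the caller see the timeout.
            sleep(base_delay * 2 ** attempt)  # Delays grow: 1 s, 2 s, 4 s, ...
```

If the final attempt also times out, the exception is re-raised. At that point it is worth checking whether the API itself is down.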
SSL Certificate Error
Problem:
SSL: CERTIFICATE_VERIFY_FAILED
Solution (macOS):
pip install --upgrade certifi
Collection Issues
No Results Found
Problem: Collection returns 0 papers
Solutions:
Check keywords are not too specific
Try broader search terms
Try different years
Try different APIs
Search only in titles first:
fields: ["title"]
Too Many Results
Problem: Collection returns millions of papers
Solutions:
Use dual keyword groups:
keywords:
  - ["specific term"]
  - ["another specific term"]
Limit results:
max_results_per_api: 1000
Reduce year range:
years: [2024] # Just current year
Collection Stuck
Problem: Collection appears frozen
Solution:
Check if it’s actually making progress (logs update slowly for large queries)
Enable debug logging:
LOG_LEVEL=DEBUG uv run python src/run_collecte.py
Check API rate limits aren’t too low
Try with fewer APIs first
Missing Output Directory
Problem:
FileNotFoundError: output/ directory not found
Solution:
# Create output directory
mkdir -p output
Aggregation Issues
No Papers After Filtering
Problem: Aggregation filters out all papers
Solutions:
Check dual keyword logic:
# If only one group is needed:
keywords:
  - ["your", "keywords"]
  - [] # Keep second group empty
Disable strict filters:
quality_filters:
  apply_citation_filter: false
  enable_itemtype_filter: false
Check the aggregation report for which filter removed papers
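The dual keyword logic is a frequent cause of empty results. The sketch below shows one plausible reading of it: a paper is kept only if its text contains at least one term from every non-empty group. That matching rule is an assumption made for illustration, and `matches_keyword_groups` is a hypothetical function. SciLEx's actual filter may differ.

```python
def matches_keyword_groups(text, groups):
    """Return True if `text` contains a term from every non-empty group."""
    lowered = text.lower()
    for group in groups:
        if not group:
            continue  # An empty group imposes no constraint.
        if not any(term.lower() in lowered for term in group):
            return False
    return True


if __name__ == "__main__":
    title = "Knowledge graph embeddings"
    print(matches_keyword_groups(title, [["knowledge graph"], ["LLM"]]))
    print(matches_keyword_groups(title, [["knowledge graph"], []]))
```

Under this reading, the first call rejects the title because nothing in it matches the second group. The second call accepts it because the second group is empty.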
Memory Error
Problem:
MemoryError: Unable to allocate array
Solution:
# Use parallel mode with batching (default)
uv run python src/aggregate_collect.py
# Or reduce the year range and re-collect
Slow Aggregation
Problem: Aggregation takes too long
Solution:
# Ensure parallel mode is used (default)
uv run python src/aggregate_collect.py
# Skip citations if not needed - edit src/scilex.config.yml:
# aggregate_get_citations: false
Zotero Issues
Upload Failed
Problem: Papers don’t appear in Zotero
Solution:
Check API key in scilex/api.config.yml
Verify user_mode is set correctly ("user" or "group")
Check that the target Zotero collection exists
Check Zotero storage quota
Duplicate Papers
Problem: Same papers uploaded multiple times
Solution:
# The system checks URLs to avoid duplicates.
# If papers have different URLs they will be treated as distinct.
# You can manually deduplicate in Zotero using its built-in merge tool.
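The following sketch illustrates why URL-based checking lets some duplicates through. It is a simplified illustration: `split_by_url` and the `url` and `title` field names are hypothetical, and the URLs are placeholders.

```python
def split_by_url(papers, existing_urls):
    """Separate papers into new ones and ones whose URL is already known."""
    seen = set(existing_urls)
    new, duplicates = [], []
    for paper in papers:
        url = paper.get("url")
        if url and url in seen:
            duplicates.append(paper)
        else:
            new.append(paper)
            if url:
                seen.add(url)
    return new, duplicates


if __name__ == "__main__":
    papers = [
        {"title": "A", "url": "https://example.org/a"},
        {"title": "A", "url": "https://example.org/mirror/a"},  # same paper, other URL
        {"title": "B", "url": "https://example.org/a"},
    ]
    new, duplicates = split_by_url(papers, existing_urls=[])
    print(len(new), len(duplicates))
```

The second record describes the same paper as the first but carries a different URL, so a URL-only check treats it as new. Records like this need a manual merge in Zotero.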
Configuration Issues
Invalid YAML
Problem:
yaml.scanner.ScannerError: mapping values are not allowed here
Solution:
Check indentation (use spaces, not tabs)
Check special characters are quoted:
keywords: ["term: with colon"] # Use quotes
Validate the file with an online YAML validator, or locally with the command under Verify Configuration
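Tabs in indentation are hard to spot in most editors. A small standard-library check such as the sketch below can point to the offending lines. `find_tab_indented_lines` is a hypothetical helper, not part of SciLEx or PyYAML.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


if __name__ == "__main__":
    sample = "keywords:\n\t- ['term']\nyears: [2024]\n"
    print(find_tab_indented_lines(sample))
```

To check a real file, pass its contents to the function, for example the text read from src/scilex.config.yml.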
Config Not Found
Problem:
FileNotFoundError: scilex.config.yml not found
Solution:
# Check that src/scilex.config.yml exists
ls src/scilex.config.yml
# Edit to match your needs
nano src/scilex.config.yml
Data Quality Issues
Missing Abstracts
Problem: Many papers have “NA” for abstracts
Explanation: Some APIs don’t provide abstracts (e.g., DBLP by policy, ORKG by design). This is expected.
Solution: Use APIs with better abstract coverage:
SemanticScholar (95%)
IEEE (100%)
Arxiv (100%)
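To see which sources are responsible for missing abstracts in your own collection, you can compute the share of "NA" abstracts per source. The sketch below assumes each paper is a dictionary with `source` and `abstract` fields. Those field names are hypothetical and may not match the SciLEx output format.

```python
from collections import Counter


def missing_abstract_share(papers):
    """Return, per source, the fraction of papers whose abstract is "NA"."""
    total = Counter()
    missing = Counter()
    for paper in papers:
        source = paper["source"]
        total[source] += 1
        # A record without an abstract field is counted as missing too.
        if paper.get("abstract", "NA") == "NA":
            missing[source] += 1
    return {source: missing[source] / total[source] for source in total}
```

A source whose share is close to 1.0 is unlikely to improve with re-collection. Adding one of the APIs listed above is the more effective fix.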
Low Citation Counts
Problem: Most papers show 0 citations
Explanation: OpenCitations has limited coverage for recent papers and preprints.
Solution: This is normal. Citation data is best-effort only.
Incorrect Paper Metadata
Problem: Author names or titles incorrect
Solution: This comes from the source API. Report issues to the API provider.
Debugging
Enable Debug Logging
LOG_LEVEL=DEBUG uv run python src/run_collecte.py
Check Logs
Look for errors in console output or check the collection directory for state files.
Test Individual APIs
uv run python "src/API tests/SemanticScholarAPI.py"
uv run python "src/API tests/OpenAlexAPI.py"
Verify Configuration
# Check YAML syntax
uv run python -c "import yaml; yaml.safe_load(open('src/scilex.config.yml'))"
Getting Help
If none of these solutions work:
Check console output for error messages
Enable DEBUG logging
Try with a minimal configuration
Check if the issue is API-specific
Common Error Messages
“Circuit breaker OPEN”
Meaning: API has failed multiple times and is being skipped
Action: This is normal. The system will retry later. If persistent, check API status.
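The behavior described above follows the general circuit breaker pattern. The sketch below shows that pattern in its simplest form. It is not SciLEx's implementation, and the class, its method names, and the threshold and pause values are all assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Skip calls to an API after repeated failures, then retry after a pause."""

    def __init__(self, failure_threshold=3, reset_after=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow_request(self):
        """Return True if a call may be attempted now."""
        if self._opened_at is None:
            return True  # Closed: calls go through.
        if self._clock() - self._opened_at >= self.reset_after:
            return True  # Pause elapsed: allow a trial call.
        return False  # Open: skip this API for now.

    def record_success(self):
        """Reset the breaker after a successful call."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        """Count a failure and open the breaker once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

In this pattern, a persistent "OPEN" message means the trial calls keep failing. That is why checking the API's status is the next step.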
“Waiting for rate limit”
Meaning: Respecting API rate limits
Action: This is normal. Be patient or adjust rate limits.
“Query already completed, skipping”
Meaning: Results already exist for this query
Action: This is normal (idempotent behavior). Delete the output directory to re-collect.
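Idempotent collection usually comes down to checking for an existing result before doing the work. The sketch below assumes one JSON file per query in the output directory. That layout, the `collect_once` function, and the `run_query` callback are hypothetical and only illustrate the idea.

```python
import tempfile
from pathlib import Path


def collect_once(query_id, output_dir, run_query):
    """Run `run_query` only if no result file exists yet for `query_id`."""
    result_file = Path(output_dir) / f"{query_id}.json"  # hypothetical layout
    if result_file.exists():
        return "skipped"
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text(run_query())
    return "collected"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as output_dir:
        fetch = lambda: '{"papers": []}'
        print(collect_once("query-001", output_dir, fetch))
        print(collect_once("query-001", output_dir, fetch))
```

The first call reports "collected" and the second reports "skipped". Deleting the stored results is what makes the query run again.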