SciLEx Web Interface
Complete web-based interface for SciLEx - combining a FastAPI REST backend with a Streamlit frontend for interactive paper collection and analysis.
Features
🎯 Core Functionality
Multi-Source Paper Collection: Search 10+ academic databases simultaneously
Free APIs: SemanticScholar, OpenAlex, Arxiv, PubMed, DBLP, HAL
Paid APIs: IEEE, Elsevier, Springer
Integration: Zotero, HuggingFace
Advanced Filtering:
By year, source, publication type
By abstract length and content
Relevance ranking by keywords
Configuration Management:
Easy API key configuration through web interface
Persistent storage of settings
Support for multiple API tiers
Results Management:
View statistics (papers by year, source, citations)
Export in multiple formats (CSV, BibTeX, JSON)
Interactive filtering of results
Pagination and search
Pipeline Tracking:
Background job management
Real-time progress updates
Job history and status monitoring
Demo video ⏯️
Quick Start
Installation
Install required dependencies:
# If not already installed
uv add fastapi uvicorn streamlit pandas pyyaml
# Or use the main requirements.txt
pip install -e .
Running the Interface
Option 1: Run Both API and Web Interface (Recommended)
python scilex/webapi/run_interface.py
This will:
Start FastAPI backend on
http://localhost:8000Start Streamlit on
http://localhost:8501Automatically open the web interface in your browser
Option 2: Run Only the API
python scilex/webapi/run_interface.py --api-only
API will be available at http://localhost:8000
Interactive API docs:
http://localhost:8000/docs
Option 3: Run Only the Web Interface
python scilex/webapi/run_interface.py --web-only
Option 4: Custom Ports
python scilex/webapi/run_interface.py \
--api-port 9000 \
--web-port 8888 \
--host 0.0.0.0
Usage Guide
1. Configure API Keys
Click the ⚙️ Configuration section in the sidebar
Expand 🔑 API Keys
Select an API service
Enter credentials:
SemanticScholar: API key (free tier available)
IEEE: API key
Elsevier: API key + optional institutional token
Springer: API key
Zotero: API key + User ID
HuggingFace: Access token
Click ✅ Save API Configuration
2. Start a New Collection
Go to 🔬 New Collection tab
Fill in parameters:
Collection Name: Unique identifier
Years: Select publication years
Keywords: Enter search terms (one per line)
Data Sources: Choose APIs to search
Quality Filters: Set minimum standards
Click 🚀 Start Collection Pipeline
3. View and Analyze Results
Go to 📊 View Results tab
Select a completed collection
View statistics:
Papers by year (bar chart)
Papers by source (bar chart)
Total papers, year range, sources, average citations
Browse papers with pagination
Click on papers to view full details
4. Filter and Export
Go to 🔍 Filter & Export tab
Apply filters:
Year range
Data sources
Citation count range
Abstract length
Click ✅ Apply Filters
Export using:
CSV: Download filtered results
JSON: Structured format for processing
BibTeX: For citation management (if available)
5. View Collection History
Go to 📈 Collections History tab to see all past collections with:
Number of papers
File size
Creation date
API Reference
The FastAPI backend provides REST endpoints for programmatic access:
Configuration Endpoints
GET /api-config
Get current API configuration (with sensitive data masked)
POST /api-config
Content-Type: application/json
{
"api_name": "SemanticScholar",
"api_key": "YOUR_KEY"
}
Update API configuration
GET /available-apis
List all available APIs with descriptions
Collection Endpoints
POST /pipelines/start
Content-Type: application/json
{
"collection_config": {
"keywords": [["machine learning"], ["application"]],
"years": [2023, 2024],
"apis": ["SemanticScholar", "OpenAlex"],
"collect_name": "ml_apps"
},
"api_config": {
"SemanticScholar": {"api_key": "YOUR_KEY"}
}
}
Start a new collection pipeline
GET /pipelines/{job_id}/status
Check status of a running pipeline
GET /pipelines
List all pipeline jobs
Results Endpoints
GET /results/{collect_name}?limit=100&skip=0
Get aggregated results with pagination
GET /results/{collect_name}/stats
Get statistics about results
POST /export
Content-Type: application/json
{
"collect_name": "ml_apps",
"format": "csv"
}
Export results (csv, json, bibtex)
GET /collections
List all available collections
Filtering Endpoints
POST /filter/{collect_name}
Content-Type: application/json
{
"enable_itemtype_filter": true,
"allowed_item_types": ["journalArticle", "conferencePaper"],
"max_papers": 500
}
Apply filters to results
Examples
Python API Client
import requests
import json
BASE_URL = "http://localhost:8000"
# 1. Configure API keys
api_config = {
"api_name": "SemanticScholar",
"api_key": "YOUR_API_KEY"
}
requests.post(f"{BASE_URL}/api-config", json=api_config)
# 2. Start collection
pipeline_request = {
"collection_config": {
"keywords": [["large language model", "LLM"], ["evaluation"]],
"years": [2023, 2024],
"apis": ["SemanticScholar", "OpenAlex"],
"collect_name": "llm_eval_2024"
},
"api_config": {
"SemanticScholar": {"api_key": "YOUR_KEY"}
}
}
response = requests.post(f"{BASE_URL}/pipelines/start", json=pipeline_request)
job_id = response.json()["job_id"]
# 3. Monitor progress
import time
while True:
status = requests.get(f"{BASE_URL}/pipelines/{job_id}/status").json()
print(f"Progress: {status['progress']}% - {status['message']}")
if status['status'] in ['completed', 'failed']:
break
time.sleep(5)
# 4. Get results
results = requests.get(f"{BASE_URL}/results/llm_eval_2024").json()
print(f"Retrieved {results['total']} papers")
# 5. Export to CSV
export_request = {
"collect_name": "llm_eval_2024",
"format": "csv"
}
response = requests.post(f"{BASE_URL}/export", json=export_request)
with open("results.csv", "wb") as f:
f.write(response.content)
Bash/cURL Examples
# Get available APIs
curl http://localhost:8000/available-apis
# Check API configuration
curl http://localhost:8000/api-config
# Update API key
curl -X POST http://localhost:8000/api-config \
-H "Content-Type: application/json" \
-d '{"api_name":"SemanticScholar","api_key":"YOUR_KEY"}'
# List collections
curl http://localhost:8000/collections
# Get collection statistics
curl http://localhost:8000/results/my_collection/stats
Configuration
Output Directory
Configure where results are saved:
In web interface: Set in Configuration sidebar
Via API: Pass
output_dirin collection configDefault:
{project_root}/output
API Modes
SemanticScholar Modes:
regular: Standard search endpoint, 50-100 results per page (default)bulk: Bulk search endpoint, 1000 results per page (requires approval)
Quality Filters
All supported filters:
enable_text_filter: Remove low-quality papersmin_abstract_words: Minimum abstract length (default: 50)max_abstract_words: Maximum abstract length (default: 1000)enable_itemtype_filter: Whitelist publication typesallowed_item_types: Types to allow (journalArticle, conferencePaper, etc.)apply_relevance_ranking: Sort by keyword relevancemax_papers: Return top N papers by relevance
Advanced Usage
Custom Pipeline Scripts
For programmatic access without the web interface:
import sys
from pathlib import Path
# Setup path
PROJECT_ROOT = Path(__file__).parent.parent
sys.path.insert(0, str(PROJECT_ROOT / "src"))
from crawlers.collector_collection import CollectCollection
# Configure
main_config = {
"keywords": [["machine learning"], ["application"]],
"years": [2023, 2024],
"apis": ["SemanticScholar", "OpenAlex"],
"collect_name": "ml_apps"
}
api_config = {
"SemanticScholar": {"api_key": "YOUR_KEY"}
}
# Run
collector = CollectCollection(main_config, api_config)
collector.create_collects_jobs()
See docs/user-guides/python-scripting.md for more details.
Integration with Other Tools
Export to Zotero:
from scilex.push_to_zotero import main as zotero_main
zotero_main() # Requires Zotero API key configured
Export to BibTeX:
from scilex.export_to_bibtex import main as bibtex_main
bibtex_main() # Creates .bib file
Troubleshooting
Common Issues
Port already in use:
python scilex/webapi/run_interface.py --api-port 9000 --web-port 8888
API keys not working:
Verify credentials in web interface Configuration tab
Check API website for current key format
Ensure API is not rate-limited
Look for error messages in terminal
Collection producing no results:
Try simpler keywords
Expand year range
Add more data sources
Check data source availability
Verify API keys for paid services
Streamlit not opening in browser:
Manually visit
http://localhost:8501Check firewall settings
Try different browser
API documentation not loading:
Ensure API is running on correct port
Visit
http://localhost:8000/docs(interactive)Visit
http://localhost:8000/openapi.json(raw schema)
Architecture
SciLEx Web Interface
├── Backend (FastAPI)
│ ├── /api-config - API key management
│ ├── /pipelines - Job management
│ ├── /results - Data retrieval
│ └── /export - Output handling
│
├── Frontend (Streamlit)
│ ├── New Collection - Pipeline setup
│ ├── View Results - Data exploration
│ ├── Filter & Export - Result refinement
│ ├── Collections History - Past runs
│ └── Help - Documentation
│
└── Core (Python)
├── CollectCollection - Multi-API aggregation
├── aggregate_collect - Deduplication & filtering
└── export functions - Format conversion
File Structure
scilex/webapi/
├── __init__.py # Package initialization
├── scilex_api.py # FastAPI backend
├── web_interface.py # Streamlit frontend
├── run_interface.py # Launch script
└── README.md # This file
Performance Tips
Reduce API calls: Use fewer APIs, narrow year range
Parallel aggregation: Increase
--workers(default: 3)Skip citations: Use
--skip-citationsflagBatch operations: Combine multiple keywords into one search
Filter early: Apply filters during collection when possible
Security Notes
API Keys: Stored in
scilex/api.config.yml(add to.gitignore)Sensitive Data: Masked in API responses
CORS: Enabled for all origins (configure in production)
Input Validation: All user inputs validated
HTTPS: Use reverse proxy (nginx) in production
Contributing
Want to improve the web interface?
Report bugs in GitHub issues
Submit feature requests
Create pull requests with improvements
Add endpoints as needed
License
Same as SciLEx main project - See LICENSE in project root
Support
Documentation: See docs/ directory
Issues: Report on GitHub
Questions: Create GitHub discussion
Changelog
v1.0.0
Initial release with FastAPI backend
Full Streamlit web interface
Support for all SciLEx features
Background job management
Multi-format export
