Quick Start Guide
Get your first paper collection running. This assumes you’ve installed SciLEx.
Quick Start
1. Create Configuration
Create a test_collection.yml file at the project root:
keywords:
- ["machine learning"]
- []
years: [2024]
apis:
- OpenAlex
- Arxiv
fields: ["title", "abstract"]
collect: true
collect_name: "test"
max_results_per_api: 50
2. Run Collection
uv run python src/run_collecte.py
You’ll see progress like:
Progress: 1/4 (25%) collections completed
Progress: 2/4 (50%) collections completed
...
3. Aggregate Results
uv run python src/aggregate_collect.py
Results are saved to output/collect_*/aggregated_data.csv
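Steps 2 and 3 can also be chained from a small Python script so that aggregation only runs when collection succeeds. This is a minimal sketch, not part of SciLEx: `run_steps` and `STEPS` are hypothetical names, and the commands are copied from the steps above.

```python
import subprocess

# Commands copied from steps 2 and 3 of this guide.
STEPS = [
    ["uv", "run", "python", "src/run_collecte.py"],
    ["uv", "run", "python", "src/aggregate_collect.py"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; stop at the first failure.

    `runner` defaults to subprocess.run. With check=True it raises
    CalledProcessError on a non-zero exit code, so later steps are skipped.
    """
    for cmd in steps:
        runner(cmd, check=True)
```

Calling `run_steps(STEPS)` from the project root runs collection and then aggregation. The `runner` parameter exists only so the function can be exercised without launching real processes.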
4. View Results
# View first few lines
head output/collect_*/aggregated_data.csv
Or open the file in spreadsheet software.
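If `head` is not available (for example on Windows), the same preview can be done in Python with the standard library. This is a rough sketch: `preview_csv` is a hypothetical helper, and the comma delimiter is an assumption. Pass a different `delimiter` if the file uses another separator.

```python
import csv
import glob
from itertools import islice


def preview_csv(path, n=5, delimiter=","):
    """Return the column names and the first n rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(islice(reader, n))
        return reader.fieldnames, rows


if __name__ == "__main__":
    # The glob mirrors the output path pattern used in this guide.
    for path in sorted(glob.glob("output/collect_*/aggregated_data.csv")):
        header, rows = preview_csv(path)
        print(path)
        print("columns:", header)
        for row in rows:
            print(row.get("title"), "|", row.get("year"))
```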
Real Collection Example
For a proper research collection, edit src/scilex.config.yml:
keywords:
- ["knowledge graph", "ontology"] # Domain
- ["large language model", "LLM"] # Technology
years: [2022, 2023, 2024]
apis:
- SemanticScholar
- OpenAlex
- Arxiv
fields: ["title", "abstract"]
aggregate_get_citations: true
quality_filters:
  enable_itemtype_filter: true
  allowed_item_types:
  - journalArticle
  - conferencePaper
  apply_relevance_ranking: true
  max_papers: 500
Then run:
uv run python src/run_collecte.py
uv run python src/aggregate_collect.py
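The configuration above defines two keyword groups, labeled Domain and Technology. This guide does not spell out how the groups are combined, so the sketch below rests on an assumption: each query pairs one term from every non-empty group. `keyword_pairs` is a hypothetical helper for estimating how many combinations a configuration implies. It is not a SciLEx function.

```python
from itertools import product


def keyword_pairs(groups):
    """Enumerate one-term-per-group combinations, ignoring empty groups.

    Assumption: groups are combined as a Cartesian product. Check the
    Configuration Guide for the rule SciLEx actually applies.
    """
    non_empty = [group for group in groups if group]
    if not non_empty:
        return []
    return list(product(*non_empty))


groups = [
    ["knowledge graph", "ontology"],      # Domain
    ["large language model", "LLM"],      # Technology
]
for pair in keyword_pairs(groups):
    print(" + ".join(pair))
```

Under that assumption, two groups of two terms give four combinations, and a configuration with a single filled group such as `[["machine learning"], []]` gives one.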
CSV Output Columns
title - Paper title
authors - Author list
year - Publication year
DOI - Digital Object Identifier
abstract - Full abstract
itemType - Publication type
citation_count - Citations (if enabled)
quality_score - Metadata completeness (0-100)
relevance_score - Relevance (0-10)
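Once the CSV is loaded, the score columns can be used to shortlist papers. The following is a minimal sketch, assuming the column names listed above and a comma-delimited file. `load_rows`, `to_float`, and `top_papers` are hypothetical helpers, and the threshold of 5.0 is an arbitrary illustration.

```python
import csv


def load_rows(path, delimiter=","):
    """Read the aggregated CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def to_float(value, default=0.0):
    """Convert a CSV cell to float, treating blank or invalid cells as default."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def top_papers(rows, min_relevance=5.0, limit=10):
    """Keep rows at or above min_relevance, best first.

    Rows are ordered by relevance_score, with citation_count as a tie-breaker.
    """
    kept = [
        row for row in rows
        if to_float(row.get("relevance_score")) >= min_relevance
    ]
    kept.sort(
        key=lambda row: (
            to_float(row.get("relevance_score")),
            to_float(row.get("citation_count")),
        ),
        reverse=True,
    )
    return kept[:limit]
```

A typical call would be `top_papers(load_rows(path))`, where `path` points at an aggregated_data.csv file. Rows with a blank relevance_score are treated as 0 and therefore dropped by the default threshold.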
Next Steps
Configuration Guide - All config options
Basic Workflow - Detailed workflow
Advanced Filtering - Filtering options