Agentic Graph-RAG Over Social-Network Knowledge Graphs
Abstract
This lab presents an extended Graph-RAG framework for analyzing influence and information flow in social networks by integrating graph structure, neural ranking, and agentic LLM reasoning. Unlike traditional RAG, which relies on vector search over unstructured text, this approach retrieves subgraphs of user neighborhoods and uses a Graph Convolutional Network (GCN) to estimate node influence. The result is context-aware, structurally grounded insight for tasks such as influencer identification and outreach planning.
Key Objectives
After completing the lab, participants will be able to:
- Construct a directed social-network knowledge graph from CSV data using pandas and NetworkX.
- Engineer graph-based structural features (PageRank, in/out degree, k-core, clustering coefficient, topic similarity) for neural network training.
- Train and evaluate a GCN to rank nodes by relative influence using pairwise ranking loss.
- Implement a graph-aware retrieval module that extracts relevant subgraphs in response to natural-language queries.
- Build a multi-step LangGraph agent that orchestrates planning, subgraph retrieval, GCN scoring, and LLM-driven synthesis of actionable insights.
Pipeline Architecture
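The end-to-end Agentic Graph-RAG pipeline runs four agent nodes (plan, retrieve, score, synthesize) in sequence, coordinated by a LangGraph agent. The five transitions below trace a query from input to final output: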
- User Query → PLAN: The planning node interprets the natural-language query to infer relevant topics and intent.
- PLAN → RETRIEVE: A relevant subgraph is extracted from the Facebook Page–Page network by selecting topic-matched seed nodes and expanding to their k-hop neighborhoods.
- RETRIEVE → SCORE: A pre-trained GCN performs message passing over the retrieved subgraph to compute a scalar influence score for each node, combining structural and semantic features.
- SCORE → SYNTHESIZE: An LLM generates a structured summary explaining node rankings and providing evidence-based outreach recommendations.
- SYNTHESIZE → Final Output: The agent produces an actionable, human-readable response grounded in graph intelligence.
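The control flow above can be sketched without any framework as four functions that pass a shared state dictionary along. This is only an illustration under stated assumptions: the topic vocabulary, the toy node table, and the placeholder scoring rule are hypothetical stand-ins for the LLM planner, the graph retriever, and the trained GCN. In the lab itself, each function would correspond to one LangGraph node.

```python
from typing import Dict, List, TypedDict


class AgentState(TypedDict, total=False):
    """Shared state handed from one stage to the next."""
    query: str
    topics: List[str]
    subgraph_nodes: List[str]
    scores: Dict[str, float]
    answer: str


# Hypothetical topic vocabulary and toy node metadata, for illustration only.
KNOWN_TOPICS = ["politician", "government", "tvshow", "company"]
NODE_TOPICS = {"a": "politician", "b": "company", "c": "politician"}


def plan(state: AgentState) -> AgentState:
    # Stand-in for LLM planning: keyword match against the known topics.
    query = state["query"].lower()
    return {**state, "topics": [t for t in KNOWN_TOPICS if t in query]}


def retrieve(state: AgentState) -> AgentState:
    # Stand-in for subgraph retrieval: keep nodes whose topic was planned.
    nodes = sorted(n for n, t in NODE_TOPICS.items() if t in state["topics"])
    return {**state, "subgraph_nodes": nodes}


def score(state: AgentState) -> AgentState:
    # Stand-in for GCN scoring: a placeholder that decays with list position.
    scores = {n: 1.0 / (i + 1) for i, n in enumerate(state["subgraph_nodes"])}
    return {**state, "scores": scores}


def synthesize(state: AgentState) -> AgentState:
    # Stand-in for LLM synthesis: list nodes from highest to lowest score.
    ranked = sorted(state["scores"], key=state["scores"].get, reverse=True)
    return {**state, "answer": "Top candidates: " + ", ".join(ranked)}


PIPELINE = [plan, retrieve, score, synthesize]


def run_agent(query: str) -> AgentState:
    state: AgentState = {"query": query}
    for step in PIPELINE:
        state = step(state)
    return state
```

Keeping every stage as a pure function of the state makes each one testable in isolation and easy to swap for the real planner, retriever, or model.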
GCN-Based Influence Ranking
A Graph Convolutional Network (GCN) is used as a learned ranking model to estimate node importance within retrieved subgraphs. Key design details:
- Node Feature Inputs: Each node is initialized with a feature vector encoding structural properties (PageRank, in/out degree, k-core value, clustering coefficient), activity signals (30-day posting volume), and query-dependent topic similarity.
- Message Passing Mechanism: The GCN propagates features across graph edges, enabling nodes to aggregate multi-hop neighborhood context and capture relational patterns that cannot be expressed by simple heuristics like degree or PageRank alone.
- Training Objective: The model is trained using pairwise ranking loss to learn relative node ordering, aligning directly with downstream influencer prioritization tasks rather than predicting absolute influence values.
- Inference Workflow: At runtime, the GCN is applied exclusively to retrieved subgraphs to produce context-specific influence scores that guide downstream agent reasoning.
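The message-passing and training-objective points above can be made concrete with a small NumPy sketch. It is not the lab's PyTorch Geometric model. It assumes a symmetric adjacency matrix (the Kipf and Welling normalization), a two-layer network with a scalar output, and a margin-based hinge form of the pairwise loss. The source states neither the layer count, the handling of edge direction, nor the exact loss form, so all three are illustrative choices.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, the GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj: np.ndarray, x: np.ndarray,
                w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer GCN that maps node features to one score per node."""
    p = normalized_adjacency(adj)
    hidden = np.maximum(p @ x @ w1, 0.0)    # aggregate neighbors, then ReLU
    return (p @ hidden @ w2).ravel()        # second hop, scalar output


def pairwise_ranking_loss(scores: np.ndarray, target: np.ndarray,
                          margin: float = 1.0) -> float:
    """Mean hinge loss over all pairs (i, j) with target[i] > target[j]."""
    diff = scores[:, None] - scores[None, :]            # s_i - s_j
    should_outrank = target[:, None] > target[None, :]  # i must beat j
    losses = np.maximum(0.0, margin - diff)[should_outrank]
    return float(losses.mean()) if losses.size else 0.0
```

The loss is zero once every node that should rank higher beats its counterpart by at least the margin, so only the relative order of the heuristic targets matters, not their absolute values.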
Agentic Reasoning Integration
Influence scores from the GCN act as an intermediate, evidence-based signal to ground agent decision-making rather than serving as a standalone output:
- Scores prioritize high-impact nodes and filter out noisy or peripheral graph regions, so the agent reasons over fewer, more relevant nodes.
- Ranked, feature-enriched node data is provided to the LLM for generative synthesis.
- The LLM connects numerical influence scores to observable graph properties (connectivity, activity, topical relevance) to produce interpretable explanations, separating learned influence modeling from generative narrative creation.
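One way to hand ranked, feature-enriched nodes to the LLM is to serialize them into a compact prompt. The sketch below is an assumption about how that hand-off might look: the function name, the field names (`name`, `score`, `pagerank`, `in_degree`, `topic_sim`), and the prompt wording are all hypothetical, and the actual LLM call is left out.

```python
from typing import Dict, List


def build_synthesis_prompt(query: str, nodes: List[Dict], top_k: int = 3) -> str:
    """Format the top-k scored nodes as evidence lines for the LLM."""
    ranked = sorted(nodes, key=lambda n: n["score"], reverse=True)[:top_k]
    lines = [
        f"{rank}. {n['name']} | score={n['score']:.2f} | "
        f"pagerank={n['pagerank']:.4f} | in_degree={n['in_degree']} | "
        f"topic_sim={n['topic_sim']:.2f}"
        for rank, n in enumerate(ranked, start=1)
    ]
    return (
        f"Question: {query}\n"
        "Ranked candidates (GCN influence score and supporting features):\n"
        + "\n".join(lines)
        + "\nExplain each ranking using only the features listed above, "
        "then recommend an outreach order."
    )
```

Passing the supporting features next to each score lets the model tie its explanation to observable graph properties instead of the score alone.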
Implementation & Experimental Setup
Dataset
The lab uses a curated subset of the public MUSAE Facebook Page–Page network, containing ~22,000 nodes (representing Facebook pages) and ~300,000 directed FOLLOW edges. Node attributes include topic labels, follower/following counts, and recent posting activity metrics.
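Loading such data into a directed NetworkX graph might look like the sketch below. The column names (`src`, `dst`, `node_id`, `topic`, `followers`, `posts_30d`) and the in-memory CSV contents are assumptions made so the example runs on its own; the lab's preprocessed files may use different names.

```python
import io

import networkx as nx
import pandas as pd

# Tiny in-memory stand-ins for the lab's CSV files (hypothetical columns).
EDGES_CSV = io.StringIO("src,dst\n1,2\n2,3\n3,1\n4,1\n")
NODES_CSV = io.StringIO(
    "node_id,topic,followers,posts_30d\n"
    "1,politician,1200,14\n"
    "2,company,300,2\n"
    "3,tvshow,950,30\n"
    "4,government,80,1\n"
)


def build_graph(edges_file, nodes_file) -> nx.DiGraph:
    """Build a directed graph and attach per-node attributes."""
    edges = pd.read_csv(edges_file)
    nodes = pd.read_csv(nodes_file)
    graph = nx.from_pandas_edgelist(
        edges, source="src", target="dst", create_using=nx.DiGraph
    )
    # Label every edge with the relation type used in the lab.
    nx.set_edge_attributes(graph, "FOLLOW", "relation")
    # Map node_id -> {column: value} and attach it as node attributes.
    attrs = nodes.set_index("node_id").to_dict(orient="index")
    nx.set_node_attributes(graph, attrs)
    return graph


G = build_graph(EDGES_CSV, NODES_CSV)
```

With real files, the two `io.StringIO` objects would simply be replaced by file paths.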
Core Dependencies
- Graph processing: NetworkX
- Neural modeling: PyTorch, PyTorch Geometric
- Agent orchestration: LangGraph
- LLM integration: OpenAI GPT-4o-mini (default model)
Key Implementation Workflow
- Graph construction from preprocessed CSV files with node attribute enrichment.
- Structural feature engineering and normalization for GCN input.
- GCN training with pairwise ranking loss using a heuristic influence target.
- Subgraph retrieval utility that expands from topic-matched seed nodes to k-hop neighbors.
- LangGraph agent implementation with four core execution nodes: plan, retrieve, score, synthesize.
- Visualization of top-influencer 1-hop neighborhoods to validate structural reasoning.
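Two of the workflow steps above, structural feature engineering and k-hop subgraph retrieval, can be sketched directly with NetworkX. The function names are hypothetical, and several choices are assumptions rather than the lab's confirmed settings: k-core and clustering are computed on the undirected projection, hops ignore edge direction, and the PageRank damping factor is 0.85. Feature normalization and topic similarity are omitted.

```python
import networkx as nx


def structural_features(graph: nx.DiGraph) -> dict:
    """Compute per-node structural features for GCN input."""
    undirected = graph.to_undirected()
    pagerank = nx.pagerank(graph, alpha=0.85)
    core = nx.core_number(undirected)       # graph must have no self-loops
    clustering = nx.clustering(undirected)
    return {
        n: {
            "pagerank": pagerank[n],
            "in_degree": graph.in_degree(n),
            "out_degree": graph.out_degree(n),
            "k_core": core[n],
            "clustering": clustering[n],
        }
        for n in graph.nodes
    }


def k_hop_subgraph(graph: nx.DiGraph, seeds, k: int = 2) -> nx.DiGraph:
    """Return the subgraph induced by all nodes within k hops of any seed."""
    view = graph.to_undirected(as_view=True)  # follow edges in both directions
    keep = set()
    for seed in seeds:
        # Keys of the returned mapping are the nodes reachable within k hops.
        keep.update(nx.single_source_shortest_path_length(view, seed, cutoff=k))
    return graph.subgraph(keep).copy()
```

Retrieval returns a copy that keeps the original edge directions, so features such as in-degree can be recomputed on the retrieved subgraph if needed.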
Conclusion
This framework illustrates how graph-aware retrieval and learned influence modeling can support deeper, more interpretable social-network analysis than text-only RAG approaches. The modular integration of LangGraph for orchestration, a GCN for structural ranking, and an LLM for generative synthesis creates a flexible foundation that can be extended with temporal analysis, more advanced GNN architectures, or more complex agentic behaviors.
References
- Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations (ICLR).
- MUSAE Facebook Page–Page Network: https://snap.stanford.edu/data/facebook-large-page-page-network.html