Agentic Graph-RAG Over Social-Network Knowledge Graphs

Abstract

This lab presents an extended Graph-RAG framework for analyzing influence and information flow in social networks by integrating graph structure, neural ranking, and agentic LLM reasoning. Unlike traditional RAG, which relies on vector search over unstructured text, this approach retrieves subgraphs of user neighborhoods and uses a Graph Convolutional Network (GCN) to estimate node influence. The result is context-aware, structurally grounded insight for tasks such as influencer identification and outreach planning.

Key Objectives

After completing the lab, participants will be able to:

Construct a directed social-network knowledge graph from CSV data using pandas and NetworkX.
Engineer graph-based structural features (PageRank, in/out degree, k-core, clustering coefficient, topic similarity) for neural network training.
Train and evaluate a GCN to rank nodes by relative influence using pairwise ranking loss.
Implement a graph-aware retrieval module that extracts relevant subgraphs in response to natural-language queries.
Build a multi-step LangGraph agent that orchestrates planning, subgraph retrieval, GCN scoring, and LLM-driven synthesis of actionable insights.

Pipeline Architecture

The end-to-end Agentic Graph-RAG pipeline is coordinated by a LangGraph agent with four nodes (plan, retrieve, score, synthesize). The five transitions below trace a query from input to final output:

User Query → PLAN: The agent's plan node interprets the natural-language query to infer relevant topics and intent.
PLAN → RETRIEVE: A relevant subgraph is extracted from the Facebook Page–Page network by seeding with topic-matched nodes and expanding to their k-hop neighborhoods.
RETRIEVE → SCORE: A pre-trained GCN performs message passing over the retrieved subgraph to compute a scalar influence score for each node, combining structural and semantic features.
SCORE → SYNTHESIZE: An LLM generates a structured summary explaining node rankings and providing evidence-based outreach recommendations.
SYNTHESIZE → Final Output: The agent produces an actionable, human-readable response grounded in graph intelligence.
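The state flow through the four nodes can be sketched without any framework. The snippet below is a dependency-free stand-in, not the lab's LangGraph implementation: the toy edges, the topic labels, the keyword-matching planner, the in-degree scorer (a placeholder for GCN inference), the templated summary (a placeholder for the LLM), and the choice to ignore edge direction during expansion are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy data: directed follow edges and one topic label per page.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "b"), ("f", "e"), ("g", "h")]
TOPICS = {"a": "politics", "b": "politics", "c": "sports", "d": "sports",
          "e": "media", "f": "media", "g": "music", "h": "music"}

def plan(state):
    # Stand-in for the planner: match known topic names against the query text.
    query = state["query"].lower()
    state["topics"] = sorted({t for t in TOPICS.values() if t in query})
    return state

def retrieve(state, hops=1):
    # Seed with topic-matched nodes, then expand k hops (direction ignored here).
    neighbors = {}
    for src, dst in EDGES:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seeds = [n for n, t in TOPICS.items() if t in state["topics"]]
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    state["subgraph_nodes"] = sorted(depth)
    return state

def score(state):
    # Placeholder for GCN inference: in-degree counted inside the subgraph only.
    nodes = set(state["subgraph_nodes"])
    scores = {n: 0 for n in nodes}
    for src, dst in EDGES:
        if src in nodes and dst in nodes:
            scores[dst] += 1
    state["ranking"] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return state

def synthesize(state, top_k=2):
    # Placeholder for the LLM: a templated summary of the top-ranked nodes.
    topics = ", ".join(state["topics"])
    top = ", ".join(f"{n} (score {s})" for n, s in state["ranking"][:top_k])
    state["answer"] = f"Top pages for {topics}: {top}"
    return state

def run_pipeline(query):
    # Each step reads from and writes to one shared state dictionary.
    state = {"query": query}
    for step in (plan, retrieve, score, synthesize):
        state = step(state)
    return state

result = run_pipeline("Who should we contact about politics?")
print(result["answer"])
```

Passing one shared state dictionary from step to step mirrors how a graph-based agent hands accumulated context from node to node, which keeps each stage independently replaceable.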

GCN-Based Influence Ranking

A Graph Convolutional Network (GCN) is used as a learned ranking model to estimate node importance within retrieved subgraphs. Key design details:

Node Feature Inputs: Each node is initialized with a feature vector encoding structural properties (PageRank, in/out degree, k-core value, clustering coefficient), activity signals (30-day posting volume), and query-dependent topic similarity.
Message Passing Mechanism: The GCN propagates features across graph edges, so each node aggregates multi-hop neighborhood context and can capture relational patterns that simple heuristics such as degree or PageRank alone may miss.
Training Objective: The model is trained using pairwise ranking loss to learn relative node ordering, aligning directly with downstream influencer prioritization tasks rather than predicting absolute influence values.
Inference Workflow: At runtime, the GCN is applied exclusively to retrieved subgraphs to produce context-specific influence scores that guide downstream agent reasoning.
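To make the message-passing and training-objective items concrete, the sketch below implements one GCN propagation step in the form given by Kipf and Welling (2017) and a pairwise ranking loss, using only the standard library. Several points are assumptions rather than the lab's confirmed design: the logistic form of the pairwise loss (a margin-based loss would be another option), the symmetrized adjacency matrix, and the toy features, weights, and targets. The lab itself uses PyTorch and PyTorch Geometric for this step.

```python
import math

def gcn_layer(adjacency, features, weights, activate=True):
    """One propagation step: act(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so every node keeps its own features when aggregating.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    in_dim, out_dim = len(weights), len(weights[0])
    output = []
    for i in range(n):
        # Symmetrically normalized sum over the node and its neighbors.
        aggregated = [
            sum(a_hat[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j] * features[j][k]
                for j in range(n))
            for k in range(in_dim)
        ]
        # Linear map (learned in practice) applied to the aggregated features.
        row = [sum(aggregated[k] * weights[k][m] for k in range(in_dim))
               for m in range(out_dim)]
        output.append([max(0.0, v) for v in row] if activate else row)
    return output

def pairwise_logistic_loss(scores, targets):
    """Mean of log(1 + exp(-(s_i - s_j))) over pairs with target_i > target_j."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                losses.append(math.log1p(math.exp(-(scores[i] - scores[j]))))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical 3-node path graph (0 - 1 - 2), symmetrized for normalization.
ADJACENCY = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
FEATURES = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3]]   # two illustrative features
W1 = [[0.5, -0.2], [0.3, 0.8]]                    # fixed here, learned in practice
W2 = [[1.0], [0.5]]
TARGETS = [0.1, 0.9, 0.2]                         # illustrative influence targets

hidden = gcn_layer(ADJACENCY, FEATURES, W1)
scores = [row[0] for row in gcn_layer(ADJACENCY, hidden, W2, activate=False)]
loss = pairwise_logistic_loss(scores, TARGETS)
```

Because the loss depends only on score differences between pairs, it rewards correct ordering rather than correct absolute values, which matches the ranking goal described above.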

Agentic Reasoning Integration

Influence scores from the GCN act as an intermediate, evidence-based signal to ground agent decision-making rather than serving as a standalone output:

Scores prioritize high-impact nodes and filter out noisy or peripheral graph regions, so the LLM reasons over a smaller, more relevant context.
Ranked, feature-enriched node data is provided to the LLM for generative synthesis.
The LLM connects numerical influence scores to observable graph properties (connectivity, activity, topical relevance) to produce interpretable explanations, separating learned influence modeling from generative narrative creation.
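One way the ranked, feature-enriched node data might be handed to the LLM is as a compact JSON block embedded in the prompt. The sketch below is illustrative only: the function name `build_llm_context`, the field names, the score threshold, and the prompt wording are hypothetical and not taken from the lab.

```python
import json

def build_llm_context(nodes, scores, top_k=3, min_score=0.0):
    """Keep the top_k nodes scoring above min_score and serialize them as JSON."""
    ranked = sorted(nodes, key=lambda n: scores[n["id"]], reverse=True)
    kept = [n for n in ranked if scores[n["id"]] > min_score][:top_k]
    payload = [
        {
            "rank": rank,
            "id": n["id"],
            "influence_score": round(scores[n["id"]], 3),
            # Observable properties the LLM can cite when explaining the rank.
            "in_degree": n["in_degree"],
            "posts_30d": n["posts_30d"],
            "topic_similarity": n["topic_similarity"],
        }
        for rank, n in enumerate(kept, start=1)
    ]
    return json.dumps(payload, indent=2)

# Hypothetical scored nodes from a retrieved subgraph.
nodes = [
    {"id": "page_1", "in_degree": 120, "posts_30d": 14, "topic_similarity": 0.82},
    {"id": "page_2", "in_degree": 15, "posts_30d": 2, "topic_similarity": 0.40},
    {"id": "page_3", "in_degree": 300, "posts_30d": 30, "topic_similarity": 0.91},
    {"id": "page_4", "in_degree": 3, "posts_30d": 0, "topic_similarity": 0.10},
]
scores = {"page_1": 0.71, "page_2": 0.22, "page_3": 0.93, "page_4": -0.05}

context = build_llm_context(nodes, scores, top_k=3)
prompt = (
    "Explain the ranking below using only the listed evidence, "
    "then suggest outreach priorities.\n" + context
)
```

Keeping the numeric score next to the features that plausibly drive it gives the LLM something concrete to cite, while the ranking itself stays the responsibility of the GCN.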

Implementation & Experimental Setup

Dataset

The lab uses a curated subset of the public MUSAE Facebook Page–Page network, containing ~22,000 nodes (representing Facebook pages) and ~300,000 directed FOLLOW edges. Node attributes include topic labels, follower/following counts, and recent posting activity metrics.
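As a minimal sketch of how such an edge list might be loaded into a directed graph, the snippet below parses CSV text with the standard library and builds successor and predecessor maps. The column names `source` and `target` and the sample rows are assumptions for illustration, not the lab's actual schema; the lab itself performs this step with pandas and NetworkX.

```python
import csv
import io

# Hypothetical sample in the assumed schema: each row means "source follows target".
SAMPLE_EDGES_CSV = """source,target
a,b
a,c
b,c
d,c
c,a
"""

def load_directed_graph(csv_text):
    """Return (successors, predecessors) adjacency maps from edge-list CSV text."""
    successors, predecessors = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dst = row["source"], row["target"]
        # Register both endpoints so every node appears in both maps.
        for node in (src, dst):
            successors.setdefault(node, set())
            predecessors.setdefault(node, set())
        if src != dst:  # ignore self-loops
            successors[src].add(dst)
            predecessors[dst].add(src)
    return successors, predecessors

successors, predecessors = load_directed_graph(SAMPLE_EDGES_CSV)
in_degree = {node: len(preds) for node, preds in predecessors.items()}
out_degree = {node: len(succs) for node, succs in successors.items()}
```

Storing both directions up front makes in-degree and out-degree lookups cheap, which matters once the graph reaches hundreds of thousands of edges.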

Core Dependencies

Graph processing: NetworkX
Neural modeling: PyTorch, PyTorch Geometric
Agent orchestration: LangGraph
LLM integration: OpenAI GPT-4o-mini (default model)

Key Implementation Workflow

Graph construction from preprocessed CSV files with node attribute enrichment.
Structural feature engineering and normalization for GCN input.
GCN training with pairwise ranking loss using a heuristic influence target.
Subgraph retrieval utility that expands from topic-matched seed nodes to k-hop neighbors.
LangGraph agent implementation with four core execution nodes: plan, retrieve, score, synthesize.
Visualization of top influencer 1-hop neighborhoods to validate structural reasoning.
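The feature engineering and normalization step in this workflow can be illustrated with a small, dependency-free sketch. It covers only in-degree, out-degree, and PageRank (computed by power iteration) followed by min-max scaling; k-core, clustering coefficient, and topic similarity are omitted. The toy graph, the damping factor of 0.85, the iteration count, and the choice of min-max scaling are assumptions, and the lab itself relies on NetworkX for these features.

```python
def pagerank(successors, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> set of followed nodes."""
    nodes = list(successors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not successors[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            for target in successors[node]:
                new_rank[target] += damping * rank[node] / len(successors[node])
        rank = new_rank
    return rank

def min_max(values):
    """Scale a dict of numbers to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    return {key: (value - lo) / span for key, value in values.items()}

# Hypothetical toy graph; every node must appear as a key, even with no out-edges.
successors = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}

out_degree = {node: len(targets) for node, targets in successors.items()}
in_degree = {node: 0 for node in successors}
for targets in successors.values():
    for target in targets:
        in_degree[target] += 1

pr = pagerank(successors)
pr_n, in_n, out_n = min_max(pr), min_max(in_degree), min_max(out_degree)

# One normalized feature row per node: [pagerank, in_degree, out_degree].
features = {node: [pr_n[node], in_n[node], out_n[node]] for node in successors}
```

Scaling each feature to a common range keeps a high-magnitude feature such as raw degree from dominating the GCN's input.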

Conclusion

This lab illustrates how graph-aware retrieval and learned influence modeling can support deeper, more interpretable social network analysis than text-only RAG approaches. The modular integration of LangGraph for planning, a GCN for structural ranking, and an LLM for generative synthesis provides a flexible foundation that can be extended with temporal analysis, more advanced GNN architectures, or more complex agentic behaviors.

References

Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations (ICLR).
MUSAE Facebook Page–Page Network: https://snap.stanford.edu/data/facebook-large-page-page-network.html
