Learn Graph DB - Neo4j

Recently read somethings about Graph Rag, started to interested into Graph DB, learn and see how to use it in my work.

5th Apr 2025

Introduction

I recently saw they are some discussions about Graph Rag, which seems a powerful way to build knowledge systems. In the meanwhile, during KubeConf 2025, I met the folks at the Neo4j's booth. It's interesting to see how to use Graph DB to solve my problems.

A Brief Overview of Database Types

Before diving deeper into graph databases like Neo4j, it's helpful to understand the broader landscape of database technologies. Different types of databases are optimized for different kinds of data and access patterns. Here's a quick summary:

1. Relational Databases (SQL)

  • Concept: Stores data in predefined tables with rows and columns. Relationships between tables are explicitly defined using foreign keys. Data integrity is often enforced through schemas and constraints.
  • Use Cases: Structured data, transactions requiring ACID (Atomicity, Consistency, Isolation, Durability) compliance, complex queries involving multiple tables (joins).
  • Query Language: SQL (Structured Query Language).
  • Representatives: PostgreSQL, MySQL, Oracle Database, SQL Server, SQLite.

2. NoSQL Databases

This is a broad category encompassing several models that differ from the traditional relational approach. They often prioritize scalability, performance, and flexibility over the strict consistency of SQL databases.

a) Key-Value Stores

  • Concept: Simplest NoSQL type. Stores data as a collection of key-value pairs. Think of it like a giant dictionary or hash map.
  • Use Cases: Caching, session management, user profiles, real-time data lookups where access is primarily by key.
  • Representatives: Redis, Memcached, Amazon DynamoDB (also has document features).

b) Document Databases

  • Concept: Stores data in documents, typically in formats like JSON or BSON. Documents can be nested and don't require a strict, predefined schema for all documents in a collection.
  • Use Cases: Content management, catalogs, user profiles, applications where data structure evolves frequently.
  • Representatives: MongoDB, CosmosDB, Couchbase, Firestore.

c) Column-Family Stores

  • Concept: Stores data in tables with rows and dynamic columns. Data for a row is stored together by column families, which allows for efficient reads/writes of specific columns across many rows.
  • Use Cases: Big data applications, logging, data warehousing, systems requiring high write throughput.
  • Representatives: Apache Cassandra, HBase.

3. Graph Databases

  • Concept: Stores data as nodes (entities) and edges (relationships) connecting those nodes. Both nodes and edges can have properties. Optimized for traversing and querying complex relationships.
  • Use Cases: Social networks, recommendation engines, fraud detection, knowledge graphs, network and IT operations.
  • Query Languages: Often specific graph query languages like Cypher (Neo4j) or Gremlin (Apache TinkerPop).
  • Representatives: Neo4j, ArangoDB (multi-model), Amazon Neptune, JanusGraph.

Understanding these different types helps you choose the right tool for the job. Now, let's focus on the unique strengths of Graph Databases like Neo4j...

Neo4j Cypher queries examples:

MATCH (n:PLAYER) RETURN n
MATCH (n:PLAYER) WHERE n.name = "James Harden" RETURN n
MATCH (n:PLAYER {name: "James Harden"}) RETURN n
MATCH (n:PLAYER) WHERE n.name <> "James Harden" RETURN n

# Aggregation
MATCH (player:PLAYER) - [gamePlayed:PLAYED_AGAINST] -> (team:TEAM) RETURN player.name, COUNT(gamePlayed)

MATCH (x:PLAYER {name: 'xx'}) DETACH DELETE ja

MATCH (n) DETACH DELETE n

CREATE (newPlayer:PLAYER:COACH:GENERAL_MANAGER { name: "Aaron Guo", height: 1.70 }) RETURN newPlayer

# Create relation
MATCH (player:PLAYER {name: "Aaron Guo"}), (lakers:TEAM {name: "LA Lakers"} )
CREATE (player) - [:PLAY_FOR {salary: 2000000}] -> (lakers)

Here are some common and powerful use cases for Neo4j:

Recommendation Engines:

How it works: Neo4j can easily model relationships like "User A BOUGHT Product X", "User A is FRIENDS_WITH User B", "User B LIKED Product Y".

Use Case: By traversing these relationships, you can generate recommendations like "Users who bought X also bought Y" or "Your friends liked Z". It's much faster and more intuitive to query these connections in a graph than joining many tables in SQL.   Fraud Detection:

Fraud Detection

How it works: Fraud often involves complex rings of seemingly unrelated individuals, accounts, devices, or transactions. Neo4j can map these connections (e.g., "Account 1 USED_DEVICE A", "Account 2 USED_DEVICE A", "Account 1 SENT_MONEY_TO Account 3").  

Use Case: Identifying shared identifiers (like devices, IP addresses, physical addresses) or unusual transaction patterns that link potentially fraudulent accounts. Graph queries can quickly uncover these hidden rings that might be hard to spot in tabular data.   Knowledge Graphs:

Knowledge Graphs

How it works: Representing complex domains by connecting entities (people, places, organizations, concepts, products) with various relationships.

Use Case: Building internal enterprise knowledge bases, enhancing search engines (like Google's Knowledge Graph), managing metadata, or creating semantic applications where understanding the context and links between information is key.   Network and IT Operations:

Network and IT Operations

How it works: Modeling dependencies between servers, applications, services, routers, data centers, and virtual machines.  

Use Case: Visualizing network topology, performing impact analysis ("If this server goes down, what services are affected?"), identifying single points of failure, and speeding up root cause analysis during outages.   Identity and Access Management (IAM):

Identity and Access Management (IAM)

How it works: Modeling users, groups, roles, permissions, and resources, along with the relationships between them (e.g., "User X BELONGS_TO Group Y", "Group Y HAS_PERMISSION Z", "Permission Z APPLIES_TO Resource W").

Use Case: Quickly querying complex access rights ("Who has access to this sensitive resource?", "What access does this user have?", "Why does this user have access?"), managing hierarchies, and enforcing security policies.   Supply Chain Management & Logistics:

Supply Chain Management & Logistics

How it works: Mapping suppliers, manufacturers, distributors, warehouses, products, and shipments as nodes, and their interactions as relationships.  

Use Case: Tracking goods through the supply chain, identifying bottlenecks, optimizing routes, understanding dependencies, and assessing risks (e.g., "If Supplier A has an issue, which products are affected?").   Social Networking:

Social Networking

How it works: The classic example – modeling users and their connections (friends, follows, blocks), group memberships, posts, likes, shares, etc.  

Use Case: Finding friends-of-friends, suggesting connections, analyzing influence within the network, and powering social features efficiently.