Graph–Text Fusion for Ad Recommendations: Cold-Start, Reranking, and Real-World Ops

2025-09-01

TL;DR — We built a multilingual recommendation system that fuses graph signals (Node2Vec on user/search/item interactions) with text semantics (bi-encoder). It serves via Qdrant/HNSW, uses dynamic fusion to survive cold-start, and shows fusion > unimodal in offline metrics. Cross-encoder reranking helps only on weak text-only setups.


Why this system?

Given a job/advert item i, we want a ranked list of similar items that a user would plausibly click or apply to next. Two complementary notions of “closeness” matter:

  • Behavioral proximity (graph): co-clicks, purchases, co-views within the same searches/sessions
  • Semantic proximity (text): similar roles/skills/seniority inferred from titles + descriptions

Key constraints: data sparsity, multilingual/noisy text, cold-start items, and strict latency SLOs. We therefore:

  • keep text/graph encoders separate (late fusion),
  • rely on implicit feedback only,
  • serve via ANN (Qdrant/HNSW) with a small reranking budget.

Data schema & cleaning

Events

(event_type, client_id, item_id, ds_search_id, timestamp), with event_type ∈ {click, purchase}

Item text

(item_id, pozisyon_adi, item_id_aciklama), where pozisyon_adi is the position title and item_id_aciklama the item description (Turkish column names)

Text normalization checklist

  • Strip HTML/scripts/entities → collapse whitespace → lowercase
  • Concatenate title + description; up-weight the title (e.g., repeat its tokens for ~2× weight; see the sketch after this list)
  • Optional: de-dup, light stopwording, preserve domain terms
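
A minimal sketch of this checklist (regex-based; TITLE_WEIGHT and the exact patterns are illustrative assumptions, not the production pipeline):

import html
import re

TITLE_WEIGHT = 2  # assumed: repeat the title so its tokens carry ~2x weight

def clean(text):
    """Strip HTML/scripts/entities, collapse whitespace, lowercase."""
    text = html.unescape(text or "")
    text = re.sub(r"<(script|style).*?</\1>", " ", text, flags=re.S | re.I)  # drop scripts/styles
    text = re.sub(r"<[^>]+>", " ", text)      # drop remaining tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.lower()

def build_item_text(title, desc):
    """Concatenate title + description, up-weighting the title by repetition."""
    return " ".join([clean(title)] * TITLE_WEIGHT + [clean(desc)])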

ID hygiene

  • Cast IDs to strings, trim, prefix types (u:, i:, s:)
  • Parse timestamps permissively (e.g., errors="coerce"); a pandas sketch follows
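
In pandas terms (column names follow the events schema above; a sketch, not the exact job):

import pandas as pd

events["client_id"] = "u:" + events["client_id"].astype(str).str.strip()
events["item_id"] = "i:" + events["item_id"].astype(str).str.strip()
events["ds_search_id"] = "s:" + events["ds_search_id"].astype(str).str.strip()
# Permissive parsing: malformed timestamps become NaT instead of raising
events["timestamp"] = pd.to_datetime(events["timestamp"], errors="coerce")
events = events.dropna(subset=["timestamp"])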

Building the interaction graph

Nodes: users (u:), items (i:), searches (s:)
Edges:

  • u→i interaction: weights click=1, purchase=5
  • u→s (optional): user performed search
  • s→i: item displayed in a search
  • i↔i co-view: items co-appearing under the same ds_search_id (undirected)

Co-view weighting

c_{ij} = Σ_s 1{i∈s}·1{j∈s}, then w_{ij} = log(1 + c_{ij}) (or a linear alternative w_{ij} = γ·c_{ij} for a scaling constant γ), keeping only pairs with c_{ij} ≥ c_min.

  • Use c_min ≥ 2 to avoid pair explosion
  • Log-scale to reduce popularity bias
  • Co-views act as weak supervision that densifies neighborhoods early

Operational knobs

  • Cap per search (pair only the top-m results; see the sketch after this list)
  • Keep i↔i undirected unless you really need directionality
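
A sketch of the co-view edge construction under these knobs (search_to_items maps each ds_search_id to its displayed items in display order; top_m is an assumption):

import math
from collections import Counter
from itertools import combinations

def coview_edges(search_to_items, c_min=2, top_m=20):
    """w_ij = log(1 + c_ij) for undirected pairs with c_ij >= c_min."""
    counts = Counter()
    for items in search_to_items.values():
        uniq = list(dict.fromkeys(items))[:top_m]  # cap per search: pair only top-m results
        for i, j in combinations(uniq, 2):
            counts[tuple(sorted((i, j)))] += 1     # canonical order: undirected edge
    return {pair: math.log1p(c) for pair, c in counts.items() if c >= c_min}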

Embeddings (graph + text)

Graph side — Node2Vec

  • Biased random walks over a weighted heterogeneous graph
  • Tune (p, q) for a mild BFS bias (stays local → good for similar-item retrieval)
  • Up-weight reliable ties (purchases > clicks; frequent co-views > rare)

Items with no edges have no walk context, so their graph vectors come out missing or near-zero; we handle that via dynamic fusion.

Text side — Multilingual bi-encoder

  • Encode the cleaned (title + description) → t_i (encoding sketch after this list)
  • L2-normalize: stability + cosine reduces to dot product
  • Pros: robust cross-lingual semantics, strong zero-shot, excellent for cold-start
  • Cons: may over-rank semantically similar but commercially weak items if used alone
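
A sketch with sentence-transformers (the model name is an assumption; any multilingual bi-encoder fits; build_item_text is the cleaning helper above):

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model choice
texts = [build_item_text(title, desc) for title, desc in zip(titles, descs)]
# normalize_embeddings=True: L2-normalized output, so cosine reduces to dot product
T = encoder.encode(texts, normalize_embeddings=True, batch_size=256)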

Fusion that survives cold-start

Concat fusion preserves modality axes:

z_i = [α·t̂_i ; β·ĝ_i] (with both halves L2-normalized)

  • Search with cosine over z_i
  • Choose (α, β) to reflect business priors (more graph when available)

Dynamic fusion rule

if ‖g_i‖ < τ → (α, β) = (1, 0) (text-only fallback)
else → (α, β) = tuned weights
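
A sketch of the fusion vector with the fallback baked in (the default alpha/beta here are placeholders to be tuned):

import numpy as np

def l2(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuse(t, g, alpha=0.6, beta=0.4, tau=1e-6):
    """z = [alpha * t_hat ; beta * g_hat]; text-only when the graph vector is degenerate."""
    if np.linalg.norm(g) < tau:  # cold-start item: no usable graph signal
        alpha, beta = 1.0, 0.0
    return np.concatenate([alpha * l2(t), beta * l2(g)])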

Alternatives

  • Weighted sum (requires text and graph embeddings of identical dimensionality)
  • Learned fusion: small MLP over [t; g] trained on CTR/CVR
  • GNNs (GraphSAGE/PinSAGE): richer context, higher serving cost

Serving: Qdrant + HNSW

  • Index z_i with HNSW (cosine); payload keeps item_id + metadata
  • Millisecond top-K at high recall with well-tuned HNSW parameters (e.g., M and ef)

Ops tips

  • Batch ingestion (500–2000 points per upsert) to avoid timeouts; see the sketch after this list
  • Monitor ANN recall/latency; reindex if params change
  • Keep ID mapping consistent from embedding to index
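
A minimal ingest/search sketch with qdrant-client (collection name, IDs, and batch size are assumptions):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="items",
    vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),  # HNSW index, cosine
)
for start in range(0, len(vectors), 1000):  # batch ingestion to avoid timeouts
    points = [
        PointStruct(id=start + k, vector=v.tolist(), payload={"item_id": ids[start + k]})
        for k, v in enumerate(vectors[start:start + 1000])
    ]
    client.upsert(collection_name="items", points=points)

hits = client.search(collection_name="items", query_vector=query_vec.tolist(), limit=50)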

Should you add a reranker?

A cross-encoder reads each (query, candidate) pair jointly and rescores it, often improving NDCG at the top of the list, but it adds latency.

  • Budget K ∈ [20, 100]
  • Cache candidate texts and consider mixed precision; a rerank sketch follows
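
A rerank sketch with sentence-transformers' CrossEncoder (the model name is an assumption; keep the candidate budget small):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")  # assumed multilingual reranker
pairs = [(query_text, text) for text in candidate_texts[:50]]  # K = 50 budget
scores = reranker.predict(pairs, batch_size=32)                # joint (query, candidate) scoring
order = scores.argsort()[::-1]
reranked = [candidates[i] for i in order]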

When it helps / hurts

  • On text-only retrieval → small gains (sharper top of list)
  • On strong fusion retrieval → can hurt (misaligned with behavior-driven utility, thin descriptions, or “don’t fix what isn’t broken”)

Offline results (highlights)

Retrieval (Recall@10 / NDCG@10 / MAP@10 / Precision@10)

  • Text-only: 0.1712 / 0.1963 / 0.0966 / 0.0409
  • Graph-only: 0.6877 / 0.6027 / 0.4521 / 0.1967
  • Fusion: 0.7120 / 0.6214 / 0.4636 / 0.2010
  • Fusion (+clean): 0.7119 / 0.6257 / 0.4639 / 0.1987

Rerank on fusion (top-50)

  • Baseline Fusion: Recall@10 0.7312, NDCG@10 0.4469
  • Fusion + Rerank: Recall@10 0.3874, NDCG@10 0.2068 → degrades

Rerank on text-only

  • Baseline: NDCG@10 0.0909
  • +Rerank: 0.0988 (small lift; still far below graph/fusion)
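
For reference, minimal implementations of the @K metrics above over binary relevance (ranked is a retrieved item list, relevant a set of held-out positives):

import math

def recall_at_k(ranked, relevant, k=10):
    return len(set(ranked[:k]) & relevant) / max(len(relevant), 1)

def ndcg_at_k(ranked, relevant, k=10):
    dcg = sum(1.0 / math.log2(r + 2) for r, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0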

Engineering playbook

Graph construction

# pseudocode: helper names are illustrative
# u→i edges with implicit-feedback weights
edges_ui = aggregate_user_item(events, weights={"click": 1, "purchase": 5})
# s→i edges: item displayed under a search, weighted by display count
edges_si = from_search_results(events, weight="count")
# i↔i co-view edges: thresholded and log-scaled (see coview_edges above)
edges_ii = coview_pairs(events, c_min=2, weight="log1p")
G = build_graph(nodes=[users, items, searches], edges=[edges_ui, edges_si, edges_ii])

Node2Vec

# node2vec package API (assumed); reads edge weights from the 'weight' attribute
n2v = Node2Vec(G, dimensions=dg, walk_length=40, num_walks=10, p=1.0, q=0.75, weight_key="weight")
model = n2v.fit(window=10, min_count=1)  # trains gensim Word2Vec over the walks
g = {node: model.wv[node] for node in G.nodes()}  # per-node graph vectors

Text encoder

# sentence-transformers style: normalized output so cosine reduces to dot product
t = bi_encoder.encode(build_item_text(title, desc), normalize_embeddings=True)

Fusion & index

z = np.concatenate([alpha * t_hat, beta * g_hat])  # both halves L2-normalized beforehand
qdrant.upsert(collection_name="items", points=[PointStruct(id=item_id, vector=z.tolist(), payload=meta)])

Dynamic fusion query

if np.linalg.norm(g_hat) < tau:  # cold start: zero out the graph half (dims must match)
    vec = np.concatenate([t_hat, np.zeros_like(g_hat)])
else:
    vec = np.concatenate([alpha * t_hat, beta * g_hat])
results = qdrant.search(collection_name="items", query_vector=vec.tolist(), limit=50)

Optional rerank

if pipeline == "text_only":  # rerank only where it helps (see offline results)
    pairs = [(query_text, text) for text in candidate_texts]
    scores = cross_encoder.predict(pairs)  # joint (query, candidate) scoring
    results = [cand for _, cand in sorted(zip(scores, candidates), key=lambda s: -s[0])]


Diagnostics

  • Ego-nets: visualize subgraphs; confirm weights (purchase > click) and coherent clusters
  • UMAP on neighbors: compact clusters ≈ better perceived relevance
  • Segmented metrics: track Recall@K by item age, category, locale (groupby sketch below)
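
Once per-item metrics exist, segmentation is a one-line groupby (per_item and its columns are assumptions):

# per_item: one row per evaluated item with recall_at_10, item_age_bucket, category, locale
segmented = per_item.groupby(["item_age_bucket", "category", "locale"])["recall_at_10"].mean()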

Production notes

  • Monitor latency (P50, P95) of ANN
  • Track zero-graph rate; enrich co-views if high
  • Apply log-scaling on edges to mitigate popularity bias
  • Track multilingual fairness; watch embedding drift

Limitations & roadmap

Current gaps

  • No user cold-start (anonymous/new users)
  • No recency weighting (all interactions equal)
  • Fusion is heuristic, not learned end-to-end

Next

  • Temporal dynamics (recency, sessions)
  • Learned fusion (MLP over [t; g])
  • Graph neural recommenders (GraphSAGE, PinSAGE)
  • Online A/B for CTR/CVR validation

Key takeaways

  1. Graph rules, text rescues — fusion wins across segments
  2. Dynamic fusion handles cold-start smoothly
  3. Search-derived co-views are cheap, effective enrichment
  4. Rerankers help text-only, can hurt strong fusion
  5. Modular stack = offline embeddings + ANN + optional rerank

Further Reading

Full technical report:
Advertisement Suggestion System — Methods-First Guide to Graph–Text Fusion with Cold-Start Awareness and Re-Ranking
Report PDF