Graph–Text Fusion for Ad Recommendations: Cold-Start, Reranking, and Real-World Ops
TL;DR — We built a multilingual recommendation system that fuses graph signals (Node2Vec on user/search/item interactions) with text semantics (bi-encoder). It serves via Qdrant/HNSW, uses dynamic fusion to survive cold-start, and shows fusion > unimodal in offline metrics. Cross-encoder reranking helps only on weak text-only setups.
Why this system?
Given a job/advert item i, we want a ranked list of similar items users would plausibly click/apply next. Two complementary notions of “closeness” matter:
- Behavioral proximity (graph): co-clicks, purchases, co-views within the same searches/sessions
- Semantic proximity (text): similar roles/skills/seniority inferred from titles + descriptions
Key constraints: data sparsity, multilingual/noisy text, cold-start items, and strict latency SLOs. We therefore:
- keep text/graph encoders separate (late fusion),
- rely on implicit feedback only,
- serve via ANN (Qdrant/HNSW) with a small reranking budget.
Data schema & cleaning
Events
(event_type, client_id, item_id, ds_search_id, timestamp) — event_type ∈ {click, purchase}
Item text
(item_id, pozisyon_adi, item_id_aciklama) — Turkish column names: pozisyon_adi = position title, item_id_aciklama = item description
Text normalization checklist
- Strip HTML/scripts/entities → collapse whitespace → lowercase
- Concatenate title + description; up-weight title (e.g., duplicate tokens with weight=2)
- Optional: de-dup, light stopwording, preserve domain terms
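The checklist above can be sketched as a small normalizer — a minimal illustration, assuming a token-duplication trick for title up-weighting (function names are ours, not from the original pipeline):

```python
import html
import re

def clean(text: str, weight: int = 1) -> str:
    """Strip HTML/scripts, unescape entities, collapse whitespace,
    lowercase. Repeating the string `weight` times crudely up-weights
    it for bag-of-tokens-style encoders."""
    text = re.sub(r"<script.*?</script>", " ", text, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)            # remaining HTML tags
    text = html.unescape(text)                      # &amp; -> &, etc.
    text = re.sub(r"\s+", " ", text).strip().lower()
    return " ".join([text] * weight)

def build_item_text(title: str, desc: str) -> str:
    # Title duplicated (weight=2) ahead of the description.
    return f"{clean(title, weight=2)} {clean(desc)}"
```

De-duplication and stopwording would slot in after the lowercasing step if needed.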
ID hygiene
- Cast IDs to strings, trim, prefix types (`u:`, `i:`, `s:`)
- Parse timestamps permissively (e.g., `errors="coerce"`)
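Both hygiene steps fit in a few lines of plain Python — a sketch with illustrative names, where `parse_ts` mimics the behavior of pandas' `errors="coerce"` (return a null instead of raising):

```python
from datetime import datetime

def norm_id(raw, kind: str) -> str:
    """Cast to string, trim, and prefix with the node type
    ('u', 'i', or 's') so IDs never collide across entity types."""
    return f"{kind}:{str(raw).strip()}"

def parse_ts(raw, fmt: str = "%Y-%m-%d %H:%M:%S"):
    """Permissive parse: return None on junk input instead of raising."""
    try:
        return datetime.strptime(str(raw).strip(), fmt)
    except (ValueError, TypeError):
        return None
```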
Building the interaction graph
Nodes: users (u:), items (i:), searches (s:)
Edges:
- u→i interaction: weights click=1, purchase=5
- u→s (optional): user performed search
- s→i: item displayed in a search
- i↔i co-view: items co-appearing under the same `ds_search_id` (undirected)
Co-view weighting
c_{ij} = Σ_s 1{i∈s}·1{j∈s}, w_{ij} = log(1 + c_{ij}) or w_{ij} = c_{ij}·γ, with c_{ij} ≥ c_min.
- Use `c_min ≥ 2` to avoid pair explosion
- Log-scale to reduce popularity bias
- Co-views act as weak supervision that densifies neighborhoods early
Operational knobs
- Cap per search (pair only top-m results)
- Keep `i↔i` undirected unless you really need directionality
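The co-view construction above — pair counting, the `c_min` floor, log-scaling, and the per-search cap — can be sketched as follows (a minimal illustration; the helper name and signature are ours):

```python
import math
from collections import Counter
from itertools import combinations

def coview_edges(search_to_items, c_min=2, top_m=None):
    """Count item pairs co-appearing under the same search id,
    drop rare pairs (c < c_min), and log-scale the weights.
    `top_m` caps each search to its first m results to avoid a
    quadratic pair explosion on very long result lists."""
    counts = Counter()
    for items in search_to_items.values():
        items = items[:top_m] if top_m else items
        for i, j in combinations(sorted(set(items)), 2):
            counts[(i, j)] += 1     # undirected: sorted pair as key
    return {pair: math.log1p(c) for pair, c in counts.items() if c >= c_min}
```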
Embeddings (graph + text)
Graph side — Node2Vec
- Biased random walks over a weighted heterogeneous graph
- Tune `(p, q)` for a mild BFS bias (stays local → good for similar-item retrieval)
- Up-weight reliable ties (purchases > clicks; frequent co-views > rare)
Items with no edges yield near-zero vectors — we’ll handle that via dynamic fusion.
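A quick diagnostic for that failure mode is the zero-graph rate — the fraction of items whose graph-embedding norm falls below the fallback threshold τ (this is the same quantity tracked in the production notes below). A minimal NumPy sketch:

```python
import numpy as np

def zero_graph_rate(G: np.ndarray, tau: float = 1e-3) -> float:
    """Fraction of rows (items) whose graph embedding norm is below tau.
    A high rate means many items will hit the text-only fallback,
    and co-view enrichment is worth revisiting."""
    norms = np.linalg.norm(G, axis=1)
    return float((norms < tau).mean())
```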
Text side — Multilingual bi-encoder
- Encode cleaned (title + description) → `t_i`
- L2-normalize: stability, and cosine reduces to a dot product
- Pros: robust cross-lingual semantics, strong zero-shot, excellent for cold-start
- Cons: may over-rank semantically similar but commercially weak items if used alone
Fusion that survives cold-start
Concat fusion preserves modality axes:
z_i = [α·t̂_i ; β·ĝ_i] (with both halves L2-normalized)
- Search with cosine over `z_i`
- Choose `(α, β)` to reflect business priors (more graph when available)
Dynamic fusion rule
if ‖g_i‖ < τ → (α, β) = (1, 0) (text-only fallback)
else → (α, β) = tuned weights
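Putting the concat fusion and the fallback rule together — a NumPy sketch under the stated conventions (both halves L2-normalized; the function name is ours):

```python
import numpy as np

def fuse(t, g, alpha=0.5, beta=0.5, tau=1e-3):
    """Build the concatenated vector z = [alpha*t_hat ; beta*g_hat].
    Cold-start items (norm(g) < tau) fall back to text-only: the
    graph half is zeroed so cosine search degrades gracefully."""
    t_hat = t / np.linalg.norm(t)
    g_norm = np.linalg.norm(g)
    if g_norm < tau:                       # dynamic fusion: (alpha, beta) = (1, 0)
        return np.concatenate([t_hat, np.zeros_like(g)])
    return np.concatenate([alpha * t_hat, beta * (g / g_norm)])
```

Note that zeroing the graph half (rather than dropping it) keeps all vectors the same dimensionality, so one ANN index serves both warm and cold items.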
Alternatives
- Weighted sum (needs matching dimensions for the two halves)
- Learned fusion: small MLP over `[t; g]` trained on CTR/CVR
- GNNs (GraphSAGE/PinSAGE): richer context, higher serving cost
Serving: Qdrant + HNSW
- Index `z_i` with HNSW (cosine); payload keeps `item_id` + metadata
- Millisecond top-K at high recall with well-tuned parameters
Ops tips
- Batch ingestion (500–2000) to avoid timeouts
- Monitor ANN recall/latency; reindex if params change
- Keep ID mapping consistent from embedding to index
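The batching tip reduces to a simple chunking helper; the commented usage against a Qdrant-style client is illustrative, not the exact production call:

```python
def batched(points, batch_size=1000):
    """Yield fixed-size chunks for ANN ingestion. Pushing 500-2000
    points per upsert call keeps request payloads small enough to
    avoid client/server timeouts on large catalogs."""
    for start in range(0, len(points), batch_size):
        yield points[start:start + batch_size]

# Hypothetical usage with a qdrant-like client:
# for chunk in batched(all_points, batch_size=1000):
#     client.upsert(collection_name="items", points=chunk)
```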
Should you add a reranker?
Cross-encoder rereads (query, candidate) jointly, often improving NDCG@top-K — but adds latency.
- Budget `K ∈ [20, 100]`
- Cache texts and consider mixed precision
When it helps / hurts
- On text-only retrieval → small gains (sharper top of list)
- On strong fusion retrieval → can hurt (misaligned with behavior-driven utility, thin descriptions, or “don’t fix what isn’t broken”)
Offline results (highlights)
Retrieval (Recall@10 / NDCG@10 / MAP@10 / Precision@10)
- Text-only: 0.1712 / 0.1963 / 0.0966 / 0.0409
- Graph-only: 0.6877 / 0.6027 / 0.4521 / 0.1967
- Fusion: 0.7120 / 0.6214 / 0.4636 / 0.2010
- Fusion (+clean): 0.7119 / 0.6257 / 0.4639 / 0.1987
Rerank on fusion (top-50)
- Baseline Fusion: Recall@10 0.7312, NDCG@10 0.4469
- Fusion + Rerank: Recall@10 0.3874, NDCG@10 0.2068 → degrades
Rerank on text-only
- Baseline: NDCG@10 0.0909
- +Rerank: NDCG@10 0.0988 (small lift; still far below graph/fusion)
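For reproducing numbers like these, the two headline metrics are short functions — a binary-relevance sketch (the exact evaluation protocol behind the reported figures is not specified here):

```python
import math

def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant set retrieved in the top-k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking over the ideal DCG."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(r + 2)
              for r, item in enumerate(ranked[:k]) if item in rel)
    ideal = sum(1.0 / math.log2(r + 2) for r in range(min(len(rel), k)))
    return dcg / ideal if ideal else 0.0
```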
Engineering playbook
Graph construction
edges_ui = aggregate_user_item(events, weights={"click":1,"purchase":5})
edges_si = from_search_results(search_id, item_id, weight="count")
edges_ii = coview_pairs(search_to_items, c_min=2, weight="log1p")
G = build_graph(nodes=[users,items,searches], edges=[edges_ui,edges_si,edges_ii])
Node2Vec
n2v = Node2Vec(G, p=1.0, q=0.75, walk_length=40, num_walks=10, weighted=True)
g = n2v.fit(dim=dg)
Text encoder
t = bi_encoder.encode(clean(title) + " " + clean(desc), normalize=True)
Fusion & index
z = concat(alpha * t_hat, beta * g_hat)
qdrant.upsert(id=item_id, vector=z, payload=meta)
Dynamic fusion query
if norm(g_hat) < tau: vec = concat(t_hat, zeros_like(g_hat))
else: vec = concat(alpha * t_hat, beta * g_hat)
results = qdrant.search(vec, topK=50)
Optional rerank
if pipeline == "text_only":
scores = cross_encoder.score(query_text, candidate_texts)
results = sort_by(scores)
Diagnostics
- Ego-nets: visualize subgraphs; confirm weights (purchase > click) and coherent clusters
- UMAP on neighbors: compact clusters ≈ better perceived relevance
- Segmented metrics: track Recall@K by item age, category, locale
Production notes
- Monitor latency (P50, P95) of ANN
- Track zero-graph rate; enrich co-views if high
- Apply log-scaling on edges to mitigate popularity bias
- Track multilingual fairness; watch embedding drift
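One concrete drift signal is the mean cosine distance between matched rows of consecutive embedding snapshots — a minimal sketch (the metric choice and threshold are ours to illustrate the monitoring idea):

```python
import numpy as np

def drift(E_old: np.ndarray, E_new: np.ndarray) -> float:
    """Mean cosine distance between matched rows of two embedding
    snapshots; a sudden jump signals drift worth investigating
    before the new index goes live."""
    old = E_old / np.linalg.norm(E_old, axis=1, keepdims=True)
    new = E_new / np.linalg.norm(E_new, axis=1, keepdims=True)
    return float(1.0 - (old * new).sum(axis=1).mean())
```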
Limitations & roadmap
Current gaps
- No user cold-start (anonymous/new users)
- No recency weighting (all interactions equal)
- Fusion is heuristic, not learned end-to-end
Next
- Temporal dynamics (recency, sessions)
- Learned fusion (MLP over `[t; g]`)
- Graph neural recommenders (GraphSAGE, PinSAGE)
- Online A/B for CTR/CVR validation
Key takeaways
- Graph rules, text rescues — fusion wins across segments
- Dynamic fusion handles cold-start smoothly
- Search-derived co-views are cheap, effective enrichment
- Rerankers help text-only, can hurt strong fusion
- Modular stack = offline embeddings + ANN + optional rerank
Further Reading
Full technical report:
Advertisement Suggestion System — Methods-First Guide to Graph–Text Fusion with Cold-Start Awareness and Re-Ranking
Report PDF