Claude:
Great idea! You can definitely build a personalized topic recommendation system on top of Discourse’s existing semantic search infrastructure. Here are several approaches and Ruby libraries to achieve this:
Approach 1: User Interest Profile via Topic Embeddings
Concept: Create a user interest vector by aggregating embeddings of topics they’ve viewed, then find similar topics.
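# Pseudo-code approach: topic_views, viewed_at, and Topic#embedding are
# assumed names; map them to wherever your views and embeddings are stored.
class UserRecommendationService
  def self.recommend_topics(user, limit = 10)
    # Get the user's viewed topics from the last 30 days, most recent first
    viewed_topics = user.topic_views
                        .includes(:topic)
                        .where('viewed_at > ?', 30.days.ago)
                        .order('viewed_at DESC')

    # Calculate user interest vector (weighted average of topic embeddings)
    user_vector = calculate_user_interest_vector(viewed_topics)
    return Topic.none if user_vector.nil?

    # Find similar topics using pgvector. <-> is L2 distance; use <=> for
    # cosine distance. Quote the vector literal rather than interpolating it raw.
    vector_literal = Topic.connection.quote("[#{user_vector.join(',')}]")
    Topic.select("topics.*, embedding <-> #{vector_literal} AS distance")
         .where.not(id: viewed_topics.pluck(:topic_id))
         .order('distance ASC')
         .limit(limit)
  end

  def self.calculate_user_interest_vector(viewed_topics)
    total_weight = 0.0
    weighted_embeddings = viewed_topics.filter_map do |view|
      embedding = view.topic&.embedding
      next if embedding.nil?

      # Weight recent and more engaged views more heavily
      recency_weight = calculate_recency_weight(view.viewed_at)
      view_duration_weight = calculate_engagement_weight(view)
      weight = recency_weight * view_duration_weight
      total_weight += weight
      embedding.map { |val| val * weight }
    end
    return nil if weighted_embeddings.empty? || total_weight.zero?

    # Sum the weighted embeddings and divide by the total weight
    weighted_embeddings.transpose.map { |column| column.sum / total_weight }
  end
  # `private` has no effect on `def self.` methods, so mark it explicitly
  private_class_method :calculate_user_interest_vector
end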
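The recency weighting and averaging used in Approach 1 can be sketched in plain Ruby. This is a minimal illustration, not Discourse code: the module name, the exponential decay, and the 7-day half-life are all assumptions chosen for the example.

```ruby
# Hypothetical helpers for the weighted-average step; the names and the
# 7-day half-life are assumptions, not Discourse APIs.
module InterestVector
  SECONDS_PER_DAY = 86_400.0

  # Exponential decay: a view loses half its weight every `half_life_days`.
  def self.recency_weight(viewed_at, now: Time.now, half_life_days: 7.0)
    age_days = [(now - viewed_at) / SECONDS_PER_DAY, 0.0].max
    0.5**(age_days / half_life_days)
  end

  # Weighted average of equal-length vectors.
  # `pairs` is an array of [embedding, weight]; returns nil when there is
  # nothing to average.
  def self.weighted_average(pairs)
    total_weight = pairs.sum { |_, weight| weight }
    return nil if pairs.empty? || total_weight.zero?

    sums = Array.new(pairs.first.first.length, 0.0)
    pairs.each do |embedding, weight|
      embedding.each_with_index { |value, i| sums[i] += value * weight }
    end
    sums.map { |sum| sum / total_weight }
  end
end

now = Time.now
pairs = [
  [[1.0, 0.0], InterestVector.recency_weight(now, now: now)],             # viewed just now
  [[0.0, 1.0], InterestVector.recency_weight(now - 7 * 86_400, now: now)] # viewed a week ago
]
user_vector = InterestVector.weighted_average(pairs)
puts user_vector.inspect
```

Dividing by the total weight keeps the result on the same scale as the topic embeddings, which matters when the query uses L2 distance.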
Approach 2: Collaborative Filtering + Content-Based
Ruby Libraries:
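- ruby-recommender - Simple collaborative filtering (confirm the gem is available and maintained before depending on it)
- matrix - For similarity calculations (a bundled gem since Ruby 3.1, so it must be listed in the Gemfile)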
- numo-narray - Efficient numerical arrays
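# Using ruby-recommender gem (Gemfile entry)
gem 'ruby-recommender'

class HybridRecommendationService
  # Pseudo-code: content_based_recommendations, hybrid_score, and
  # build_interaction_matrix are placeholders you still need to implement.
  def self.recommend(user_id, limit = 10)
    # Collaborative filtering component
    cf_recommendations = collaborative_filtering_recommendations(user_id)
    # Content-based component (using existing embeddings)
    content_recommendations = content_based_recommendations(user_id)
    # Combine both approaches
    hybrid_score(cf_recommendations, content_recommendations, limit)
  end

  # Recommender::CosineSimilarity and recommendations_for are assumed names;
  # check the gem's documentation for its actual API.
  def self.collaborative_filtering_recommendations(user_id)
    # Build user-topic interaction matrix
    interactions = build_interaction_matrix
    recommender = Recommender::CosineSimilarity.new(interactions)
    recommender.recommendations_for(user_id)
  end
  # `private` has no effect on `def self.` methods, so mark it explicitly
  private_class_method :collaborative_filtering_recommendations
end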
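If you would rather not depend on a gem for the collaborative filtering part, the idea can be sketched in plain Ruby. Everything here is hypothetical: the SimpleCF module, the shape of the interactions hash, and the cf_weight blending knob are assumptions for illustration.

```ruby
# Hypothetical sketch: user-based collaborative filtering in plain Ruby.
# `interactions` maps user_id => { topic_id => interaction strength }.
module SimpleCF
  # Cosine similarity between two sparse { topic_id => value } hashes.
  def self.cosine(a, b)
    dot = a.sum { |topic_id, value| value * b.fetch(topic_id, 0.0) }
    norm_a = Math.sqrt(a.values.sum { |v| v * v })
    norm_b = Math.sqrt(b.values.sum { |v| v * v })
    return 0.0 if norm_a.zero? || norm_b.zero?

    dot / (norm_a * norm_b)
  end

  # Scores topics the user has not interacted with by summing the
  # similarity-weighted interactions of every other user.
  def self.recommendations_for(user_id, interactions, limit: 10)
    own = interactions.fetch(user_id, {})
    scores = Hash.new(0.0)
    interactions.each do |other_id, topics|
      next if other_id == user_id

      similarity = cosine(own, topics)
      next unless similarity.positive?

      topics.each do |topic_id, value|
        scores[topic_id] += similarity * value unless own.key?(topic_id)
      end
    end
    scores.sort_by { |_, score| -score }.first(limit)
  end

  # Blends two { topic_id => score } hashes; `cf_weight` is an assumed tuning knob.
  def self.hybrid_score(cf_scores, content_scores, limit: 10, cf_weight: 0.5)
    topic_ids = cf_scores.keys | content_scores.keys
    blended = topic_ids.map do |id|
      [id, cf_weight * cf_scores.fetch(id, 0.0) + (1 - cf_weight) * content_scores.fetch(id, 0.0)]
    end
    blended.sort_by { |_, score| -score }.first(limit)
  end
end

interactions = {
  1 => { 101 => 1.0, 102 => 1.0 },
  2 => { 101 => 1.0, 102 => 1.0, 103 => 1.0 },
  3 => { 104 => 1.0 }
}
cf_scores = SimpleCF.recommendations_for(1, interactions).to_h
puts cf_scores.inspect
```

The blend assumes both score sets are on comparable scales, so normalize them (for example, to the 0..1 range) before combining.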
Approach 3: Enhanced with Machine Learning
Ruby ML Libraries:
- rumale - Comprehensive ML library
- ruby-opencv - If you want to process images in topics
- torch-rb - PyTorch bindings for Ruby
# Using Rumale for clustering and recommendations (Gemfile entry)
gem 'rumale'

class MLRecommendationService
  # Lets you write Clustering::KMeans instead of Rumale::Clustering::KMeans
  include Rumale

  # Pseudo-code: the four helper methods called below are placeholders
  # you still need to implement.
  def self.recommend_topics(user, limit = 10)
    # Get user behavior features
    user_features = extract_user_features(user)
    # Use clustering to find similar users
    similar_users = find_similar_users(user_features)
    # Get topics liked by similar users
    recommended_topics = get_topics_from_similar_users(similar_users)
    # Re-rank using semantic similarity
    rerank_with_embeddings(recommended_topics, user, limit)
  end
end
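The final re-ranking step does not need an ML library. Here is a minimal sketch, assuming candidates arrive as [topic_id, embedding] pairs and the user already has an interest vector; the EmbeddingReranker module is a hypothetical name.

```ruby
# Hypothetical sketch of the re-ranking step: candidates are
# [topic_id, embedding] pairs and `user_vector` is the user's interest vector.
module EmbeddingReranker
  def self.cosine_similarity(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
    norm.zero? ? 0.0 : dot / norm
  end

  # Returns the ids of the `limit` candidates closest to the user vector,
  # most similar first.
  def self.rerank(candidates, user_vector, limit: 10)
    candidates
      .max_by(limit) { |_, embedding| cosine_similarity(embedding, user_vector) }
      .map(&:first)
  end
end

candidates = [[1, [0.0, 1.0]], [2, [1.0, 0.0]], [3, [0.7, 0.7]]]
puts EmbeddingReranker.rerank(candidates, [1.0, 0.1], limit: 2).inspect
```

For large candidate sets it is usually better to push this ranking into pgvector instead of computing it in Ruby.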
Approach 4: Simple Weighted Approach (Quick Start)
For a quick implementation, you can start with this simpler approach:
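class SimpleRecommendationService
  # Pseudo-code: scopes such as Topic.latest, Topic.published, and
  # topic_views.recent are assumed; substitute the ones your install provides.
  def self.recommend_topics(user, limit = 10)
    # Fall back to the latest topics when there is no saved user to personalize for
    return Topic.latest.limit(limit) unless user&.persisted?

    # Get user's topic categories and tags with weights
    user_interests = analyze_user_interests(user)

    # Score topics based on user interests. This loads every unseen topic into
    # Ruby, so narrow the candidate set (for example, by date) on large forums.
    Topic.published
         .where.not(id: user.topic_views.pluck(:topic_id))
         .max_by(limit) { |topic| calculate_interest_score(topic, user_interests) }
  end

  def self.analyze_user_interests(user)
    # Preload topics and their tags to avoid N+1 queries in the loop below
    viewed_topics = user.topic_views.includes(topic: :tags).recent
    # Weight by categories, tags, and recency
    interests = {
      categories: Hash.new(0),
      tags: Hash.new(0),
      users: Hash.new(0)
    }
    viewed_topics.each do |view|
      topic = view.topic
      next if topic.nil? # the topic may have been deleted since the view

      recency_multiplier = recency_weight(view.viewed_at)
      interests[:categories][topic.category_id] += recency_multiplier
      topic.tags.each { |tag| interests[:tags][tag.name] += recency_multiplier }
      # Topic authors count half as much as categories and tags
      interests[:users][topic.user_id] += recency_multiplier * 0.5
    end
    interests
  end
  # `private` has no effect on `def self.` methods, so mark it explicitly
  private_class_method :analyze_user_interests
end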
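The calculate_interest_score helper is left undefined above. One possible shape, as a sketch only: TopicStub stands in for a Discourse topic, and the 1.0 / 0.5 / 0.25 weights are assumptions you would tune against real data.

```ruby
# Hypothetical scoring helper; TopicStub stands in for a Discourse Topic,
# and the default weights are assumptions to tune.
TopicStub = Struct.new(:id, :category_id, :tag_names, :user_id, keyword_init: true)

module InterestScoring
  # Combines category, tag, and author affinity into a single score.
  def self.score(topic, interests, category_weight: 1.0, tag_weight: 0.5, author_weight: 0.25)
    category_score = interests[:categories].fetch(topic.category_id, 0)
    tag_score = topic.tag_names.sum { |name| interests[:tags].fetch(name, 0) }
    author_score = interests[:users].fetch(topic.user_id, 0)
    category_weight * category_score + tag_weight * tag_score + author_weight * author_score
  end
end

interests = { categories: { 5 => 2.0 }, tags: { 'ruby' => 1.5 }, users: { 42 => 0.5 } }
topics = [
  TopicStub.new(id: 1, category_id: 9, tag_names: ['rails'], user_id: 7),
  TopicStub.new(id: 2, category_id: 5, tag_names: ['ruby'], user_id: 42)
]
ranked = topics.max_by(2) { |topic| InterestScoring.score(topic, interests) }
puts ranked.map(&:id).inspect
```

Keeping the weights as keyword arguments makes it easy to try different balances without touching the scoring logic.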
Implementation Steps:
- Start Simple: Begin with the weighted approach using existing Discourse data (categories, tags, user interactions)
- Add Vector Similarity: Leverage your existing pgvector setup to find semantically similar topics
- Track More Signals: Add tracking events for:
  - Time spent reading
  - Scroll depth
  - Likes/reactions
  - Replies/engagement
- A/B Testing: Compare recommendation algorithms with A/B tests. Check whether your Discourse setup already provides a way to split users into test groups; otherwise, assign groups yourself.
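For the A/B testing step, one simple option is to assign each user to a variant deterministically by hashing their id. This is a sketch under assumptions: the RecommendationExperiment module, the variant names, and the experiment key are all hypothetical.

```ruby
require 'zlib'

# Hypothetical helper: deterministically assigns a user to an experiment
# variant, so the same user always sees the same algorithm.
module RecommendationExperiment
  VARIANTS = %w[weighted embeddings].freeze

  def self.variant_for(user_id, experiment: 'topic_recs_v1', variants: VARIANTS)
    # Including the experiment name reshuffles users between experiments
    bucket = Zlib.crc32("#{experiment}:#{user_id}") % variants.length
    variants[bucket]
  end
end

puts RecommendationExperiment.variant_for(42)
```

Because the assignment depends only on the experiment name and user id, no extra table is needed to remember which group a user is in.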
Ruby Gems to Consider:
# Essential gems
gem 'matrix'           # Vector operations (a bundled gem since Ruby 3.1, so list it explicitly)
gem 'ruby-recommender' # Simple collaborative filtering
gem 'rumale'           # Machine learning toolkit
gem 'numo-narray'      # Efficient numerical computing
gem 'redis'            # For caching recommendations
# Optional advanced gems
gem 'torch-rb'         # Deep learning
gem 'lightgbm'         # Gradient boosting
Database Considerations:
Since you already have pgvector, you can extend your schema:
-- Add user interest vectors
ALTER TABLE users ADD COLUMN interest_vector vector(384); -- Match your embedding dimensions
-- Add interaction tracking
CREATE TABLE user_topic_interactions (
    user_id INTEGER,
    topic_id INTEGER,
    interaction_type VARCHAR, -- 'view', 'like', 'reply', etc.
    duration INTEGER, -- time spent (pick one unit, e.g. seconds)
    created_at TIMESTAMP
);
-- Create indexes for recommendations
CREATE INDEX ON user_topic_interactions (user_id, created_at DESC);
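To write a Ruby array into the interest_vector column, it has to be formatted as a pgvector text literal such as [0.1,0.2,0.3]. A minimal sketch, assuming you handle the formatting yourself; PgVectorLiteral is a hypothetical helper name.

```ruby
# Hypothetical helper: formats a Ruby array as a pgvector text literal and
# checks it against the column's declared dimension.
module PgVectorLiteral
  def self.encode(vector, dimensions:)
    unless vector.length == dimensions
      raise ArgumentError, "expected #{dimensions} dimensions, got #{vector.length}"
    end

    # Float() rejects anything that is not numeric, which keeps the literal safe
    "[#{vector.map { |value| Float(value) }.join(',')}]"
  end
end

puts PgVectorLiteral.encode([0.1, 0.2, 0.3], dimensions: 3)
```

Pass the resulting string as a bound parameter rather than interpolating it into SQL, and use your real embedding size (384 in the schema above) for dimensions.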
Start with the simple weighted approach, then gradually incorporate the semantic similarity using your existing embeddings infrastructure. This will give you quick wins while building toward a more sophisticated system.