Content Recommendation Experiment

The site is testing a For You filter on the home page that shows topics recommended from your reading history.


2025-09-03T10:31:11Z
Current approach:

  • Generate the cache for all users manually with rake 'recom:users'
  • Show this 'For You'-style filter at the /hot filter, shadowing the existing /hot
  • The recommendation method is very simple: sort viewed topics by total_msecs_viewed,
    get their related topics using embeddings, and aggregate them together
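
To make the three steps concrete, here is a minimal in-memory sketch. It is not the code behind the rake task: the `View` struct, the `related` lookup table, and `recommend_from_history` are hypothetical names, and "aggregate" is assumed to mean counting how often each related topic turns up.

```ruby
# Hypothetical in-memory stand-ins for the real data; none of these names
# come from Discourse itself.
View = Struct.new(:topic_id, :total_msecs_viewed)

# related: Hash of topic_id => array of related topic ids, in the order an
# embedding search would return them.
def recommend_from_history(views, related, top_n: 20, limit: 10)
  seen = views.map(&:topic_id)

  # 1. Sort viewed topics by total reading time, longest first.
  top_views = views.sort_by { |v| -v.total_msecs_viewed }.first(top_n)

  # 2. Collect their related topics and 3. aggregate by counting occurrences
  #    (an assumed aggregation rule), skipping topics already viewed.
  counts = Hash.new(0)
  top_views.each do |view|
    related.fetch(view.topic_id, []).each do |topic_id|
      counts[topic_id] += 1 unless seen.include?(topic_id)
    end
  end

  # Most frequently related topics first; ties broken by topic id for stability.
  counts.sort_by { |topic_id, count| [-count, topic_id] }.first(limit).map(&:first)
end

views = [View.new(1, 90_000), View.new(2, 15_000), View.new(3, 60_000)]
related = { 1 => [10, 11, 2], 2 => [12], 3 => [11, 13] }
p recommend_from_history(views, related, top_n: 2) # prints [11, 10, 13]
```

Topic 11 ranks first because it is related to both of the two most-read topics. Reading time only decides which viewed topics are considered; it does not weight the result, which is one reason the method is weak.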

The current method is obviously crude; someone needs to propose a better one.

Related Discourse documentation:

Collaborative Filtering:

5 Likes

交大門 search, ads, and recommendation (搜广推) project @tension

2 Likes

Claude:

Great idea! You can definitely build a personalized topic recommendation system on top of Discourse’s existing semantic search infrastructure. Here are several approaches and Ruby libraries to achieve this:

Approach 1: User Interest Profile via Topic Embeddings

Concept: Create a user interest vector by aggregating embeddings of topics they’ve viewed, then find similar topics.

# Pseudo-code approach
class UserRecommendationService
  def self.recommend_topics(user, limit = 10)
    # Get user's viewed topics with recent bias
    viewed_topics = user.topic_views
                       .includes(:topic)
                       .where('viewed_at > ?', 30.days.ago)
                       .order('viewed_at DESC')
    
    # Calculate user interest vector (weighted average of topic embeddings)
    user_vector = calculate_user_interest_vector(viewed_topics)
    
    # Find similar topics using pgvector
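    # Bind the vector as a query parameter instead of interpolating it into the SQL
    distance_sql = Topic.sanitize_sql_array(["topics.*, embedding <-> ? AS distance", user_vector.to_s])
    similar_topics = Topic.select(distance_sql)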
                          .where.not(id: viewed_topics.pluck(:topic_id))
                          .order('distance ASC')
                          .limit(limit)
    
    similar_topics
  end
  
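  # A bare `private` has no effect on `def self.` methods, so mark the helper explicitly
  
  private_class_method def self.calculate_user_interest_vector(viewed_topics)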
    # Weight recent views more heavily
    weighted_embeddings = viewed_topics.map do |view|
      recency_weight = calculate_recency_weight(view.viewed_at)
      view_duration_weight = calculate_engagement_weight(view)
      
      embedding = view.topic.embedding
      weight = recency_weight * view_duration_weight
      
      embedding.map { |val| val * weight }
    end
    
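    # Average the weighted embeddings (divide by the sum of the weights so the
    # result stays on the same scale as a single topic embedding)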
    average_vector(weighted_embeddings)
  end
end
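
The pseudo-code above leaves its weighting helpers undefined. One possible shape for two of them, in plain Ruby with no Rails dependency, is sketched below. The names `recency_weight` and `weighted_average` and the 7-day half-life are assumptions for illustration, not part of the reply or of Discourse.

```ruby
# Exponential decay: a view loses half its weight every `half_life_days`.
# The 7-day default is an arbitrary assumption, not a tuned value.
def recency_weight(viewed_at, now: Time.now, half_life_days: 7.0)
  age_days = (now - viewed_at) / 86_400.0
  0.5**(age_days / half_life_days)
end

# Weighted mean of equal-length vectors: sum(w_i * v_i) / sum(w_i).
# Returns nil when there is nothing to average.
def weighted_average(vectors, weights)
  total = weights.sum
  return nil if vectors.empty? || total.zero?

  sums = Array.new(vectors.first.length, 0.0)
  vectors.zip(weights) do |vector, weight|
    vector.each_with_index { |value, i| sums[i] += value * weight }
  end
  sums.map { |s| s / total }
end

now = Time.utc(2025, 9, 3)
weights = [recency_weight(now, now: now), recency_weight(now - 7 * 86_400, now: now)]
p weights # prints [1.0, 0.5]
p weighted_average([[1.0, 0.0], [0.0, 1.0]], weights)
```

Dividing by the total weight matters here because the query orders by `<->`, which is Euclidean distance in pgvector and therefore sensitive to the length of the user vector.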

Approach 2: Collaborative Filtering + Content-Based

Ruby Libraries:

  • ruby-recommender - Simple collaborative filtering
  • matrix - For similarity calculations (standard library; a bundled gem since Ruby 3.1, so list it in the Gemfile)
  • numo-narray - Efficient numerical arrays

# Using ruby-recommender gem
gem 'ruby-recommender'

class HybridRecommendationService
  def self.recommend(user_id, limit = 10)
    # Collaborative filtering component
    cf_recommendations = collaborative_filtering_recommendations(user_id)
    
    # Content-based component (using existing embeddings)
    content_recommendations = content_based_recommendations(user_id)
    
    # Combine both approaches
    hybrid_score(cf_recommendations, content_recommendations, limit)
  end
  
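  # A bare `private` has no effect on `def self.` methods, so mark the helper explicitly
  
  private_class_method def self.collaborative_filtering_recommendations(user_id)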
    # Build user-topic interaction matrix
    interactions = build_interaction_matrix
    
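    # Class and method names are as given in this reply; check them against
    # the gem's documentation before relying on them
    recommender = Recommender::CosineSimilarity.new(interactions)
    recommender.recommendations_for(user_id)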
  end
end
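
If pulling in a gem is not desirable, the collaborative filtering step can also be written directly. The following is a minimal sketch of user-based filtering with cosine similarity over an in-memory interaction hash. The data layout and the names `cosine_similarity` and `cf_recommendations` are assumptions for illustration, not any library's API.

```ruby
# interactions: { user_id => { topic_id => interaction strength } }
def cosine_similarity(a, b)
  dot = a.sum { |topic_id, value| value * b.fetch(topic_id, 0.0) }
  norm_a = Math.sqrt(a.values.sum { |v| v * v })
  norm_b = Math.sqrt(b.values.sum { |v| v * v })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

def cf_recommendations(interactions, user_id, limit: 10)
  own = interactions.fetch(user_id, {})
  scores = Hash.new(0.0)

  interactions.each do |other_id, other|
    next if other_id == user_id

    similarity = cosine_similarity(own, other)
    next unless similarity.positive?

    # Each unseen topic is credited with similarity * interaction strength.
    other.each do |topic_id, value|
      scores[topic_id] += similarity * value unless own.key?(topic_id)
    end
  end

  scores.sort_by { |topic_id, score| [-score, topic_id] }.first(limit).map(&:first)
end

interactions = {
  1 => { 100 => 1.0, 101 => 1.0 },
  2 => { 100 => 1.0, 101 => 1.0, 102 => 1.0 },
  3 => { 103 => 1.0 }
}
p cf_recommendations(interactions, 1) # prints [102]
```

User 3 shares no topics with user 1, so topic 103 is never suggested. This loop compares against every user and is only suitable for small forums or offline batch jobs.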

Approach 3: Enhanced with Machine Learning

Ruby ML Libraries:

  • rumale - Comprehensive ML library
  • ruby-opencv - If you want to process images in topics
  • torch-rb - Deep learning for Ruby, built on LibTorch

# Using Rumale for clustering and recommendations
gem 'rumale'

class MLRecommendationService
  include Rumale
  
  def self.recommend_topics(user, limit = 10)
    # Get user behavior features
    user_features = extract_user_features(user)
    
    # Use clustering to find similar users
    similar_users = find_similar_users(user_features)
    
    # Get topics liked by similar users
    recommended_topics = get_topics_from_similar_users(similar_users)
    
    # Re-rank using semantic similarity
    rerank_with_embeddings(recommended_topics, user, limit)
  end
end

Approach 4: Simple Weighted Approach (Quick Start)

For a quick implementation, you can start with this simpler approach:

class SimpleRecommendationService
  def self.recommend_topics(user, limit = 10)
    return Topic.latest.limit(limit) unless user.persisted?
    
    # Get user's topic categories and tags with weights
    user_interests = analyze_user_interests(user)
    
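    # Score topics based on user interests (this loads every unviewed topic
    # into memory, so narrow the candidate set first on a large forum)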
    scored_topics = Topic.published
                         .where.not(id: user.topic_views.pluck(:topic_id))
                         .map { |topic| [topic, calculate_interest_score(topic, user_interests)] }
                         .sort_by { |_, score| -score }
                         .first(limit)
                         .map(&:first)
    
    scored_topics
  end
  
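  # A bare `private` has no effect on `def self.` methods, so mark the helper explicitly
  
  private_class_method def self.analyze_user_interests(user)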
    viewed_topics = user.topic_views.includes(:topic).recent
    
    # Weight by categories, tags, and recency
    interests = {
      categories: Hash.new(0),
      tags: Hash.new(0),
      users: Hash.new(0)
    }
    
    viewed_topics.each do |view|
      topic = view.topic
      recency_multiplier = recency_weight(view.viewed_at)
      
      interests[:categories][topic.category_id] += recency_multiplier
      topic.tags.each { |tag| interests[:tags][tag.name] += recency_multiplier }
      interests[:users][topic.user_id] += recency_multiplier * 0.5
    end
    
    interests
  end
end
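
The service above calls `calculate_interest_score` without defining it. A minimal sketch of what such a scorer could look like follows, using plain structs instead of ActiveRecord models. `TopicInfo`, `interest_score`, and the per-signal multipliers are hypothetical; only the shape of the `interests` hash is taken from `analyze_user_interests`.

```ruby
# Hypothetical plain-data stand-in for a topic record.
TopicInfo = Struct.new(:id, :category_id, :tag_names, :user_id)

# interests has the same shape as the hash built by analyze_user_interests.
# The per-signal multipliers are illustrative assumptions.
def interest_score(topic, interests, category_weight: 1.0, tag_weight: 1.0, author_weight: 1.0)
  category_score = interests[:categories].fetch(topic.category_id, 0)
  tag_score = topic.tag_names.sum { |name| interests[:tags].fetch(name, 0) }
  author_score = interests[:users].fetch(topic.user_id, 0)

  category_weight * category_score + tag_weight * tag_score + author_weight * author_score
end

interests = {
  categories: { 5 => 2.0 },
  tags: { "ruby" => 1.5, "ml" => 0.5 },
  users: { 42 => 0.25 }
}
topics = [
  TopicInfo.new(1, 5, ["ruby"], 42), # matches category, one tag, and author
  TopicInfo.new(2, 9, ["ml"], 7),    # matches one tag only
  TopicInfo.new(3, 9, [], 7)         # matches nothing
]
ranked = topics.sort_by { |t| -interest_score(t, interests) }
p ranked.map(&:id) # prints [1, 2, 3]
```

A plain sum lets a heavily read category outweigh everything else, so in practice the three multipliers would need tuning against real reading data.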

Implementation Steps:

  1. Start Simple: Begin with the weighted approach using existing Discourse data (categories, tags, user interactions)

  2. Add Vector Similarity: Leverage your existing pgvector setup to find semantically similar topics

  3. Track More Signals:

    # Add these tracking events
    - Time spent reading
    - Scroll depth
    - Likes/reactions
    - Replies/engagement
    
  4. A/B Testing: Compare recommendation algorithms with A/B tests, using whatever experiment tooling your Discourse installation has available
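
As a rough illustration of step 3, the extra signals could be folded into a single per-view weight as sketched below. The `Signals` struct, its field names, and every coefficient are assumptions made up for this example; they would have to be replaced with whatever the site actually records.

```ruby
# Hypothetical per-view signals; the field names are assumptions.
Signals = Struct.new(:msecs_read, :scroll_depth, :liked, :replied, keyword_init: true)

# Combines the signals into one weight. Reading time is log-scaled so that a
# single very long visit does not dominate; the coefficients are placeholders.
def engagement_weight(signals)
  reading = Math.log(1 + signals.msecs_read / 1000.0) # log of seconds read
  depth = signals.scroll_depth.clamp(0.0, 1.0)        # fraction of the topic scrolled
  bonus = (signals.liked ? 1.0 : 0.0) + (signals.replied ? 2.0 : 0.0)
  reading * (0.5 + 0.5 * depth) + bonus
end

skim = Signals.new(msecs_read: 3_000, scroll_depth: 0.1, liked: false, replied: false)
deep = Signals.new(msecs_read: 120_000, scroll_depth: 1.0, liked: true, replied: false)
p engagement_weight(skim) < engagement_weight(deep) # prints true
```

A weight like this could take the place of the raw total_msecs_viewed ordering or feed the view_duration_weight in Approach 1.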

Ruby Gems to Consider:

# Essential gems
gem 'matrix'                    # Vector operations (a bundled gem since Ruby 3.1)
gem 'ruby-recommender'         # Simple collaborative filtering
gem 'rumale'                   # Machine learning toolkit
gem 'numo-narray'             # Efficient numerical computing
gem 'redis'                    # For caching recommendations

# Optional advanced gems
gem 'torch-rb'                # Deep learning
gem 'lightgbm'               # Gradient boosting

Database Considerations:

Since you already have pgvector, you can extend your schema:

-- Add user interest vectors
ALTER TABLE users ADD COLUMN interest_vector vector(384); -- Match your embedding dimensions

-- Add interaction tracking
CREATE TABLE user_topic_interactions (
  user_id INTEGER,
  topic_id INTEGER,
  interaction_type VARCHAR, -- 'view', 'like', 'reply', etc.
  duration INTEGER,         -- time spent
  created_at TIMESTAMP
);

-- Create indexes for recommendations
CREATE INDEX ON user_topic_interactions (user_id, created_at DESC);

Start with the simple weighted approach, then gradually incorporate the semantic similarity using your existing embeddings infrastructure. This will give you quick wins while building toward a more sophisticated system.

4 Likes