Content recommendation experiment

shared_user · September 3, 2025, 10:30am

This site is trialing a For You filter on the homepage that shows topics recommended from your reading history.

2025-09-03T10:31:11Z
Current approach:

Generate the cache for all users manually with rake 'recom:users'
Show this 'For You'-like filter at /hot, shadowing the existing /hot filter
The recommendation method is very simple: sort viewed topics by total_msecs_viewed, get their related topics using embeddings, and aggregate them together
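
A minimal Ruby sketch of the three steps above, for discussion only. It is not the site's actual code: the `viewed` and `related` hashes are hypothetical in-memory stand-ins for the `total_msecs_viewed` data and the embedding lookup, and both the rank-based scoring and the skipping of already-read topics are assumptions that the post does not state.

```ruby
# topic_id => total_msecs_viewed for one user (hypothetical sample data)
viewed = { 1 => 90_000, 2 => 30_000, 3 => 5_000 }

# topic_id => related topic ids, nearest first, as an embedding search
# might return them (hypothetical sample data)
related = {
  1 => [10, 11, 2],
  2 => [11, 12],
  3 => [13]
}

def recommend(viewed, related, top_viewed: 2, limit: 5)
  # Step 1: keep the topics the user spent the most time on.
  seeds = viewed.sort_by { |_id, msecs| -msecs }.first(top_viewed).map(&:first)

  # Steps 2 and 3: collect related topics and add up a rank-based score.
  scores = Hash.new(0.0)
  seeds.each do |seed|
    related.fetch(seed, []).each_with_index do |topic_id, rank|
      next if viewed.key?(topic_id) # assumption: skip topics already read
      scores[topic_id] += 1.0 / (rank + 1)
    end
  end

  # Highest score first; ties broken by topic id for a stable order.
  scores.sort_by { |id, score| [-score, id] }.first(limit).map(&:first)
end

p recommend(viewed, related)
```

Topic 11 ranks first here because it is related to both seed topics, which is the "aggregate them together" part.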

The current method is obviously pretty dumb; someone needs to propose a better one.

Related Discourse documentation:

Collaborative Filtering:

Discussion on offering this site's public data as a SQLite download

Site

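plugin_store_rows: some plugins store user data in this table; it depends on the plugin.
post_search_data, search_log: record the keywords users search for.
push_subscriptions: records users' push-notification details (may include authentication information).
topic_users: contains users' bookmarks, topic reading time, and reading progress.
post_timings: the heaviest table; records how long each user spent reading each post.
topic_link_clicks: records the links users clicked.
posts, topics, users: filtering the columns of these tables with a whitelist should be mostly fine. Deleted content might also get exported; if that matters, filter on deleted_at.
Is likes filtered out of post_actions? Shuiyuan (水源) does not have that table.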

一只玉米人 · September 3, 2025, 10:43am

交大門 search, ads, and recommendation (搜广推) project @tension

shared_user · September 4, 2025, 12:23am

Claude:

Great idea! You can definitely build a personalized topic recommendation system on top of Discourse’s existing semantic search infrastructure. Here are several approaches and Ruby libraries to achieve this:

Approach 1: User Interest Profile via Topic Embeddings

Concept: Create a user interest vector by aggregating embeddings of topics they’ve viewed, then find similar topics.

# Pseudo-code approach
class UserRecommendationService
  def self.recommend_topics(user, limit = 10)
    # Get user's viewed topics with recent bias
    viewed_topics = user.topic_views
                       .includes(:topic)
                       .where('viewed_at > ?', 30.days.ago)
                       .order('viewed_at DESC')
    
    # Calculate user interest vector (weighted average of topic embeddings)
    user_vector = calculate_user_interest_vector(viewed_topics)
    
    # Find similar topics using pgvector
    similar_topics = Topic.select(Topic.sanitize_sql_array(["*, embedding <-> ?::vector AS distance", user_vector.to_s]))
                          .where.not(id: viewed_topics.pluck(:topic_id))
                          .order('distance ASC')
                          .limit(limit)
    
    similar_topics
  end
  
  # A bare `private` does not apply to `def self.` methods; use private_class_method.
  private_class_method def self.calculate_user_interest_vector(viewed_topics)
    # Weight recent views more heavily
    weighted_embeddings = viewed_topics.map do |view|
      recency_weight = calculate_recency_weight(view.viewed_at)
      view_duration_weight = calculate_engagement_weight(view)
      
      embedding = view.topic.embedding
      weight = recency_weight * view_duration_weight
      
      embedding.map { |val| val * weight }
    end
    
    # Average the weighted embeddings
    average_vector(weighted_embeddings)
  end
end
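
The pseudo-code above calls `calculate_recency_weight` and `average_vector` without defining them. Below is one possible way to fill them in, as a plain-Ruby sketch. The exponential decay, the 7-day half-life, and the `weighted_average` name and signature are all assumptions for illustration, not part of the original answer.

```ruby
# Exponential decay: a view loses half its weight every `half_life_days`.
def calculate_recency_weight(viewed_at, now: Time.now, half_life_days: 7.0)
  age_days = (now - viewed_at) / 86_400.0
  0.5**(age_days / half_life_days)
end

# Weighted average of embeddings; `pairs` is [[embedding, weight], ...].
# Dividing by the total weight (not the count) keeps the result on the
# same scale as the input embeddings.
def weighted_average(pairs)
  total = pairs.sum { |_emb, weight| weight }
  return nil if pairs.empty? || total.zero?

  dims = pairs.first.first.length
  sums = Array.new(dims, 0.0)
  pairs.each do |emb, weight|
    emb.each_with_index { |val, i| sums[i] += val * weight }
  end
  sums.map { |s| s / total }
end

now = Time.utc(2025, 9, 4)
pairs = [
  [[1.0, 0.0], calculate_recency_weight(now, now: now)],              # viewed today
  [[0.0, 1.0], calculate_recency_weight(now - 7 * 86_400, now: now)]  # viewed a week ago
]
p weighted_average(pairs)
```

The scale matters because pgvector's `<->` operator is L2 distance, so a user vector that is shrunk by small weights drifts away from every topic. Switching the query to `<=>` (cosine distance) would make the scale irrelevant.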

Approach 2: Collaborative Filtering + Content-Based

Ruby Libraries:

ruby-recommender - Simple collaborative filtering
matrix (standard library; a bundled gem since Ruby 3.1) - For similarity calculations
numo-narray - Efficient numerical arrays

# Using ruby-recommender gem
gem 'ruby-recommender'

class HybridRecommendationService
  def self.recommend(user_id, limit = 10)
    # Collaborative filtering component
    cf_recommendations = collaborative_filtering_recommendations(user_id)
    
    # Content-based component (using existing embeddings)
    content_recommendations = content_based_recommendations(user_id)
    
    # Combine both approaches
    hybrid_score(cf_recommendations, content_recommendations, limit)
  end
  
  # A bare `private` does not apply to `def self.` methods; use private_class_method.
  private_class_method def self.collaborative_filtering_recommendations(user_id)
    # Build user-topic interaction matrix
    interactions = build_interaction_matrix
    
    recommender = Recommender::CosineSimilarity.new(interactions)
    recommender.recommendations_for(user_id)
  end
end
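
For comparison, the collaborative filtering step can also be written without any gem. The following is a minimal user-based sketch in plain Ruby. The `interactions` hash is hypothetical sample data standing in for `build_interaction_matrix`, and the `cosine` and `cf_recommendations` helpers are illustrative names, not an API of any library mentioned here.

```ruby
# user_id => { topic_id => interaction strength } (hypothetical sample data)
interactions = {
  1 => { 101 => 1.0, 102 => 1.0 },
  2 => { 101 => 1.0, 102 => 1.0, 103 => 1.0 },
  3 => { 104 => 1.0 }
}

# Cosine similarity between two sparse vectors stored as hashes.
def cosine(a, b)
  dot = a.sum { |key, val| val * b.fetch(key, 0.0) }
  norm = Math.sqrt(a.values.sum { |v| v * v }) * Math.sqrt(b.values.sum { |v| v * v })
  norm.zero? ? 0.0 : dot / norm
end

# User-based CF: score unseen topics by the similarity of the users who read them.
def cf_recommendations(interactions, user_id, limit: 10)
  mine = interactions.fetch(user_id, {})
  scores = Hash.new(0.0)
  interactions.each do |other_id, theirs|
    next if other_id == user_id

    sim = cosine(mine, theirs)
    next if sim.zero?

    theirs.each do |topic_id, strength|
      scores[topic_id] += sim * strength unless mine.key?(topic_id)
    end
  end
  # Returns [[topic_id, score], ...], best first.
  scores.sort_by { |id, score| [-score, id] }.first(limit)
end

p cf_recommendations(interactions, 1)
```

User 3 shares no topics with user 1, so topic 104 never shows up. That is the usual cold-start weakness of pure collaborative filtering and the reason for blending it with the content-based component.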

Approach 3: Enhanced with Machine Learning

Ruby ML Libraries:

rumale - Comprehensive ML library
ruby-opencv - If you want to process images in topics
torch-rb - PyTorch bindings for Ruby

# Using Rumale for clustering and recommendations
gem 'rumale'

class MLRecommendationService
  include Rumale
  
  def self.recommend_topics(user, limit = 10)
    # Get user behavior features
    user_features = extract_user_features(user)
    
    # Use clustering to find similar users
    similar_users = find_similar_users(user_features)
    
    # Get topics liked by similar users
    recommended_topics = get_topics_from_similar_users(similar_users)
    
    # Re-rank using semantic similarity
    rerank_with_embeddings(recommended_topics, user, limit)
  end
end
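
The re-ranking step at the end of this approach does not need an ML library. Here is a plain-Ruby sketch, assuming the candidate topics and the user's interest vector are already available as arrays of floats. The signature differs from the `rerank_with_embeddings(recommended_topics, user, limit)` call above and is hypothetical.

```ruby
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# candidates: topic_id => embedding; user_vector: the user's interest vector.
def rerank_with_embeddings(candidates, user_vector, limit: 10)
  candidates
    .map { |topic_id, emb| [topic_id, cosine_similarity(emb, user_vector)] }
    .sort_by { |topic_id, sim| [-sim, topic_id] } # most similar first
    .first(limit)
    .map(&:first)
end

candidates = { 7 => [0.0, 1.0], 8 => [1.0, 0.1], 9 => [0.7, 0.7] }
p rerank_with_embeddings(candidates, [1.0, 0.0], limit: 2)
```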

Approach 4: Simple Weighted Approach (Quick Start)

For a quick implementation, you can start with this simpler approach:

class SimpleRecommendationService
  def self.recommend_topics(user, limit = 10)
    # Fall back to the latest topics for anonymous (nil) or unsaved users
    return Topic.latest.limit(limit) unless user&.persisted?
    
    # Get user's topic categories and tags with weights
    user_interests = analyze_user_interests(user)
    
    # Score topics based on user interests
    scored_topics = Topic.published
                         .where.not(id: user.topic_views.pluck(:topic_id))
                         .map { |topic| [topic, calculate_interest_score(topic, user_interests)] }
                         .sort_by { |_, score| -score }
                         .first(limit)
                         .map(&:first)
    
    scored_topics
  end
  
  # A bare `private` does not apply to `def self.` methods; use private_class_method.
  private_class_method def self.analyze_user_interests(user)
    viewed_topics = user.topic_views.includes(:topic).recent
    
    # Weight by categories, tags, and recency
    interests = {
      categories: Hash.new(0),
      tags: Hash.new(0),
      users: Hash.new(0)
    }
    
    viewed_topics.each do |view|
      topic = view.topic
      recency_multiplier = recency_weight(view.viewed_at)
      
      interests[:categories][topic.category_id] += recency_multiplier
      topic.tags.each { |tag| interests[:tags][tag.name] += recency_multiplier }
      interests[:users][topic.user_id] += recency_multiplier * 0.5
    end
    
    interests
  end
end
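
The class above calls `calculate_interest_score` but never defines it. One simple reading is to add up the weights that match the topic's category, tags, and author. The sketch below does that in plain Ruby; the `sketch_topic` struct is a hypothetical stand-in for a Discourse `Topic`, and the additive scoring is an assumption.

```ruby
# Stand-in for a Topic record (hypothetical; only the fields the score needs).
sketch_topic = Struct.new(:id, :category_id, :tag_names, :user_id)

# Sum the interest weights that match the topic's category, tags, and author.
def calculate_interest_score(topic, interests)
  score = interests[:categories].fetch(topic.category_id, 0)
  score += topic.tag_names.sum { |name| interests[:tags].fetch(name, 0) }
  score + interests[:users].fetch(topic.user_id, 0)
end

interests = {
  categories: { 5 => 2.0 },
  tags: { "ruby" => 1.5, "ai" => 1.0 },
  users: { 42 => 0.5 }
}

topics = [
  sketch_topic.new(1, 5, ["ruby"], 42), # category, tag, and author all match
  sketch_topic.new(2, 9, ["ai"], 7),    # tag match only
  sketch_topic.new(3, 9, [], 7)         # no match
]

ranked = topics.sort_by { |t| -calculate_interest_score(t, interests) }
p ranked.map(&:id)
```

Scoring every topic in Ruby like this is fine for a sketch, but on a large forum the filtering should be pushed into SQL before anything is loaded into memory.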

Implementation Steps:

Start Simple: Begin with the weighted approach using existing Discourse data (categories, tags, user interactions)
Add Vector Similarity: Leverage your existing pgvector setup to find semantically similar topics

Track More Signals:

# Add these tracking events
- Time spent reading
- Scroll depth
- Likes/reactions
- Replies/engagement

A/B Testing: Compare recommendation algorithms with an A/B test, using whatever experiment tooling your Discourse instance has available
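
The signals listed under Track More Signals could be folded into the single `calculate_engagement_weight` value that Approach 1 refers to. The sketch below is one way to do it in plain Ruby. The input keys, the log scaling, and the bonus values are all assumptions to be tuned against real data.

```ruby
# signals: { msecs_read:, scroll_depth: (0.0..1.0), liked:, replied: }
# All keys are optional; the bonus values are hypothetical defaults.
def calculate_engagement_weight(signals, like_bonus: 3.0, reply_bonus: 4.0)
  # Log-scale reading time so one very long session does not dominate.
  weight = Math.log(1 + signals.fetch(:msecs_read, 0) / 1000.0)

  # Scale by scroll depth: reading to the end counts fully, a glance counts half.
  weight *= 0.5 + 0.5 * signals.fetch(:scroll_depth, 0.0).clamp(0.0, 1.0)

  # Explicit actions are stronger signals than passive reading.
  weight += like_bonus if signals[:liked]
  weight += reply_bonus if signals[:replied]
  weight
end

skim    = { msecs_read: 2_000, scroll_depth: 0.1 }
engaged = { msecs_read: 60_000, scroll_depth: 1.0, liked: true }
p calculate_engagement_weight(skim)
p calculate_engagement_weight(engaged)
```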

Ruby Gems to Consider:

# Essential gems
gem 'matrix'                    # Bundled gem since Ruby 3.1, for vector operations
gem 'ruby-recommender'         # Simple collaborative filtering
gem 'rumale'                   # Machine learning toolkit
gem 'numo-narray'             # Efficient numerical computing
gem 'redis'                    # For caching recommendations

# Optional advanced gems
gem 'torch-rb'                # Deep learning
gem 'lightgbm'               # Gradient boosting

Database Considerations:

Since you already have pgvector, you can extend your schema:

-- Add user interest vectors
ALTER TABLE users ADD COLUMN interest_vector vector(384); -- Match your embedding dimensions

-- Add interaction tracking
CREATE TABLE user_topic_interactions (
  user_id INTEGER,
  topic_id INTEGER,
  interaction_type VARCHAR, -- 'view', 'like', 'reply', etc.
  duration INTEGER,         -- time spent
  created_at TIMESTAMP
);

-- Create indexes for recommendations
CREATE INDEX ON user_topic_interactions (user_id, created_at DESC);

Start with the simple weighted approach, then gradually incorporate the semantic similarity using your existing embeddings infrastructure. This will give you quick wins while building toward a more sophisticated system.
