v0.5.0 | Powered by the Inverse Constitutional AI (ICAI) pipeline

🤖 Example 1: Compare GPT-4o's personality to other models | 📚 Example 2: Personality traits encouraged by feedback datasets | 📝 Example 3: Preferred personality traits across writing tasks

Configuration

💽 Dataset selection

🏟️ LMArena (Llama4 special)
- Arena results for Llama-4-Maverick-03-26-Experimental, combined with the public-weights version of Llama-4-Maverick. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03-26-Experimental_battles/tree/main/data
🏟️ LMArena (2024)
- 10k subsample of the LMArena text dataset (100k) released alongside the Arena Explorer work. Crowdsourced human annotations in English, collected between June and August 2024, with topic labels generated automatically by the Arena Explorer pipeline. LMArena is also known as Chatbot Arena. (Note: cross-annotated in 3 runs)
- Source: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-100k
🦙 AlpacaEval
- 648 cross-annotated human preference pairs used to validate AlpacaEval annotators. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://huggingface.co/datasets/tatsu-lab/alpaca_eval/
🕊️ Anthropic harmless
- 5k subsample of human preference pairs favouring harmless responses from RLHF dataset by Anthropic. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://github.com/anthropics/hh-rlhf
🚑 Anthropic helpful
- 5k subsample of human preference pairs favouring helpful responses from RLHF dataset by Anthropic. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://github.com/anthropics/hh-rlhf
💎 PRISM
- ~8k human preference pairs from the PRISM dataset, focused on controversial topics with extensive annotator information. Originally four-way annotations, converted to pairwise preferences by sampling one of the three rejected responses. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://huggingface.co/datasets/HannahRoseKirk/prism-alignment
🏋️ OLMo-2 0325 pref-mix
- 10k preference pairs subsampled randomly from the original 378k pairs used for fine-tuning the OLMo 2 model by Ai2. Synthetically generated via multiple different pipelines. (Note: legacy annotations - used older annotation pipeline version)
- Source: https://huggingface.co/datasets/allenai/olmo-2-0325-32b-preference-mix
🔄 MultiPref
- 10k preference pairs, each annotated by 4 human annotators as well as GPT-4-based AI annotators. Whilst each pair is annotated by 4 human annotators, these annotators are not identical across all pairs (i.e. more than four annotators overall worked on the dataset). (Note: legacy annotations - used older annotation pipeline version)
- Source: https://huggingface.co/datasets/allenai/multipref
🎭 Model Personality Comparison
- Model Personality Comparison dataset covering the following models: openrouter/openai/gpt-4o-2024-11-20, openrouter/openai/gpt-4.1-mini, openrouter/x-ai/grok-4, openrouter/google/gemini-2.5-pro, openrouter/moonshotai/kimi-k2, openrouter/meta-llama/llama-4-maverick, openrouter/mistralai/magistral-medium-2506, openrouter/anthropic/claude-sonnet-4, openrouter/openai/gpt-oss-20b, openrouter/openai/gpt-5, openrouter/mistralai/mistral-medium-3.1, openrouter/anthropic/claude-sonnet-4.5, openrouter/z-ai/glm-4.6, openrouter/anthropic/claude-haiku-4.5, openrouter/google/gemini-3-pro-preview, openrouter/openai/gpt-5.1, openrouter/openai/gpt-5.1-chat, openrouter/allenai/olmo-3-32b-think, openrouter/allenai/olmo-3-7b-instruct, openrouter/allenai/olmo-3-7b-think, openrouter/mistralai/mistral-large-2512, openrouter/google/gemini-3.1-pro-preview, openrouter/mistralai/mistral-medium-3-5, openrouter/anthropic/claude-opus-4.7, openrouter/openai/gpt-5.3-chat, openrouter/anthropic/claude-3.7-sonnet, openrouter/openai/gpt-5-chat, openrouter/openai/gpt-3.5-turbo. The reference model is openrouter/openai/gpt-4o-2024-11-20. Created and annotated using Feedback Forensics; see https://huggingface.co/datasets/rdnfn/ff-model-personality for more details.
- Source: Unknown source
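
The PRISM entry above mentions turning four-way annotations into pairwise preferences by picking one of the three rejected responses. A minimal Python sketch of that idea follows. The function name `four_way_to_pairwise`, the `chosen_index` argument, and the use of uniform random sampling are assumptions made for illustration; the app's actual preprocessing code may differ.

```python
import random


def four_way_to_pairwise(responses, chosen_index, rng):
    """Turn one four-way annotation into a single pairwise preference.

    responses: list of exactly four response strings (hypothetical format).
    chosen_index: position of the response the annotator preferred.
    rng: a random.Random instance, passed in so sampling is reproducible.
    """
    if len(responses) != 4:
        raise ValueError("expected exactly four responses")
    # The three responses that were not preferred.
    rejected = [r for i, r in enumerate(responses) if i != chosen_index]
    # Assumption: one rejected response is drawn uniformly at random.
    return {"chosen": responses[chosen_index], "rejected": rng.choice(rejected)}


rng = random.Random(0)
pair = four_way_to_pairwise(["A", "B", "C", "D"], chosen_index=1, rng=rng)
print(pair["chosen"], "preferred over", pair["rejected"])
```

Sampling a single rejected response keeps one pair per original annotation, so each annotator judgement carries equal weight in the pairwise dataset.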

🔎 Analysis mode

👥 Human/AI feedback analysis | 🤖 Model analysis | 🔧 Advanced settings

⚠️ Some configuration options (grouping by column, selecting multiple column annotators) only work correctly when a single dataset is selected. Select a single dataset to use these features.

📌 Select models to compare

Select model(s) to investigate in terms of personality traits.

🧭 Select reference models

Select reference models to compare selected models to. Metrics can be interpreted as how much the selected model(s) exhibit(s) personality traits relative to the reference model(s). If none are selected, all available models will be used as references. Example: With GPT-4o as reference model, personality traits of selected models are computed relative to GPT-4o, i.e. only using datapoints directly comparing selected models with GPT-4o.
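
The reference-model behaviour described above can be sketched in Python as a filter over pairwise datapoints. This is an illustrative sketch only: the `Comparison` record, its field names, and the `filter_to_reference` helper are hypothetical and not taken from the Feedback Forensics codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One pairwise datapoint (hypothetical schema)."""
    model_a: str
    model_b: str
    preferred: str  # "a" or "b"


def filter_to_reference(comparisons, selected, references=None):
    """Keep datapoints that directly compare a selected model with a reference.

    If no references are given, all models present in the data are
    treated as references, mirroring the default described above.
    """
    selected = set(selected)
    if references:
        references = set(references)
    else:
        references = {m for c in comparisons for m in (c.model_a, c.model_b)}
    kept = []
    for c in comparisons:
        if c.model_a == c.model_b:
            continue  # a model is never compared against itself
        a_vs_ref = c.model_a in selected and c.model_b in references
        b_vs_ref = c.model_b in selected and c.model_a in references
        if a_vs_ref or b_vs_ref:
            kept.append(c)
    return kept


data = [
    Comparison("gpt-4o", "claude-sonnet-4", "a"),
    Comparison("grok-4", "claude-sonnet-4", "b"),
    Comparison("claude-sonnet-4", "gpt-4o", "a"),
    Comparison("grok-4", "gemini-2.5-pro", "a"),
]
with_reference = filter_to_reference(data, ["claude-sonnet-4"], ["gpt-4o"])
without_reference = filter_to_reference(data, ["claude-sonnet-4"])
print(len(with_reference), len(without_reference))
```

With GPT-4o as the reference, only the two datapoints pairing the selected model with GPT-4o remain; with no reference selected, every datapoint involving the selected model is kept.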

🗂️ Select feedback annotations to compare (AI or human)

Analyse the personality traits encouraged by different pairwise feedback annotations.

👥→ Annotator columns

Select the annotators to be included as a column in the results table. By default only a single (ground-truth) annotator is included.

👥↓ Annotator rows

Select the annotators to be included as a row in the results table. By default only objective-following AI annotators are included (named as "AI: <OBJECTIVE>").

Results

🎛️ View

📊 Numerical overview | 🔎 Datapoint viewer

Numerical overview

Overall statistics

See guide here for metric details

Annotation metrics

👉 Click on values to view example datapoints | See guide here to learn how each metric is computed and can be interpreted

Metric

Sort by

Sort order

Datapoint viewer

Controls

👥 Annotator 1

👥 Annotator 2

🔍 Filter subset

📋 Example index

Datapoint

Feedback Forensics app v0.5.0