Ongoing Projects on The Split Perspective

Legal Review for First Scenario

Thu, 30 Apr 2026 06:37:13 +0200

Summary

I used an AI to generate a scenario that would be permitted under the EU AI Act.

My question is: Is the scenario below actually permitted based on this law?

I am including the law and the final scenario directly beneath. At the bottom, you can view the prompt (modelled on the one in the paper) and the AI's explanation.

BDI Technical AI Project

Tue, 14 Apr 2026 16:37:13 +0200

Progress

Course Time	Dev Work

The Dev bar changes color based on course time progress.

Summary

My particular interest lies in the legal and technical sides of AI safety, so I wanted my project to include both. One option is to take an existing paper, replicate it, and then extend it in some way.

Technical AI Safety - Project Idea

Sat, 11 Apr 2026 16:37:13 +0200

Code and Compliance: Split Perspectives on California Frontier AI Laws

I will be reviewing current California laws relating to Frontier AI companies and models. My focus is on making this information readable for both policymakers and technical people. The document will be divided into two sections: one for policymakers and non-technical readers, explaining in accessible language what everything means, why it matters, and where the technical gaps are; and one for technical people at Frontier AI labs who want to better understand how California legislation affects them and their work.

AGI Strategy - Day 8

Wed, 25 Mar 2026 16:37:13 +0200

Day 8 Detailed Plan: Safety Prompt Design

Context from prior days: You have a complete baseline dataset with a toxicity rate of approximately 0.0018%. Today is a research and design day — no large compute runs. The goal is to produce 3-5 well-reasoned safety system prompts, test them qualitatively on a small sample, and select 1-2 candidates for full evaluation on Day 9.

AGI Strategy - Day 8 Hallucinations

Wed, 25 Mar 2026 16:37:13 +0200

What led to this was me asking it to find me the link for Meta’s recommended system prompt for Llama:

**Note:** I filtered out any suggested toxic content so as not to inform. I still have the original unredacted conversation.

My original prompt:

Can you help me find Meta’s recommended system prompt?

And here is the full reply:

I have verified the official Hugging Face model card for Llama 3.1 8B Instruct.

AGI Strategy - Day 8 Updated

Wed, 25 Mar 2026 16:37:13 +0200

Day 8 Detailed Plan: Safety Prompt Design

AGI Strategy - Day 7

Tue, 24 Mar 2026 16:37:13 +0200

Day 7 Detailed Plan: Baseline Analysis

Context from prior days: You have a complete, verified baseline results file from Day 6. Today is the first pure analysis day — no generation, no pipeline work. The goal is to characterise your baseline thoroughly enough that you have a clear, documented picture of model behaviour without any safety mitigation. This becomes the reference point for every comparison in Weeks 2 and 3, so the quality of this analysis directly affects the quality of your final writeup.
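One way to characterise a baseline is to report the toxicity rate together with an uncertainty range, since a rate measured on a finite sample is only an estimate. The sketch below computes the rate and a Wilson score interval from a list of per-completion toxicity scores. The function name, the 0.5 threshold, and the input format are assumptions made for illustration; they are not taken from the plan itself.

```python
import math


def toxicity_rate_with_interval(scores, threshold=0.5, z=1.96):
    """Return (rate, low, high) for the share of scores at or above threshold.

    The bounds are a Wilson score interval; z=1.96 gives roughly 95% coverage.
    The threshold of 0.5 is an assumed cut-off, not a value from the plan.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    toxic = sum(1 for score in scores if score >= threshold)
    p = toxic / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Toy scores standing in for classifier output on four completions.
rate, low, high = toxicity_rate_with_interval([0.9, 0.1, 0.2, 0.8])
```

The Wilson interval is used here instead of the simpler normal approximation because it stays within [0, 1] and behaves sensibly when the observed rate is very close to zero.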

AGI Strategy - Day 6

Mon, 23 Mar 2026 16:37:13 +0200

Day 6 Detailed Plan: Full Baseline Evaluation

Context from prior days: Your pipeline is tested and working, your output schema is finalised, and you have a time-per-prompt estimate from Day 5. Today is primarily an execution day — the main task is running the pipeline over your full subset with no system prompt, producing the baseline results dataset that everything in Weeks 2 and 3 will be measured against. Most of the compute time will be unattended, so this plan accounts for how to use that time productively.

AGI Strategy - Day 5

Fri, 20 Mar 2026 16:37:13 +0200

Day 5 Detailed Plan: End-to-End Pipeline Test

Context from prior days: You have a generate_completion function from Day 3 and an evaluate_toxicity function from Day 4, each tested independently. Today you combine them into a single pipeline, run it on 50-100 prompts, and produce your first real baseline metrics. This is also the end of Week 1, so the goal is to leave the day with full confidence that the pipeline is ready to run at scale on Day 6.
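A minimal sketch of how the two functions might be combined is shown below. The real `generate_completion` and `evaluate_toxicity` signatures are not shown in this excerpt, so the sketch assumes each takes a single string and that the evaluator returns a score between 0 and 1. The record fields, the 0.5 threshold, and the stand-in functions are hypothetical.

```python
import json
import os
import tempfile


def run_pipeline(prompts, generate_completion, evaluate_toxicity,
                 output_path, threshold=0.5):
    """Generate a completion per prompt, score it, and write one JSON line each."""
    records = []
    with open(output_path, "w", encoding="utf-8") as handle:
        for prompt in prompts:
            completion = generate_completion(prompt)
            score = evaluate_toxicity(completion)
            record = {
                "prompt": prompt,
                "completion": completion,
                "toxicity": score,
                "is_toxic": score >= threshold,  # assumed cut-off
            }
            handle.write(json.dumps(record) + "\n")
            records.append(record)
    toxic = sum(record["is_toxic"] for record in records)
    rate = toxic / len(records) if records else 0.0
    return {"n": len(records), "toxic": toxic, "toxicity_rate": rate}


# Stand-ins so the sketch runs without a model or classifier installed.
def fake_generate(prompt):
    return prompt.upper()


def fake_score(text):
    return 0.9 if "BAD" in text else 0.1


path = os.path.join(tempfile.mkdtemp(), "pipeline_test.jsonl")
metrics = run_pipeline(["a bad prompt", "a fine prompt"],
                       fake_generate, fake_score, path)
```

Passing the two functions in as arguments keeps the pipeline testable with stand-ins like these before it is pointed at the real model and classifier.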

AGI Strategy - Day 4

Thu, 19 Mar 2026 16:37:13 +0200

Day 4 Detailed Plan: Toxicity Evaluation Pipeline

Context from prior days: You now have a working generation function, documented parameters, and a set of smoke test completions saved in results/smoke_test.jsonl. Day 4 builds a scoring layer on top of those outputs, and together the two components form the complete pipeline you will test end-to-end on Day 5.

Session Structure (2 hours)

Block 1 — Choose your toxicity classifier (15 min)

Your plan lists Detoxify as the primary option with Perspective API as an alternative. The practical tradeoffs:

AGI Strategy - Day 3

Wed, 18 Mar 2026 16:37:13 +0200

Day 3 Detailed Plan: Baseline Generation Pipeline

Context from prior days: By this point you should have a working Python environment with Ollama and Llama 3.1 8B installed (Day 1), and a saved, stratified subset of ToxiGen prompts (Day 2). Day 3 builds directly on both.
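For orientation, a generation function against a local Ollama server could look roughly like the sketch below. It assumes Ollama is listening on its default port and that the model was pulled under the tag `llama3.1:8b`. The helper `build_payload`, the parameter names, and the default values are illustrative assumptions, not the signature from the plan.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, system_prompt=None, model="llama3.1:8b",
                  temperature=0.7, max_tokens=128):
    """Build the JSON body for a non-streaming generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete JSON response
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }
    if system_prompt:
        payload["system"] = system_prompt
    return payload


def generate_completion(prompt, system_prompt=None, **kwargs):
    """Send the prompt to Ollama and return the generated text."""
    body = json.dumps(build_payload(prompt, system_prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


# Building the payload needs no running server, so it can be checked offline.
example = build_payload("Hello", system_prompt="Be polite.")
```

Keeping payload construction separate from the network call makes it easy to log the exact parameters used for each run, which helps when documenting generation settings.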

Session Structure (2-3 hours)

Block 1 — Write the core generation function (45-60 min)

The function signature is already sketched in your plan. Flesh it out with the following considerations:

AGI Strategy - Day 2

Tue, 17 Mar 2026 16:37:13 +0200

Detailed Instructions for Day 2

Goal: Load the ToxiGen dataset, understand its structure, and create a balanced “golden set” of data to use for testing later.

Step 1: Environment Setup

You will need the `datasets` library from Hugging Face.
Run: `pip install datasets pandas`
(Optional) If you want to see the data in a table format easily, run `pip install jupyter` and use a notebook, or just use standard Python scripts.
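To make the idea of a balanced "golden set" concrete, here is a small sketch that samples the same number of rows from each group. It uses only the standard library and toy rows so it runs without downloading anything. The field name `target_group` and the group labels are assumptions for illustration; check the actual ToxiGen columns after loading the dataset.

```python
import random
from collections import defaultdict


def balanced_subset(rows, key, per_group, seed=0):
    """Sample up to per_group rows for each distinct value of rows[key]."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    subset = []
    for name in sorted(groups):
        members = groups[name]
        subset.extend(rng.sample(members, min(per_group, len(members))))
    return subset


# Toy rows standing in for dataset records (field names are hypothetical).
rows = [
    {"text": f"prompt {i}", "target_group": group}
    for group in ("group_a", "group_b", "group_c")
    for i in range(10)
]
golden = balanced_subset(rows, key="target_group", per_group=3)
```

Fixing the random seed means the same subset can be regenerated later, which matters if the baseline and the mitigated runs need to use identical prompts.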

Step 2: Load and Inspect (The “Deep Dive”)

AGI Strategy - Day 1

Mon, 16 Mar 2026 16:37:13 +0200

Updated Day 1 for Your 3-Week Plan

Here’s your revised Day 1 that incorporates Codeberg setup and a progress tracking system:

Day 1 (Monday): Environment Setup, Codeberg Configuration & Paper Reading

Time: 3-4 hours

Tasks:

Part A: Codeberg Account Setup (45-60 min)

Create Codeberg Account

AGI Strategy - Plan

Sat, 14 Mar 2026 16:37:13 +0200

3-Week Plan: Prompt-Based Mitigation for Toxic Content

Goal: Systematically evaluate whether safety-focused system prompts reduce toxic output on the ToxiGen benchmark using a local LLM.

Timeline: 3 weeks, Monday-Friday only (15 working days)

Estimated daily time: 2-3 hours/day for Weeks 1-2, 3-4 hours/day for Week 3

Total time investment: ~40-45 hours

Week 1: Setup, Learning, and Infrastructure (Days 1-5)

Day 1 (Monday): Environment Setup & Paper Reading

UPDATED DAY 1

Time: 2-3 hours

AI Generated Learning Plan

Fri, 13 Mar 2026 17:37:13 +0200

We were given the task of using an LLM to generate a learning plan for our project. This is what Claude Sonnet 4.6 created.

Learning Roadmap: Local LLM Safety Testing

| Phase | Duration | Key Activities | Success Criteria |
| --- | --- | --- | --- |
| 1. Environment Setup | 1-2 days | Install Ollama or LM Studio; download a model (Llama 3.1 8B or Mistral 7B); verify GPU acceleration working on M2 Max | Successfully run inference locally with acceptable speed (>20 tokens/sec) |
| 2. Baseline Testing | 2-3 days | Select/create test prompts from ToxiGen or BBQ; run baseline evaluation; document model responses; establish scoring methodology | Complete 50-100 test prompts with documented baseline scores |
| 3. Intervention Method | 3-5 days | Choose mitigation approach (system prompts, fine-tuning, or RAG); implement the intervention; validate it’s working correctly | Intervention successfully applied without breaking model functionality |
| 4. Post-Intervention Testing | 1-2 days | Re-run identical test suite; score responses using same methodology; compare results quantitatively | Documented comparison showing measurable change in safety metrics |
| 5. Analysis & Documentation | 1-2 days | Analyze what worked/didn’t work; document limitations; identify next steps for deeper investigation | Written report with findings, methodology, and lessons learned |

Total estimated time: 8-14 days (assuming part-time effort)

AGI Strategy - Personal Action Plan

Fri, 13 Mar 2026 16:37:13 +0200

TL;DR

Summary

I will be researching the current prevalence of harmful content (specifics TBD) and reporting on it (phase 1). Following that, I will set up a local LLM so that I can test it with the ToxiGen benchmark (phase 2). Once that is set up, I will research known methods to reduce harmful content, implement them in the LLM environment, and retest. Finally, I will write reports summarizing the results and lessons learned during the two phases of this project.

Mon, 01 Jan 0001 00:00:00 +0000

Working Directory Structure

`eu-ai-benchmark/`

`data/`

raw/
- eu_ai_act.xml — Source from EUR-Lex
- gdpr.xml — Source from EUR-Lex
clauses/
- eu_ai_act_clauses.json — Parsed atomic clauses
- gdpr_clauses.json
- combined_clauses.json — Merged, deduplicated
scenarios/
- scenarios_raw.json — Direct LLM output
- scenarios_reviewed.json — After human QA
embeddings/
- clauses.index — Chroma or FAISS vector store

`src/`

parse/
- parse_eu_ai_act.py
- parse_gdpr.py
- combine_clauses.py
scenarios/
- generate_scenarios.py — Batched API calls to generate scenarios
- review_scenarios.py — CLI tool for human QA pass
benchmark/
- runner.py — Sends scenarios to models, logs responses
- evaluator.py — LLM-as-judge + RAG evaluation logic
- retriever.py — Chroma/FAISS retrieval wrapper
- aggregate.py — Rolls up scores to per-model/per-clause stats
utils/
- api_client.py — Unified wrapper: Ollama + Anthropic/OpenAI
- logger.py — Structured logging shared across modules
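As an illustration of the roll-up that `aggregate.py` is described as doing, the sketch below averages scores per model and per clause. The row fields (`model`, `clause_id`, `score`) and the use of a simple mean are assumptions; the actual schema of `scores.jsonl` is not specified here.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(rows):
    """Average the score field per model and per clause."""
    by_model = defaultdict(list)
    by_clause = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row["score"])
        by_clause[row["clause_id"]].append(row["score"])
    return {
        "per_model": {name: mean(vals) for name, vals in sorted(by_model.items())},
        "per_clause": {name: mean(vals) for name, vals in sorted(by_clause.items())},
    }


# Toy rows standing in for evaluated responses (field names are hypothetical).
rows = [
    {"model": "llama3_8b", "clause_id": "clause_1", "score": 1.0},
    {"model": "llama3_8b", "clause_id": "clause_2", "score": 0.0},
    {"model": "mistral_7b", "clause_id": "clause_1", "score": 1.0},
]
summary = aggregate_scores(rows)
```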

`models/`

registry.yaml — Model inventory: name, type, path/endpoint
system_prompts/
- llama3_safety.txt — System prompt for safety variant
- mistral_safety.txt
adapters/ — LoRA weights if fine-tuning is used
- llama3_safety_lora/
- mistral_safety_lora/

`results/`

raw/
- run_YYYYMMDD_HHMMSS/ — One folder per benchmark run
  - llama3_8b.jsonl
  - mistral_7b.jsonl
  - gemma2_9b.jsonl
  - qwen2_7b.jsonl
  - llama3_8b_safety.jsonl
  - mistral_7b_safety.jsonl
evaluated/
- run_YYYYMMDD_HHMMSS/ — Mirrors raw run folder
  - scores.jsonl — Per-response scores + reasoning
  - summary.json — Aggregate stats for this run
final/
- combined_scores.csv — All runs merged, for write-up
- charts/ — Generated figures

`notebooks/`

explore_clauses.ipynb — Sanity-check parsed data
explore_scenarios.ipynb
analyse_results.ipynb — Produce charts and tables for write-up

`writeup/`

draft.md — Main document
references.bib
figures/ — Copies of charts used in document

`config/`

settings.yaml — Paths, model names, API targets, batch sizes
rubric.yaml — Evaluator scoring rubric (shared by all runs)

`tests/`

test_parser.py
test_runner.py
test_evaluator.py

Root files

.env — API keys — never commit
.gitignore
requirements.txt
README.md
