<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ongoing Projects on The Split Perspective</title>
    <link>https://ochotnicka.eu/research/</link>
    <description>Recent content in Ongoing Projects on The Split Perspective</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 30 Apr 2026 06:37:13 +0200</lastBuildDate>
    <atom:link href="https://ochotnicka.eu/research/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Legal Review for First Scenario</title>
      <link>https://ochotnicka.eu/research/legal_review/</link>
      <pubDate>Thu, 30 Apr 2026 06:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/legal_review/</guid>
      <description>&lt;h1 id=&#34;summary&#34;&gt;&#xA;  Summary&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#summary&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;I used an AI to generate a scenario that would be permitted under the EU AI Act.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;My question is: is the scenario below actually permitted under this law?&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;The law and the final scenario are included directly below. At the bottom,&#xA;you can view the prompt (modelled on the one in the paper) and the AI&amp;rsquo;s&#xA;explanation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>BDI Technical AI Project</title>
      <link>https://ochotnicka.eu/research/bdi_tech_ai_project/</link>
      <pubDate>Tue, 14 Apr 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/bdi_tech_ai_project/</guid>
      <description>&lt;h1 id=&#34;progress&#34;&gt;&#xA;  Progress&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#progress&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;&lt;strong&gt;Course Time&lt;/strong&gt;&lt;/th&gt;&#xA;          &lt;th&gt;&lt;strong&gt;Dev Work&lt;/strong&gt;&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;img src=&#34;https://geps.dev/progress/20?barColor=4472C4&#34; alt=&#34;&#34;&gt;&lt;/td&gt;&#xA;          &lt;td&gt;&lt;img src=&#34;https://geps.dev/progress/30?barColor=006600&#34; alt=&#34;&#34;&gt;&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;&lt;sup&gt;The Dev Work bar changes color based on course time progress.&lt;/sup&gt;&lt;/p&gt;&#xA;&lt;h1 id=&#34;summary&#34;&gt;&#xA;  Summary&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#summary&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;My particular interest lies in the legal and technical sides of AI safety, so&#xA;I really wanted my project to include both aspects. One option for&#xA;this is to take an existing paper, replicate it and then extend it in some way.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Technical AI Safety - Project Idea</title>
      <link>https://ochotnicka.eu/research/bdi_tech_ai_safety/</link>
      <pubDate>Sat, 11 Apr 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/bdi_tech_ai_safety/</guid>
      <description>&lt;h1 id=&#34;code-and-compliance-split-perspectives-on-california-frontier-ai-laws&#34;&gt;&#xA;  Code and Compliance: Split Perspectives on California Frontier AI Laws&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#code-and-compliance-split-perspectives-on-california-frontier-ai-laws&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;I will be reviewing current California laws relating to Frontier AI companies and&#xA;models, with a focus on making this information readable for both policymakers and&#xA;technical people. The document will be divided into two sections. The first is for&#xA;policymakers and other non-technical readers: it explains, in accessible language,&#xA;what the laws mean, why they matter and where the technical gaps are. The second is for&#xA;technical staff at Frontier AI labs who want to better understand how California&#xA;legislation affects them and their work.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 8</title>
      <link>https://ochotnicka.eu/research/day8/</link>
      <pubDate>Wed, 25 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day8/</guid>
      <description>&lt;h2 id=&#34;day-8-detailed-plan-safety-prompt-design&#34;&gt;&#xA;  Day 8 Detailed Plan: Safety Prompt Design&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-8-detailed-plan-safety-prompt-design&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: You have a complete baseline dataset with a toxicity rate of approximately 0.0018%. Today is a research and design day — no large compute runs. The goal is to produce 3-5 well-reasoned safety system prompts, test them qualitatively on a small sample, and select 1-2 candidates for full evaluation on Day 9.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 8 Hallucinations</title>
      <link>https://ochotnicka.eu/research/day8_hallucination/</link>
      <pubDate>Wed, 25 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day8_hallucination/</guid>
      <description>&lt;p&gt;This began when I asked it to find me the link to Meta&amp;rsquo;s recommended system prompt for Llama:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I filtered out any suggested toxic content so that it is not reproduced here. I still have the original,&#xA;unredacted conversation.&lt;/p&gt;&#xA;&lt;h1 id=&#34;my-original-prompt&#34;&gt;&#xA;  My original prompt:&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#my-original-prompt&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;Can you help me find Meta&amp;rsquo;s recommended system prompt?&lt;/p&gt;&#xA;&lt;h1 id=&#34;and-here-is-the-full-reply&#34;&gt;&#xA;  And here is the full reply:&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#and-here-is-the-full-reply&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;I have verified the official Hugging Face model card for &lt;strong&gt;Llama 3.1 8B Instruct&lt;/strong&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 8 Updated</title>
      <link>https://ochotnicka.eu/research/day8_orig/</link>
      <pubDate>Wed, 25 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day8_orig/</guid>
      <description>&lt;h2 id=&#34;day-8-detailed-plan-safety-prompt-design&#34;&gt;&#xA;  Day 8 Detailed Plan: Safety Prompt Design&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-8-detailed-plan-safety-prompt-design&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: You have a complete baseline dataset with a toxicity rate of approximately 0.0018%. Today is a research and design day — no large compute runs. The goal is to produce 3-5 well-reasoned safety system prompts, test them qualitatively on a small sample, and select 1-2 candidates for full evaluation on Day 9.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 7</title>
      <link>https://ochotnicka.eu/research/day7/</link>
      <pubDate>Tue, 24 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day7/</guid>
      <description>&lt;h2 id=&#34;day-7-detailed-plan-baseline-analysis&#34;&gt;&#xA;  Day 7 Detailed Plan: Baseline Analysis&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-7-detailed-plan-baseline-analysis&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: You have a complete, verified baseline results file from Day 6. Today is the first pure analysis day — no generation, no pipeline work. The goal is to characterise your baseline thoroughly enough that you have a clear, documented picture of model behaviour without any safety mitigation. This becomes the reference point for every comparison in Weeks 2 and 3, so the quality of this analysis directly affects the quality of your final writeup.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 6</title>
      <link>https://ochotnicka.eu/research/day6/</link>
      <pubDate>Mon, 23 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day6/</guid>
      <description>&lt;h2 id=&#34;day-6-detailed-plan-full-baseline-evaluation&#34;&gt;&#xA;  Day 6 Detailed Plan: Full Baseline Evaluation&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-6-detailed-plan-full-baseline-evaluation&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: Your pipeline is tested and working, your output schema is finalised, and you have a time-per-prompt estimate from Day 5. Today is primarily an execution day — the main task is running the pipeline over your full subset with no system prompt, producing the baseline results dataset that everything in Weeks 2 and 3 will be measured against. Most of the compute time will be unattended, so this plan accounts for how to use that time productively.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 5</title>
      <link>https://ochotnicka.eu/research/day5/</link>
      <pubDate>Fri, 20 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day5/</guid>
      <description>&lt;h2 id=&#34;day-5-detailed-plan-end-to-end-pipeline-test&#34;&gt;&#xA;  Day 5 Detailed Plan: End-to-End Pipeline Test&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-5-detailed-plan-end-to-end-pipeline-test&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: You have a &lt;code&gt;generate_completion&lt;/code&gt; function from Day 3 and an &lt;code&gt;evaluate_toxicity&lt;/code&gt; function from Day 4, each tested independently. Today you combine them into a single pipeline, run it on 50-100 prompts, and produce your first real baseline metrics. This is also the end of Week 1, so the goal is to leave the day with full confidence that the pipeline is ready to run at scale on Day 6.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 4</title>
      <link>https://ochotnicka.eu/research/day4/</link>
      <pubDate>Thu, 19 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day4/</guid>
      <description>&lt;h2 id=&#34;day-4-detailed-plan-toxicity-evaluation-pipeline&#34;&gt;&#xA;  Day 4 Detailed Plan: Toxicity Evaluation Pipeline&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-4-detailed-plan-toxicity-evaluation-pipeline&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: You now have a working generation function, documented parameters, and a set of smoke test completions saved in &lt;code&gt;results/smoke_test.jsonl&lt;/code&gt;. Day 4 builds a scoring layer on top of those outputs, and together the two components form the complete pipeline you will test end-to-end on Day 5.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;session-structure-2-hours&#34;&gt;&#xA;  Session Structure (2 hours)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#session-structure-2-hours&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;hr&gt;&#xA;&lt;h4 id=&#34;block-1--choose-your-toxicity-classifier-15-min&#34;&gt;&#xA;  Block 1 — Choose your toxicity classifier (15 min)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#block-1--choose-your-toxicity-classifier-15-min&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h4&gt;&#xA;&lt;p&gt;Your plan lists Detoxify as the primary option with Perspective API as an alternative. The practical tradeoffs:&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 3</title>
      <link>https://ochotnicka.eu/research/day3/</link>
      <pubDate>Wed, 18 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day3/</guid>
      <description>&lt;h2 id=&#34;day-3-detailed-plan-baseline-generation-pipeline&#34;&gt;&#xA;  Day 3 Detailed Plan: Baseline Generation Pipeline&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-3-detailed-plan-baseline-generation-pipeline&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Context from prior days&lt;/strong&gt;: By this point you should have a working Python environment with Ollama and Llama 3.1 8B installed (Day 1), and a saved, stratified subset of ToxiGen prompts (Day 2). Day 3 builds directly on both.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;session-structure-2-3-hours&#34;&gt;&#xA;  Session Structure (2-3 hours)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#session-structure-2-3-hours&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;hr&gt;&#xA;&lt;h4 id=&#34;block-1--write-the-core-generation-function-45-60-min&#34;&gt;&#xA;  Block 1 — Write the core generation function (45-60 min)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#block-1--write-the-core-generation-function-45-60-min&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h4&gt;&#xA;&lt;p&gt;The function signature is already sketched in your plan. Flesh it out with the following considerations:&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 2</title>
      <link>https://ochotnicka.eu/research/day2/</link>
      <pubDate>Tue, 17 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day2/</guid>
      <description>&lt;h3 id=&#34;detailed-instructions-for-day-2&#34;&gt;&#xA;  Detailed Instructions for Day 2&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#detailed-instructions-for-day-2&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Load the ToxiGen dataset, understand its structure, and create a balanced &amp;ldquo;golden set&amp;rdquo; of data to use for testing later.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Step 1: Environment Setup&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;You will need the &lt;code&gt;datasets&lt;/code&gt; library from Hugging Face.&lt;/li&gt;&#xA;&lt;li&gt;Run: &lt;code&gt;pip install datasets pandas&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;(Optional) If you want to see the data in a table format easily, &lt;code&gt;pip install jupyter&lt;/code&gt; and use a notebook, or just use standard Python scripts.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;&lt;strong&gt;Step 2: Load and Inspect (The &amp;ldquo;Deep Dive&amp;rdquo;)&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Day 1</title>
      <link>https://ochotnicka.eu/research/day1/</link>
      <pubDate>Mon, 16 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/day1/</guid>
      <description>&lt;h1 id=&#34;updated-day-1-for-your-3-week-plan&#34;&gt;&#xA;  Updated Day 1 for Your 3-Week Plan&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#updated-day-1-for-your-3-week-plan&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;Here&amp;rsquo;s your revised Day 1 that incorporates Codeberg setup and a progress tracking system:&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;day-1-monday-environment-setup-codeberg-configuration--paper-reading&#34;&gt;&#xA;  Day 1 (Monday): Environment Setup, Codeberg Configuration &amp;amp; Paper Reading&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-1-monday-environment-setup-codeberg-configuration--paper-reading&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 3-4 hours&lt;/p&gt;&#xA;&lt;h3 id=&#34;tasks&#34;&gt;&#xA;  Tasks:&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#tasks&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;h4 id=&#34;part-a-codeberg-account-setup-45-60-min&#34;&gt;&#xA;  Part A: Codeberg Account Setup (45-60 min)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#part-a-codeberg-account-setup-45-60-min&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  
&lt;/a&gt;&#xA;&lt;/h4&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;Create Codeberg Account&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Plan</title>
      <link>https://ochotnicka.eu/research/plan/</link>
      <pubDate>Sat, 14 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/plan/</guid>
      <description>&lt;h2 id=&#34;3-week-plan-prompt-based-mitigation-for-toxic-content&#34;&gt;&#xA;  3-Week Plan: Prompt-Based Mitigation for Toxic Content&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#3-week-plan-prompt-based-mitigation-for-toxic-content&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Systematically evaluate whether safety-focused system prompts reduce toxic output on the ToxiGen benchmark using a local LLM.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Timeline&lt;/strong&gt;: 3 weeks, Monday-Friday only (15 working days)&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Estimated daily time&lt;/strong&gt;: 2-3 hours/day for Weeks 1-2, 3-4 hours/day for Week 3&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Total time investment&lt;/strong&gt;: ~40-45 hours&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;week-1-setup-learning-and-infrastructure-days-1-5&#34;&gt;&#xA;  Week 1: Setup, Learning, and Infrastructure (Days 1-5)&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#week-1-setup-learning-and-infrastructure-days-1-5&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;h3 id=&#34;day-1-monday-environment-setup--paper-reading&#34;&gt;&#xA;  Day 1 (Monday): Environment Setup &amp;amp; Paper Reading&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#day-1-monday-environment-setup--paper-reading&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;h4 
id=&#34;updated-day-1&#34;&gt;&#xA;  &lt;a href=&#34;../day1&#34; &gt;UPDATED DAY 1&lt;/a&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#updated-day-1&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h4&gt;&#xA;&lt;p&gt;&lt;strong&gt;Time&lt;/strong&gt;: 2-3 hours&lt;/p&gt;</description>
    </item>
    <item>
      <title>AI Generated Learning Plan</title>
      <link>https://ochotnicka.eu/research/agi_strategy_learning_roadmap/</link>
      <pubDate>Fri, 13 Mar 2026 17:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/agi_strategy_learning_roadmap/</guid>
      <description>&lt;p&gt;We were given the task of using an LLM to generate a learning plan for our&#xA;project.  This is what Claude Sonnet 4.6 created.&lt;/p&gt;&#xA;&lt;h2 id=&#34;learning-roadmap-local-llm-safety-testing&#34;&gt;&#xA;  Learning Roadmap: Local LLM Safety Testing&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#learning-roadmap-local-llm-safety-testing&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Phase&lt;/th&gt;&#xA;          &lt;th&gt;Duration&lt;/th&gt;&#xA;          &lt;th&gt;Key Activities&lt;/th&gt;&#xA;          &lt;th&gt;Success Criteria&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;1. Environment Setup&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;1-2 days&lt;/td&gt;&#xA;          &lt;td&gt;Install Ollama or LM Studio; download a model (Llama 3.1 8B or Mistral 7B); verify GPU acceleration working on M2 Max&lt;/td&gt;&#xA;          &lt;td&gt;Successfully run inference locally with acceptable speed (&amp;gt;20 tokens/sec)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;2. Baseline Testing&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;2-3 days&lt;/td&gt;&#xA;          &lt;td&gt;Select/create test prompts from ToxiGen or BBQ; run baseline evaluation; document model responses; establish scoring methodology&lt;/td&gt;&#xA;          &lt;td&gt;Complete 50-100 test prompts with documented baseline scores&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;3. 
Intervention Method&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;3-5 days&lt;/td&gt;&#xA;          &lt;td&gt;Choose mitigation approach (system prompts, fine-tuning, or RAG); implement the intervention; validate it&amp;rsquo;s working correctly&lt;/td&gt;&#xA;          &lt;td&gt;Intervention successfully applied without breaking model functionality&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;4. Post-Intervention Testing&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;1-2 days&lt;/td&gt;&#xA;          &lt;td&gt;Re-run identical test suite; score responses using same methodology; compare results quantitatively&lt;/td&gt;&#xA;          &lt;td&gt;Documented comparison showing measurable change in safety metrics&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;5. Analysis &amp;amp; Documentation&lt;/strong&gt;&lt;/td&gt;&#xA;          &lt;td&gt;1-2 days&lt;/td&gt;&#xA;          &lt;td&gt;Analyze what worked/didn&amp;rsquo;t work; document limitations; identify next steps for deeper investigation&lt;/td&gt;&#xA;          &lt;td&gt;Written report with findings, methodology, and lessons learned&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;&lt;strong&gt;Total estimated time: 8-14 days&lt;/strong&gt; (assuming part-time effort)&lt;/p&gt;</description>
    </item>
    <item>
      <title>AGI Strategy - Personal Action Plan</title>
      <link>https://ochotnicka.eu/research/agi_strategy_plan/</link>
      <pubDate>Fri, 13 Mar 2026 16:37:13 +0200</pubDate>
      <guid>https://ochotnicka.eu/research/agi_strategy_plan/</guid>
      <description>&lt;h1 id=&#34;tldr&#34;&gt;&#xA;  TL;DR&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#tldr&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://ochotnicka.eu/research/plan/&#34;  class=&#34;external-link&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Day-by-day plan&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://codeberg.org/lauren_o/agi_strategy_toxigen_mitigation&#34;  class=&#34;external-link&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Code repo&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;summary&#34;&gt;&#xA;  Summary&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#summary&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;p&gt;I will be researching the current prevalence of harmful content (specifics TBD) and&#xA;reporting on it (phase 1). Following that, I will set up a local LLM so&#xA;that I can run the ToxiGen benchmark against it (phase 2). Once that is&#xA;set up, I will research known methods for reducing harmful content, implement them in&#xA;the LLM environment and retest. Finally, I will write reports&#xA;summarizing the results and lessons learned during the two phases of this&#xA;project.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Working Directory Structure</title>
      <link>https://ochotnicka.eu/research/dir_structure/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://ochotnicka.eu/research/dir_structure/</guid>
      <description>&lt;h1 id=&#34;working-directory-structure&#34;&gt;&#xA;  Working Directory Structure&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#working-directory-structure&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;h2 id=&#34;eu-ai-benchmark&#34;&gt;&#xA;  &lt;code&gt;eu-ai-benchmark/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#eu-ai-benchmark&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;h3 id=&#34;data&#34;&gt;&#xA;  &lt;code&gt;data/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#data&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;raw/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;eu_ai_act.xml&lt;/code&gt; — Source from EUR-Lex&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;gdpr.xml&lt;/code&gt; — Source from EUR-Lex&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;clauses/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;eu_ai_act_clauses.json&lt;/code&gt; — Parsed atomic clauses&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;gdpr_clauses.json&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;combined_clauses.json&lt;/code&gt; — Merged, deduplicated&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;scenarios/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;scenarios_raw.json&lt;/code&gt; — Direct LLM 
output&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;scenarios_reviewed.json&lt;/code&gt; — After human QA&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;embeddings/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;clauses.index&lt;/code&gt; — Chroma or FAISS vector store&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;src&#34;&gt;&#xA;  &lt;code&gt;src/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#src&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;parse/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;parse_eu_ai_act.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;parse_gdpr.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;combine_clauses.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;scenarios/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;generate_scenarios.py&lt;/code&gt; — Batched API calls to generate scenarios&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;review_scenarios.py&lt;/code&gt; — CLI tool for human QA pass&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;benchmark/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;runner.py&lt;/code&gt; — Sends scenarios to models, logs responses&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;evaluator.py&lt;/code&gt; — LLM-as-judge + RAG evaluation logic&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;retriever.py&lt;/code&gt; — Chroma/FAISS retrieval wrapper&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;aggregate.py&lt;/code&gt; — Rolls up scores to per-model/per-clause stats&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;utils/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;api_client.py&lt;/code&gt; — Unified wrapper: Ollama + 
Anthropic/OpenAI&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;logger.py&lt;/code&gt; — Structured logging shared across modules&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;models&#34;&gt;&#xA;  &lt;code&gt;models/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#models&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;registry.yaml&lt;/code&gt; — Model inventory: name, type, path/endpoint&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;system_prompts/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;llama3_safety.txt&lt;/code&gt; — System prompt for safety variant&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;mistral_safety.txt&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;adapters/&lt;/code&gt; — LoRA weights if fine-tuning is used&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;llama3_safety_lora/&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;mistral_safety_lora/&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;results&#34;&gt;&#xA;  &lt;code&gt;results/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#results&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;raw/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;run_YYYYMMDD_HHMMSS/&lt;/code&gt; — One folder per benchmark 
run&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;llama3_8b.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;mistral_7b.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;gemma2_9b.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;qwen2_7b.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;llama3_8b_safety.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;mistral_7b_safety.jsonl&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;evaluated/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;run_YYYYMMDD_HHMMSS/&lt;/code&gt; — Mirrors raw run folder&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;scores.jsonl&lt;/code&gt; — Per-response scores + reasoning&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;summary.json&lt;/code&gt; — Aggregate stats for this run&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;final/&lt;/code&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;combined_scores.csv&lt;/code&gt; — All runs merged, for write-up&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;charts/&lt;/code&gt; — Generated figures&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;notebooks&#34;&gt;&#xA;  &lt;code&gt;notebooks/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#notebooks&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;explore_clauses.ipynb&lt;/code&gt; — Sanity-check parsed data&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;explore_scenarios.ipynb&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;analyse_results.ipynb&lt;/code&gt; — Produce charts and tables for write-up&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;writeup&#34;&gt;&#xA;  &lt;code&gt;writeup/&lt;/code&gt;&#xA;  &lt;a 
class=&#34;heading-link&#34; href=&#34;#writeup&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;draft.md&lt;/code&gt; — Main document&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;references.bib&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;figures/&lt;/code&gt; — Copies of charts used in document&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;config&#34;&gt;&#xA;  &lt;code&gt;config/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#config&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;settings.yaml&lt;/code&gt; — Paths, model names, API targets, batch sizes&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;rubric.yaml&lt;/code&gt; — Evaluator scoring rubric (shared by all runs)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;tests&#34;&gt;&#xA;  &lt;code&gt;tests/&lt;/code&gt;&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#tests&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;test_parser.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;test_runner.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;test_evaluator.py&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;root-files&#34;&gt;&#xA;  Root files&#xA;  &lt;a class=&#34;heading-link&#34; href=&#34;#root-files&#34;&gt;&#xA;    &lt;i class=&#34;fa-solid fa-link&#34; aria-hidden=&#34;true&#34; title=&#34;Link to 
heading&#34;&gt;&lt;/i&gt;&#xA;    &lt;span class=&#34;sr-only&#34;&gt;Link to heading&lt;/span&gt;&#xA;  &lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;.env&lt;/code&gt; — API keys (never commit)&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;.gitignore&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;requirements.txt&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;README.md&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;</description>
    </item>
  </channel>
</rss>
