Working Directory Structure

eu-ai-benchmark/

data/

  • raw/
    • eu_ai_act.xml — Source from EUR-Lex
    • gdpr.xml — Source from EUR-Lex
  • clauses/
    • eu_ai_act_clauses.json — Parsed atomic clauses
    • gdpr_clauses.json
    • combined_clauses.json — Merged, deduplicated
  • scenarios/
    • scenarios_raw.json — Direct LLM output
    • scenarios_reviewed.json — After human QA
  • embeddings/
    • clauses.index — Chroma or FAISS vector store
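To make the clause files concrete, here is one plausible shape for a record in combined_clauses.json. The field names (`id`, `source`, `article`, `text`) are illustrative assumptions, not a schema the project prescribes:

```python
import json

# Hypothetical record from combined_clauses.json; field names are
# illustrative only. The stable ID encodes source doc + article + paragraph
# so scores can later be rolled up per clause.
clause = {
    "id": "eu_ai_act:art5:1a",
    "source": "eu_ai_act",  # or "gdpr"
    "article": "5",
    "text": "The following AI practices shall be prohibited: ...",
}

print(json.dumps(clause, indent=2))
```

An ID scheme like this keeps deduplication in combine_clauses.py trivial: two records collide only if they point at the same paragraph of the same instrument.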

src/

  • parse/
    • parse_eu_ai_act.py
    • parse_gdpr.py
    • combine_clauses.py
  • scenarios/
    • generate_scenarios.py — Batched API calls to generate scenarios
    • review_scenarios.py — CLI tool for human QA pass
  • benchmark/
    • runner.py — Sends scenarios to models, logs responses
    • evaluator.py — LLM-as-judge + RAG evaluation logic
    • retriever.py — Chroma/FAISS retrieval wrapper
    • aggregate.py — Rolls up scores to per-model/per-clause stats
  • utils/
    • api_client.py — Unified wrapper: Ollama + Anthropic/OpenAI
    • logger.py — Structured logging shared across modules
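One way the unified wrapper in api_client.py might be shaped is a small protocol that the runner codes against, with one backend class per provider. Everything below is a sketch under that assumption; `ChatBackend`, `EchoBackend`, and `ask` are invented names, and a real backend would wrap the Ollama HTTP API or the Anthropic/OpenAI SDKs:

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """Minimal interface every provider backend implements."""

    def complete(self, system: str, user: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for offline tests; real ones call Ollama or a cloud API."""

    prefix: str = "echo"

    def complete(self, system: str, user: str) -> str:
        return f"{self.prefix}: {user}"


def ask(backend: ChatBackend, system: str, user: str) -> str:
    # Single entry point runner.py would call, regardless of provider.
    return backend.complete(system, user)


print(ask(EchoBackend(), "You are careful.", "Hello"))  # echo: Hello
```

Keeping the runner ignorant of providers means adding a new model is a registry entry plus one backend class, with no changes to benchmark logic.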

models/

  • registry.yaml — Model inventory: name, type, path/endpoint
  • system_prompts/
    • llama3_safety.txt — System prompt for safety variant
    • mistral_safety.txt
  • adapters/ — LoRA weights if fine-tuning is used
    • llama3_safety_lora/
    • mistral_safety_lora/
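A registry.yaml entry could look like the sketch below; the keys (`type`, `endpoint`, `system_prompt`) are assumptions about a reasonable layout, not the file's actual schema. Note how a base model and its safety variant differ only in the prompt they point at:

```yaml
# Hypothetical registry.yaml entries; keys are illustrative.
models:
  - name: llama3_8b
    type: ollama
    endpoint: http://localhost:11434
    system_prompt: null
  - name: llama3_8b_safety
    type: ollama
    endpoint: http://localhost:11434
    system_prompt: models/system_prompts/llama3_safety.txt
```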

results/

  • raw/
    • run_YYYYMMDD_HHMMSS/ — One folder per benchmark run
      • llama3_8b.jsonl
      • mistral_7b.jsonl
      • gemma2_9b.jsonl
      • qwen2_7b.jsonl
      • llama3_8b_safety.jsonl
      • mistral_7b_safety.jsonl
  • evaluated/
    • run_YYYYMMDD_HHMMSS/ — Mirrors raw run folder
      • scores.jsonl — Per-response scores + reasoning
      • summary.json — Aggregate stats for this run
  • final/
    • combined_scores.csv — All runs merged, for write-up
    • charts/ — Generated figures
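The roll-up that aggregate.py performs from scores.jsonl to summary.json can be sketched as a per-model mean over JSONL records. The record fields and the 1–5 scores here are invented for illustration:

```python
import json
from collections import defaultdict

# Hypothetical scores.jsonl lines; fields and values are illustrative.
lines = [
    '{"model": "llama3_8b", "clause_id": "gdpr:art6:1", "score": 4}',
    '{"model": "llama3_8b", "clause_id": "gdpr:art6:1", "score": 2}',
    '{"model": "mistral_7b", "clause_id": "gdpr:art6:1", "score": 5}',
]

by_model: dict[str, list[int]] = defaultdict(list)
for line in lines:
    rec = json.loads(line)
    by_model[rec["model"]].append(rec["score"])

# Per-model mean: the kind of stat aggregate.py would write to summary.json.
means = {model: sum(s) / len(s) for model, s in by_model.items()}
print(means)  # {'llama3_8b': 3.0, 'mistral_7b': 5.0}
```

The same loop keyed on `clause_id` instead of `model` gives the per-clause view, which is why keeping both fields on every score record matters.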

notebooks/

  • explore_clauses.ipynb — Sanity-check parsed data
  • explore_scenarios.ipynb
  • analyse_results.ipynb — Produce charts and tables for write-up

writeup/

  • draft.md — Main document
  • references.bib
  • figures/ — Copies of charts used in document

config/

  • settings.yaml — Paths, model names, API targets, batch sizes
  • rubric.yaml — Evaluator scoring rubric (shared by all runs)
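Since rubric.yaml is shared by every run, it is worth pinning down early. One illustrative shape, with dimension names and scale invented for the example:

```yaml
# Hypothetical rubric.yaml sketch; dimensions and scale are illustrative.
scale: 1-5
dimensions:
  - name: legal_accuracy
    description: Does the answer correctly reflect the cited clause?
  - name: grounding
    description: Does the answer rely on the retrieved clause rather than invent one?
```

Keeping the rubric in config rather than hard-coded in evaluator.py means a rubric change is visible in version control and applies uniformly across runs.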

tests/

  • test_parser.py
  • test_runner.py
  • test_evaluator.py

Root files

  • .env — API keys — never commit
  • .gitignore
  • requirements.txt
  • README.md