This test concept defines binding guidelines for implementation, structure, naming, tools, and quality assurance of all automated tests in the project.

1. Goals and Principles

Target Coverage: At least 80% branch coverage (measured via SonarQube).

Follow the Test Pyramid: Focus on unit tests, followed by component/integration tests. End-to-end tests are outside the scope of this document.

Readability above all: Meaningful test names and descriptions, clear Given/When/Then structure.

Fast feedback: Unit tests must not load a Spring Context and must execute very quickly.

2. Scope

Technical scope: Java 21+, Spring Boot, JUnit 5 (Jupiter), AssertJ, Mockito, ArchUnit, Testcontainers (for integration/component tests if external systems are needed).

3. Test Types & Boundaries

3.1. Unit Tests

Definition: Testing individual classes/functions in isolation (pure Java tests, no Spring Context).

No Spring annotations such as @SpringBootTest, @MockBean, @Autowired, @ExtendWith(SpringExtension.class).

Isolation via mocks/stubs (e.g. Mockito) or test doubles.

Goal: Logic, edge cases, error handling, contracts.

3.2. Component / Integration Tests

Definition: Testing multiple collaborating components. Integration tests additionally load a Spring Context (e.g. @SpringBootTest); component tests wire collaborators manually without Spring (see sections 9.2 and 9.3).

Goal: Interaction, configuration, persistence, (de-)serialization, web layer, transactions.

External dependencies: Prefer Testcontainers (e.g. PostgreSQL, Kafka). Alternatively, embedded variants.

Unit tests address pure logic. Component tests address cross-class collaboration without Spring; integration tests address Spring wiring, data, and interface behavior.

3.3. Prompt Evaluation

Definition: Human-assisted test in which developers run various prompts against an AI chatbot and evaluate the results.

The results, together with the prompts, parameters, and some metadata, are collected, stored, and evaluated in order to identify the best-performing setups.

Goal: Improve response quality, standardize prompt design, measure performance and costs (tokens), and standardize configuration (e.g. selected model, temperature).

More information can be found in the prompt evaluation concept section.

4. Coverage Goals and Metrics

Branch coverage ≥ 80% total per module (SonarQube quality gate).

Coverage includes unit and component tests (*Test, *CT); integration tests run by Maven Failsafe (*IT) are excluded.

5. Naming & Structure

Package/directory structure mirrors production code.

Unit tests: Class suffix *Test (e.g. PriceCalculatorTest).

Component tests: Class suffix *CT (e.g. PriceCalculatorServiceCT).

Integration tests: Class suffix *IT (e.g. OrderCreationIT).

Alternatively, @Tag can be used for differentiation.

Methods: Use @DisplayName and follow the pattern: “When … and/or … then …” (Example: “When discount is valid and cart is not empty, then final price is reduced”).

6. Conventions for Test Cases

AssertJ as the only assertion framework. JUnit-Jupiter assertions (org.junit.jupiter.api.Assertions) are forbidden (enforced by ArchUnit rule, see below).

Clearly structure Given/When/Then.

Use @ParameterizedTest where appropriate (data variants, boundary values).

No Spring contexts in unit and component tests (for performance & isolation).

7. Examples

7.1. Example @DisplayName & AssertJ

@DisplayName("When discount is valid and cart is not empty, then final price is reduced")
@Test
void rabattWirdAngewendet() {
    // Given
    var calc = new PriceCalculator();
    var cart = new Cart(List.of(new Item("A", 100)));
    var rabatt = new Rabatt(0.2);

    // When
    var result = calc.berechneEndpreis(cart, rabatt);

    // Then
    assertThat(result).isEqualTo(80.0);
}

7.2. Example ParameterizedTest

@DisplayName("When input value is invalid, then IllegalArgumentException is thrown")
@ParameterizedTest(name = "Case {index}: input={0}")
@ValueSource(ints = { -1, -10, Integer.MIN_VALUE })
void invalidInputsThrowException(int input) {
    assertThatThrownBy(() -> Validator.checkPositive(input))
        .isInstanceOf(IllegalArgumentException.class)
        .hasMessageContaining("positiv");
}

8. Architecture Protection with ArchUnit

8.1. Forbid JUnit-Jupiter Assertions (allow AssertJ only)

@AnalyzeClasses(packages = "com.example")
public class AssertionFrameworkRulesTest {

  @ArchTest
  static final ArchRule noJunitAssertions =
      noClasses().should().dependOnClassesThat().haveNameMatching(
              "org\\.junit\\.jupiter\\.api\\.Assertions(\\$.*)?"
              );
}

8.2. Enforce Naming Conventions

@AnalyzeClasses(packages = "com.example")
public class NamingRulesTest {

  @ArchTest
  static final ArchRule unitTestsHaveTestSuffix =
      // @Test is a method-level annotation, so the rule must match classes that
      // contain @Test methods (annotatedWith from CanBeAnnotated.Predicates)
      classes().that().containAnyMethodsThat(annotatedWith(org.junit.jupiter.api.Test.class))
          .and().haveSimpleNameNotEndingWith("CT")
          .and().haveSimpleNameNotEndingWith("IT")
          .should().haveSimpleNameEndingWith("Test");

  @ArchTest
  static final ArchRule integrationTestsHaveITSuffix =
      classes().that().haveSimpleNameEndingWith("IT")
          .should().resideInAnyPackage(".."); // Placeholder – can be tightened
}
The above rules are examples. They should be placed in a separate module/package (e.g. architecture) within the project and be executed automatically during the build.

9. Spring-Specific Guidelines

9.1. Unit Tests

Not allowed: @SpringBootTest, @ExtendWith(SpringExtension.class), @MockBean, @Autowired.

Allowed: Pure JUnit/Mockito tests. Constructor injection in production code improves testability.

Time budget: Aim for < 100 ms per test method.
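
A minimal sketch (all names hypothetical) of why constructor injection pays off: the unit test can hand in a tiny test double instead of loading a Spring context or even a mocking framework.

```java
// Sketch with hypothetical names: constructor injection lets a plain unit test
// supply a hand-rolled test double instead of loading a Spring context.
interface ExchangeRateProvider {              // out-of-process collaborator
    double rateFor(String currency);
}

class PriceConverter {                        // production code under test
    private final ExchangeRateProvider rates; // injected via constructor

    PriceConverter(ExchangeRateProvider rates) {
        this.rates = rates;
    }

    double toEuro(double amount, String currency) {
        return amount * rates.rateFor(currency);
    }
}
```

A test simply passes a lambda, e.g. `new PriceConverter(currency -> 0.5)`, and asserts on the result; no context startup, well within the 100 ms budget.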

9.2. Component Tests (no Spring Context)

Definition: Test a small cluster of collaborating classes as a unit (e.g., service + domain + mappers), without Spring. Wire collaborators manually; mock only true external collaborators (e.g., repositories, HTTP clients, messaging).

Goals: Validate cross-class behavior, orchestration, contracts, edge cases, and error handling across the component.

Allowed:

  • JUnit 5 + Mockito (mocks/stubs/spies).

  • Real implementations for in-process collaborators where feasible (keep the collaboration meaningful).

  • Test doubles for out-of-process boundaries (DB, HTTP, messaging, filesystem).

Not allowed:

  • Any Spring context or Spring test annotations.

  • Network, filesystem, or database access.

Time budget: Aim for < 150–200 ms per test method.

Naming/Tagging: Suffix *CT (e.g., OrderCreationCT) and @Tag("component").

Structure tips:

  • Prefer builders/factories for test data.

  • Keep Given/When/Then explicit.

  • Avoid over-mocking: don’t mock value objects; only mock external collaborators.
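
The wiring described above can be sketched as follows (all names hypothetical): real in-process collaborators are instantiated by hand, and only the out-of-process boundary is stubbed.

```java
// Sketch with hypothetical names: a component test wires real in-process
// collaborators by hand and stubs only the out-of-process boundary.
interface OrderRepository {                   // boundary: would hit a DB in production
    double netTotalFor(String orderId);
}

class DiscountPolicy {                        // real in-process collaborator
    double apply(double total) {
        return total >= 100 ? total - 10 : total;  // flat discount above threshold
    }
}

class OrderService {                          // component under test
    private final OrderRepository repository;
    private final DiscountPolicy policy;

    OrderService(OrderRepository repository, DiscountPolicy policy) {
        this.repository = repository;
        this.policy = policy;
    }

    double payableTotal(String orderId) {
        return policy.apply(repository.netTotalFor(orderId));
    }
}
```

A *CT would stub only OrderRepository (e.g. `orderId -> 120.0`), use the real DiscountPolicy, and assert on the combined behavior.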

9.3. Integration Tests

Definition: Test the integration of multiple components with Spring (wiring, configuration, persistence, serialization, web, transactions).

Allowed:

  • @SpringBootTest (optionally webEnvironment = RANDOM_PORT)

  • Real persistence via Testcontainers (e.g., PostgreSQL) preferred; H2 optional where compatible.

  • HTTP stubbing (e.g., WireMock) for external services.

  • Test data via Liquibase/Flyway or builder/factory; rollback transactions; tests must be independent.

Notable focuses: Spring wiring, configuration, DB schema, repositories, controllers, serialization, transactionality.

Naming/Tagging: Suffix *IT (e.g., OrderCreationIT) and @Tag("integration").

10. Build & Execution

10.1. Maven

Surefire runs unit tests and component tests (*Test and *CT).

Failsafe runs integration tests (*IT) during integration-test/verify phase.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.2.5</version>
            <configuration>
                <includes>
                    <include>**/*Test.java</include>
                    <include>**/*CT.java</include>
                </includes>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-failsafe-plugin</artifactId>
            <version>3.2.5</version>
                <configuration>
                    <includes>
                        <include>**/*IT.java</include>
                    </includes>
                </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>integration-test</goal>
                        <goal>verify</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

11. SonarQube & Quality Gate

CI runs tests + coverage report (JaCoCo) and publishes results to SonarQube.

Quality gate fails if: Branch coverage < 80%, new critical smells/bugs/vulnerabilities.

New/modified files must not decrease coverage (“Clean as You Code”).
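
A typical wiring for the JaCoCo report looks like the sketch below (version number and phases are illustrative and must match the project's build):

```xml
<plugin>
    <groupId>org.jacoco</groupId>
    <artifactId>jacoco-maven-plugin</artifactId>
    <version>0.8.12</version>
    <executions>
        <execution>
            <goals>
                <goal>prepare-agent</goal>
            </goals>
        </execution>
        <execution>
            <id>report</id>
            <phase>verify</phase>
            <goals>
                <goal>report</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

SonarQube then reads the generated XML report (configured via sonar.coverage.jacoco.xmlReportPaths).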

12. Style & Structure in Tests

Consistently apply Arrange/Act/Assert (or Given/When/Then).

One assertion framework: AssertJ (no mixed usage).

Each test case should test exactly one aspect. Multiple aspects → split into separate tests.

Keep test data clean: Builder/factory methods, ObjectMother pattern.
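
A minimal test-data builder sketch (all names hypothetical): sensible defaults keep setup short, and each test states only the fields relevant to its scenario.

```java
// Sketch with hypothetical names: a test-data builder with sensible defaults,
// so each test states only the fields relevant to its scenario.
class Customer {
    final String name;
    final boolean premium;

    Customer(String name, boolean premium) {
        this.name = name;
        this.premium = premium;
    }
}

class CustomerBuilder {
    private String name = "Default Name";   // defaults keep tests short
    private boolean premium = false;

    CustomerBuilder named(String name) { this.name = name; return this; }
    CustomerBuilder premium() { this.premium = true; return this; }
    Customer build() { return new Customer(name, premium); }
}
```

Usage: `new CustomerBuilder().premium().build()` yields a premium customer without repeating irrelevant fields.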

Use parameterization for variants and boundary values.

Time & randomness: make deterministic (Clocks, Seeds).
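
For time, the standard approach is injecting java.time.Clock; a sketch with hypothetical names:

```java
// Sketch with hypothetical names: inject java.time.Clock so time-dependent
// logic becomes deterministic in tests (Clock.fixed pins "now").
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

class InvoiceDater {
    private final Clock clock;              // production code injects Clock.systemUTC()

    InvoiceDater(Clock clock) {
        this.clock = clock;
    }

    Instant issueDate() {
        return Instant.now(clock);
    }
}
```

A test passes `Clock.fixed(...)` and can assert an exact timestamp; randomness is handled analogously by injecting a seeded `Random`.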

13. Negative & Error Cases

For every public method: at least one error/boundary case (e.g. null, empty, invalid, overflow).

Check exceptions using assertThatThrownBy/assertThatExceptionOfType.

14. Mocking & Stubbing

Mockito for mocks/stubs/spies.

No over-mocking: Only mock external collaborators, not value objects.

Use verification sparingly: Verify behavior only when it ensures observable effects (e.g. “Email was sent”).
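
The “Email was sent” case can be sketched without a mocking framework via a recording fake (all names hypothetical); the interaction itself is the observable effect, so verifying it is appropriate here:

```java
// Sketch with hypothetical names: verify an interaction only when it is the
// observable effect itself – a recording fake checks that mail went out.
interface MailSender {                        // boundary: real implementation sends email
    void send(String recipient, String subject);
}

class RecordingMailSender implements MailSender {   // hand-rolled spy
    int sentCount = 0;
    String lastRecipient;

    public void send(String recipient, String subject) {
        sentCount++;
        lastRecipient = recipient;
    }
}

class RegistrationService {
    private final MailSender mail;

    RegistrationService(MailSender mail) {
        this.mail = mail;
    }

    void register(String email) {
        mail.send(email, "Welcome!");         // the behavior worth verifying
    }
}
```

With Mockito the same check would be `verify(mailSender).send(...)`; either way, verify the effect, not every internal call.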

15. Test Data & Persistence

Unit: no DB/filesystem/network access.

Integration: Use Testcontainers; initialize data via migration scripts or factories; each method isolated (rollback/rebuild).

16. PR Checklist (Mandatory)

Tests exist and pass (locally & in CI).

@DisplayName in all test methods following the pattern “When … and/or … then …”.

No usage of org.junit.jupiter.api.Assertions.

No Spring contexts in unit tests.

Correct suffixes: *Test (unit), *CT (component), *IT (integration). Alternatively, @Tag may be used for differentiation.

Parameterized tests where appropriate.

SonarQube quality gate met (≥ 80% branch coverage).

17. Prompt Evaluation

17.1. Introduction

Prompt evaluation is a human-assisted testing approach in which developers run various prompts against an AI chatbot to improve results. All results, including prompts, model parameters, created entities, and relevant metadata, are collected, stored, and analyzed to continuously refine prompt design and model configuration.

This process ensures that the prompt interactions become more predictable, cost-efficient (tokens) and aligned with the project goals.

17.2. Goals

  • Improve response quality

  • Standardize prompt design

  • Measure model performance

  • Measure costs (token usage)

  • Standardize configuration (e.g., model selection, temperature)

  • Enable reproducible results from prompt experiments

  • Provide a structured dataset for analysis and refinement (results, prompts, configuration, evaluation)

  • Optimize the existing personas

17.3. Test protocol

Each tester documents their own setup, consisting of:

  • Tester Name

  • Date

  • Model Version: selected LLM (default: GPT 5 Standard)

  • System role: Provides overall instruction context

  • Pre-prompt: Provides the input data (persona) and the user instruction

  • Temperature: Controls the trade-off between creativity and determinism

    • 0.2 means low creativity and high determinism

    • 0.8 means high creativity and low determinism

  • Notes on Setup: Additional user notes for the test setup

The tester uses a persona evaluation template to categorize and evaluate the resulting optimized persona against the original input persona.

17.4. Result Storage Concepts

This part defines how all inputs, outputs and evaluations for the prompt evaluation tests are stored in a structured way.

Where we will store the results has not yet been decided, but one possibility would be in the test directory in the repository or in the project directory in SharePoint. Initially, the testers save their evaluations locally until it has been clarified where the data will be stored.

The results could be stored in the following structure:

  • <persona_surname_name> (replaced with persona name, e.g.: 'anna_mueller')

    • <prompt_evaluation_date_time_and_tester_abbreviation> (replaced with e.g. '2025-11-18-14-57-06_FSe', format YYYY-MM-DD-HH-mm-SS_XXx)

      • input

        • persona.json (optional, because it already exists under src/test/resources/personas/<persona_surname_name> in the repository)

        • information.md (optional, because it already exists under src/test/resources/personas/<persona_surname_name> in the repository)

        • system-prompt.txt

        • user-prompt.txt

        • configuration.json (model, temperature, etc.)

      • output

        • persona.json (optimized persona)

        • metadata.json (optional: some metadata like used tokens, response time, etc.)

      • evaluation

        • persona-evaluation-template.md (contains the evaluation and categorization of the tester, copied from 'test/resources/persona-evaluation-template.md')
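
A possible shape for configuration.json (field names are illustrative and not yet standardized):

```json
{
  "model": "gpt-5",
  "temperature": 0.2,
  "systemPromptFile": "system-prompt.txt",
  "userPromptFile": "user-prompt.txt"
}
```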

In addition, a concept must be developed to record and analyze all results in a structured form.

18. Appendix: Abbreviations & References

AAA: Arrange–Act–Assert (Given–When–Then).

IT: Integration Test.

PIT: Pitest Mutation Testing.

SUT: System Under Test.

This document is binding. Deviations must be justified and explicitly approved during code review.