This test concept defines binding guidelines for implementation, structure, naming, tools, and quality assurance of all automated tests in the project.
1. Goals and Principles
Target Coverage: At least 80% branch coverage (measured via SonarQube).
Follow the Test Pyramid: Focus on unit tests, followed by component/integration tests. End-to-end tests are outside the scope of this document.
Readability above all: Meaningful test names and descriptions, clear Given/When/Then structure.
Fast feedback: Unit tests must not load a Spring Context and must execute very quickly.
2. Scope
Technical scope: Java 21+, Spring Boot, JUnit 5 (Jupiter), AssertJ, Mockito, ArchUnit, Testcontainers (for integration/component tests if external systems are needed).
3. Test Types & Boundaries
3.1. Unit Tests
Definition: Testing individual classes/functions in isolation (pure Java tests, no Spring Context).
No Spring annotations such as @SpringBootTest, @MockBean, @Autowired, @ExtendWith(SpringExtension.class).
Isolation via mocks/stubs (e.g. Mockito) or test doubles.
Goal: Logic, edge cases, error handling, contracts.
3.2. Component / Integration Tests
Definition: Testing multiple collaborating components; integration tests additionally load a Spring Context (e.g. @SpringBootTest), while component tests wire collaborators by hand (see Section 9 for the distinction).
Goal: Interaction, configuration, persistence, (de-)serialization, web layer, transactions.
External dependencies: Prefer Testcontainers (e.g. PostgreSQL, Kafka). Alternatively, embedded variants.
Note: Unit tests address pure logic. Component/integration tests address Spring wiring, data, and interface behavior.
3.3. Prompt Evaluation
Definition: Human-assisted test in which developers run various prompts against an AI chatbot and evaluate the results.
The results, the prompts and parameters used, and some meta information are collected, stored, and evaluated in order to achieve the best results.
Goal: Improve response quality, standardize prompt design, measure performance and costs (tokens), and standardize configuration (e.g. selected model, temperature).
More information can be found in the prompt evaluation concept section.
4. Coverage Goals and Metrics
Branch coverage ≥ 80% total per module (SonarQube quality gate).
Coverage includes unit + component-integration tests (tests run by Maven Failsafe are excluded).
5. Naming & Structure
Package/directory structure mirrors production code.
Unit tests: Class suffix *Test (e.g. PriceCalculatorTest).
Component-integration tests: Class suffix *CT (e.g. PriceCalculatorServiceCT).
System-integration tests: Class suffix *IT (e.g. OrderCreationIT).
Alternatively, @Tag can be used for differentiation.
Methods: Use @DisplayName and follow the pattern:
“When … and/or … then …” (Examples: “When discount is valid and cart is not empty, then final price is reduced”).
6. Conventions for Test Cases
AssertJ as the only assertion framework. JUnit-Jupiter assertions (org.junit.jupiter.api.Assertions) are forbidden (enforced by ArchUnit rule, see below).
Clearly structure Given/When/Then.
Use @ParameterizedTest where appropriate (data variants, boundary values).
No Spring contexts in unit and component-integration tests (for performance & isolation).
7. Examples
7.1. Example @DisplayName & AssertJ
@DisplayName("When discount is valid and cart is not empty, then final price is reduced")
@Test
void rabattWirdAngewendet() {
    // Given
    var calc = new PriceCalculator();
    var cart = new Cart(List.of(new Item("A", 100)));
    var rabatt = new Rabatt(0.2);

    // When
    var result = calc.berechneEndpreis(cart, rabatt);

    // Then
    assertThat(result).isEqualTo(80.0);
}
7.2. Example ParameterizedTest
@DisplayName("When input value is invalid, then IllegalArgumentException is thrown")
@ParameterizedTest(name = "Case {index}: input={0}")
@ValueSource(ints = { -1, -10, Integer.MIN_VALUE })
void invalidInputsThrowException(int input) {
    assertThatThrownBy(() -> Validator.checkPositive(input))
        .isInstanceOf(IllegalArgumentException.class)
        .hasMessageContaining("positiv");
}
8. Architecture Protection with ArchUnit
8.1. Forbid JUnit-Jupiter Assertions (allow AssertJ only)
@AnalyzeClasses(packages = "com.example")
public class AssertionFrameworkRulesTest {

    @ArchTest
    static final ArchRule noJunitAssertions =
        noClasses().should().dependOnClassesThat()
            .haveNameMatching("org\\.junit\\.jupiter\\.api\\.Assertions(\\$.*)?");
}
8.2. Enforce Naming Conventions
@AnalyzeClasses(packages = "com.example")
public class NamingRulesTest {

    // Note: @Test is a method-level annotation, so it never appears on classes.
    // Instead, match classes that contain @Test methods; requires:
    // import static com.tngtech.archunit.core.domain.properties.CanBeAnnotated.Predicates.annotatedWith;
    @ArchTest
    static final ArchRule testClassesHaveAgreedSuffix =
        classes().that().containAnyMethodsThat(annotatedWith(org.junit.jupiter.api.Test.class))
            .should().haveSimpleNameEndingWith("Test")
            .orShould().haveSimpleNameEndingWith("CT")
            .orShould().haveSimpleNameEndingWith("IT");

    @ArchTest
    static final ArchRule integrationTestsHaveITSuffix =
        classes().that().haveSimpleNameEndingWith("IT")
            .should().resideInAnyPackage(".."); // Placeholder – can be tightened
}
Note: The above rules are examples. They should be placed in a separate module/package (e.g. architecture) within the project and be executed automatically during the build.
9. Spring-Specific Guidelines
9.1. Unit Tests
Not allowed: @SpringBootTest, @ExtendWith(SpringExtension.class), @MockBean, @Autowired.
Allowed: Pure JUnit/Mockito tests. Constructor injection in production code improves testability.
Time budget: Aim for < 100 ms per test method.
9.2. Component Tests (no Spring Context)
Definition: Test a small cluster of collaborating classes as a unit (e.g., service + domain + mappers), without Spring. Wire collaborators manually; mock only true external collaborators (e.g., repositories, HTTP clients, messaging).
Goals: Validate cross-class behavior, orchestration, contracts, edge cases, and error handling across the component.
Allowed:
- JUnit 5 + Mockito (mocks/stubs/spies).
- Real implementations for in-process collaborators where feasible (keep the collaboration meaningful).
- Test doubles for out-of-process boundaries (DB, HTTP, messaging, filesystem).
Not allowed:
- Any Spring context or Spring test annotations.
- Network, filesystem, or database access.
Time budget: Aim for < 150–200 ms per test method.
Naming/Tagging: Suffix *CT (e.g., OrderCreationCT) and @Tag("component").
Structure tips:
- Prefer builders/factories for test data.
- Keep Given/When/Then explicit.
- Avoid over-mocking: don’t mock value objects; only mock external collaborators.
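The manual-wiring idea above can be sketched in plain Java. All class names below (OrderService, PriceCalculator, OrderRepository) are hypothetical examples; in a real CT the repository stub would typically be a Mockito mock, but a hand-rolled test double keeps the sketch self-contained:

```java
import java.util.ArrayList;
import java.util.List;

interface OrderRepository {          // out-of-process boundary -> stubbed
    void save(String order);
}

class PriceCalculator {              // real in-process collaborator
    double total(List<Double> prices) {
        return prices.stream().mapToDouble(Double::doubleValue).sum();
    }
}

class OrderService {                 // the component under test
    private final PriceCalculator calculator;
    private final OrderRepository repository;

    OrderService(PriceCalculator calculator, OrderRepository repository) {
        this.calculator = calculator;
        this.repository = repository;
    }

    double placeOrder(List<Double> prices) {
        double total = calculator.total(prices);
        repository.save("order with total " + total);
        return total;
    }
}

public class OrderServiceCTSketch {
    public static void main(String[] args) {
        // Given: a real calculator and a stubbed repository recording saves
        List<String> saved = new ArrayList<>();
        OrderService service = new OrderService(new PriceCalculator(), saved::add);

        // When
        double total = service.placeOrder(List.of(10.0, 20.0));

        // Then: cross-class behavior is observable without any Spring context
        System.out.println(total);          // 30.0
        System.out.println(saved.size());   // 1
    }
}
```

Because no Spring context is started, such a test stays well within the 150–200 ms budget.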
9.3. Integration Tests
Definition: Test the integration of multiple components with Spring (wiring, configuration, persistence, serialization, web, transactions).
Allowed:
- @SpringBootTest (optionally webEnvironment = RANDOM_PORT)
- Real persistence via Testcontainers (e.g., PostgreSQL) preferred; H2 optional where compatible.
- HTTP stubbing (e.g., WireMock) for external services.
- Test data via Liquibase/Flyway or builder/factory; rollback transactions; tests must be independent.
Notable focuses: Spring wiring, configuration, DB schema, repositories, controllers, serialization, transactionality.
Naming/Tagging: Suffix *IT (e.g., OrderCreationIT) and @Tag("integration").
10. Build & Execution
10.1. Maven
Surefire runs unit tests and component tests (*Test and *CT).
Failsafe runs integration tests (*IT) during integration-test/verify phase.
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version>
<configuration>
<includes>
<include>**/*Test.java</include>
<include>**/*CT.java</include>
</includes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.2.5</version>
<configuration>
<includes>
<include>**/*IT.java</include>
</includes>
</configuration>
<executions>
<execution>
<goals>
<goal>integration-test</goal>
<goal>verify</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
11. SonarQube & Quality Gate
CI runs tests + coverage report (JaCoCo) and publishes results to SonarQube.
Quality gate fails if: Branch coverage < 80%, new critical smells/bugs/vulnerabilities.
New/modified files must not decrease coverage (“Clean as You Code”).
12. Style & Structure in Tests
Consistently apply Arrange/Act/Assert (or Given/When/Then).
One assertion framework: AssertJ (no mixed usage).
Each test case should test exactly one aspect. Multiple aspects → split into separate tests.
Keep test data clean: Builder/factory methods, ObjectMother pattern.
Use parameterization for variants and boundary values.
Time & randomness: make deterministic (Clocks, Seeds).
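A minimal, framework-free sketch of the determinism guideline using java.time.Clock (InvoiceDater is a hypothetical example class; in production code the Clock would be injected instead of calling LocalDate.now() directly):

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class InvoiceDater {
    private final Clock clock;       // injected instead of calling LocalDate.now()

    InvoiceDater(Clock clock) {
        this.clock = clock;
    }

    LocalDate dueDate(int daysUntilDue) {
        return LocalDate.now(clock).plusDays(daysUntilDue);
    }
}

public class DeterministicClockSketch {
    public static void main(String[] args) {
        // Given: a fixed clock -> the test never depends on "real" time
        Clock fixed = Clock.fixed(Instant.parse("2025-01-01T00:00:00Z"), ZoneOffset.UTC);

        // When / Then: the result is fully deterministic
        LocalDate due = new InvoiceDater(fixed).dueDate(14);
        System.out.println(due);     // 2025-01-15
    }
}
```

The same pattern applies to randomness: pass a seeded java.util.Random into the class under test rather than constructing one internally.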
13. Negative & Error Cases
For every public method: at least one error/boundary case (e.g. null, empty, invalid, overflow).
Check exceptions using assertThatThrownBy/assertThatExceptionOfType.
14. Mocking & Stubbing
Mockito for mocks/stubs/spies.
No over-mocking: Only mock external collaborators, not value objects.
Use verification sparingly: Verify behavior only when it ensures observable effects (e.g. “Email was sent”).
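The “verify observable effects” guideline can be sketched without a framework. EmailSender and RegistrationService are hypothetical names; with Mockito, the hand-rolled spy below would be a mock plus verify(sender).send(...):

```java
import java.util.ArrayList;
import java.util.List;

interface EmailSender {              // external collaborator -> worth verifying
    void send(String recipient);
}

class RegistrationService {
    private final EmailSender sender;

    RegistrationService(EmailSender sender) {
        this.sender = sender;
    }

    void register(String email) {
        // ... persist user, then notify:
        sender.send(email);
    }
}

public class VerifySketch {
    public static void main(String[] args) {
        // Given: a hand-rolled spy recording outgoing mails
        List<String> sent = new ArrayList<>();
        RegistrationService service = new RegistrationService(sent::add);

        // When
        service.register("anna@example.org");

        // Then: verify the observable effect ("email was sent"), not internals
        System.out.println(sent);    // [anna@example.org]
    }
}
```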
15. Test Data & Persistence
Unit: no DB/filesystem/network access.
Integration: Use Testcontainers; initialize data via migration scripts or factories; each method isolated (rollback/rebuild).
16. PR Checklist (Mandatory)
Tests exist and pass (locally & in CI).
@DisplayName in all test methods following the pattern “When … and/or … then …”.
No usage of org.junit.jupiter.api.Assertions.
No Spring contexts in unit tests.
Correct suffixes: *Test (unit), *CT (component), *IT (integration). Alternatively, @Tag may be used for differentiation.
Parameterized tests where appropriate.
SonarQube quality gate met (≥ 80% branch coverage).
17. Prompt Evaluation
17.1. Introduction
Prompt evaluation is a human-assisted testing approach in which developers run various prompts against an AI chatbot to improve results. All results, including prompts, model parameters, created entities, and relevant metadata, are collected, stored, and analyzed to continuously refine prompt design and model configuration.
This process ensures that the prompt interactions become more predictable, cost-efficient (tokens) and aligned with the project goals.
17.2. Goals
- Improve response quality
- Standardize prompt design
- Measure model performance
- Measure costs (token usage)
- Standardize configuration (e.g., model selection, temperature)
- Enable reproducible results from prompt experiments
- Provide a structured dataset for analysis and refinement (results, prompts, configuration, evaluation)
- Optimize the existing personas
17.3. Test protocol
Each tester documents their own setup, consisting of:
- Tester Name
- Date
- Model Version: selected LLM (default: GPT 5 Standard)
- System role: provides the overall instruction context
- Pre-prompt: provides the input data (persona) and the user instruction
- Temperature: indicates creativity vs. determinism
  - 0.2 means low creativity and high determinism
  - 0.8 means high creativity and low determinism
- Notes on Setup: additional user notes for the test setup
The tester uses a persona evaluation template to categorise and evaluate the resulting optimised persona in the context of the original input persona.
17.3.1. Persona Evaluation Template
17.4. Result Storage Concepts
This part defines how all inputs, outputs and evaluations for the prompt evaluation tests are stored in a structured way.
Where the results will be stored has not yet been decided; one possibility would be the test directory in the repository or the project directory in SharePoint. Until this is clarified, testers save their evaluations locally.
The results could be stored in the following structure:
- <persona_surname_name> (replaced with the persona name, e.g. 'anna_mueller')
  - <prompt_evaluation_date_time_and_tester_abbreviation> (replaced with e.g. '2025-11-18-14-57-06_FSe', i.e. YYYY-MM-DD-HH-mm-SS_XXx)
    - input
      - persona.json (optional, because it already exists under src/test/resources/personas/<persona_surname_name> in the repository)
      - information.md (optional, because it already exists under src/test/resources/personas/<persona_surname_name> in the repository)
      - system-prompt.txt
      - user-prompt.txt
      - configuration.json (model, temperature, etc.)
    - output
      - persona.json (optimized persona)
      - metadata.json (optional: metadata such as used tokens, response time, etc.)
    - evaluation
      - persona-evaluation-template.md (contains the evaluation and categorization by the tester, copied from 'test/resources/persona-evaluation-template.md')
In addition, a concept must be developed to record and analyse all results in a structured form.
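Purely as an illustration of the proposed layout (the storage location is still undecided, so this sketch uses a temporary directory as a placeholder root; the persona and timestamp values are the examples from the structure above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResultLayoutSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder root until the real storage location is decided
        Path base = Files.createTempDirectory("prompt-eval");

        Path run = base
            .resolve("anna_mueller")                 // <persona_surname_name>
            .resolve("2025-11-18-14-57-06_FSe");     // <date_time_tester_abbreviation>

        // Create the three result folders of one evaluation run
        for (String dir : new String[] { "input", "output", "evaluation" }) {
            Files.createDirectories(run.resolve(dir));
        }

        System.out.println(Files.isDirectory(run.resolve("input")));  // true
    }
}
```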
18. Appendix: Abbreviations & References
AAA: Arrange–Act–Assert (Given–When–Then).
IT: Integration Test.
PIT: Pitest Mutation Testing.
SUT: System Under Test.
This document is binding. Deviations must be justified and explicitly approved during code review.