In our lab, the most common question we get from cosmetic brands is: How long will my product last, and how do I prove it? Stability testing is the answer—but it is often misunderstood, under-specified, or rushed. We run stability studies and preservative efficacy testing (PET) every week for skincare, haircare, and personal care brands. This guide explains what we actually do, what we see when things go wrong, and how brands can use the data to make confident shelf-life claims.

Everything here is written from the perspective of a lab that performs these tests routinely. We are not summarizing a textbook; we are describing our workflows, our equipment, and the failure modes we encounter most often. If you are a cosmetic brand preparing for stability testing, this should give you a practical framework for planning, interpreting results, and avoiding common mistakes.

Types of Stability Studies Explained

Stability testing for cosmetics determines how a product holds up over time under defined conditions. The goal is to establish a scientifically defensible shelf life—the period during which the product remains safe, effective, and compliant with its label. There are three main approaches we use:


Accelerated Stability Testing

Accelerated stability exposes products to elevated temperature (and sometimes humidity) to simulate aging in a shorter time. A typical protocol might store a product at 40°C ± 2°C and 75% RH for 3 to 6 months. The idea is that chemical reactions and physical changes speed up at higher temperatures, so we can estimate long-term behavior from short-term data.

In our lab, we use ICH-style conditions (40°C/75% RH) for accelerated studies, with pull points at 1, 2, and 3 months, and sometimes 6 months for critical products. We measure appearance, odor, pH, viscosity, and any active ingredients or preservatives that might degrade. The results give brands an early signal of formulation issues. A failure under accelerated conditions is a strong warning sign: the same failure often appears under real-time conditions over a longer period, although some failures (for example, an emulsion that breaks only near 40°C) are specific to high heat.

Accelerated stability does not replace real-time testing for final shelf-life justification, but it is invaluable for formulation screening and for getting to market faster. Most brands run accelerated studies first to catch problems early, then run real-time studies in parallel to support the final claim.

Intermediate Stability Testing

Intermediate stability uses conditions between accelerated and real-time—typically 30°C ± 2°C and 65% RH. It is less aggressive than 40°C but still speeds up degradation compared to room temperature. Intermediate studies are sometimes used when a product is sensitive to high heat (e.g., some actives or emulsions) or when regulatory frameworks require multiple conditions.

We run intermediate studies when clients need data at conditions closer to real-world storage (e.g., bathroom cabinets in warm climates) or when they want to bridge accelerated and real-time data. The timelines are longer than accelerated but shorter than real-time.

Real-Time Stability Testing

Real-time stability stores products at recommended storage conditions—typically 25°C ± 2°C and 60% RH—and evaluates them over the claimed shelf life. For a 24-month claim, we pull samples at 0, 3, 6, 9, 12, 18, and 24 months (or similar intervals) and test the same parameters as in accelerated studies.

Real-time data is the gold standard for shelf-life justification. Regulators and retailers expect to see real-time data to support long shelf-life claims. The drawback is time: you cannot shortcut a 24-month study. Many brands therefore start real-time studies early, run accelerated studies for quick feedback, and use both datasets to support their label.
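To make the pull points concrete, here is a minimal sketch that turns a study start date into calendar pull dates. The month lists come from the schedules described above; the month-0 baseline pull for the accelerated study and the names `add_months`, `PULL_MONTHS`, and `pull_schedule` are our own assumptions for illustration, not part of any standard.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g. Jan 31 + 1 month -> Feb 28 or Feb 29).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Pull points in months. The month-0 baseline for the accelerated study
# is an assumption; the other values are taken from the text.
PULL_MONTHS = {
    "accelerated_40C_75RH": [0, 1, 2, 3],
    "real_time_25C_60RH": [0, 3, 6, 9, 12, 18, 24],
}


def pull_schedule(start: date, months: list[int]) -> list[date]:
    """Calendar dates on which samples are pulled for testing."""
    return [add_months(start, m) for m in months]


if __name__ == "__main__":
    for study, months in PULL_MONTHS.items():
        dates = pull_schedule(date(2025, 1, 15), months)
        print(study, [d.isoformat() for d in dates])
```

Running both schedules from the same start date shows why brands start real-time studies early: the accelerated study is finished long before the first few real-time pulls.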

Preservative Efficacy Testing (PET) Explained

Preservative efficacy testing (also called antimicrobial effectiveness testing or challenge testing) evaluates whether a product can withstand microbial contamination. We inoculate the product with known concentrations of bacteria, yeast, and mold, then measure how well the preservative system reduces or holds microbial levels over time.

The standard frameworks are USP <51> (United States) and ISO 11930 (international). Both define pass/fail criteria based on log reduction (how much the microbial count drops) at specific timepoints (e.g., 7, 14, 28 days). A product passes if it achieves the required reduction for each challenge organism.
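Log reduction is simply the base-10 logarithm of the ratio between the inoculated count and the count recovered at a timepoint. The sketch below shows the arithmetic; the function names are hypothetical, and the required reduction is passed in as a parameter because the threshold depends on the standard, the product category, and the organism.

```python
import math


def log_reduction(initial_cfu_per_ml: float, recovered_cfu_per_ml: float) -> float:
    """Log10 reduction from the initial inoculum to the recovered count.

    Both counts must be positive. For counts below the detection limit,
    enter the detection limit itself (the result is then a lower bound).
    """
    if initial_cfu_per_ml <= 0 or recovered_cfu_per_ml <= 0:
        raise ValueError("counts must be positive")
    return math.log10(initial_cfu_per_ml / recovered_cfu_per_ml)


def meets_requirement(observed_log_reduction: float, required_log_reduction: float) -> bool:
    """True when the observed reduction reaches the required threshold."""
    return observed_log_reduction >= required_log_reduction


if __name__ == "__main__":
    # Illustrative numbers only: 1e6 CFU/mL inoculum, 1e3 CFU/mL recovered.
    observed = log_reduction(1e6, 1e3)
    print(f"log reduction: {observed:.1f}")
    # The 2.0 threshold here is a placeholder, not a quoted criterion.
    print("meets 2.0 log requirement:", meets_requirement(observed, 2.0))
```

A drop from 1,000,000 to 1,000 CFU/mL is a 3-log reduction, meaning 99.9% of the inoculated organisms were no longer recoverable.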

In our lab, we run PET on every water-based cosmetic—creams, lotions, serums, toners, etc. Oil-based or anhydrous products may not need a preservative, but many still undergo challenge testing to demonstrate robustness. The most common failure we see is insufficient log reduction for yeast and mold. Bacteria are often controlled well; fungi are trickier. Formulations that are borderline on PET can fail when exposed to real-world contamination (e.g., consumer fingers, bathroom humidity).

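Pass/fail criteria under USP <51> for Category 2 products (topical products made with aqueous bases, which covers most water-based cosmetics):

Bacteria: ≥2.0 log reduction from the initial count at 14 days, and no increase from the 14-day count at 28 days
Yeast/mold: no increase from the initial count at 14 and 28 days

USP <51> defines “no increase” as not more than 0.5 log above the previous measured value. ISO 11930 criteria are stricter and use different timepoints, so confirm which standard your report applies.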

We advise brands to design their preservative system with a margin of safety. Passing exactly at the limit leaves no room for formulation drift, water activity changes, or packaging variability. When we see repeated PET failures, the fix is usually a preservative adjustment, pH optimization, or reformulation to reduce water activity in the phase where microbes grow.

Common Stability Failures We See in Our Lab

Over the years we have observed recurring failure modes. Here are the most frequent:

Color change: Pigments, botanical extracts, and certain actives can oxidize or react with other ingredients. We see yellowing, browning, or fading. Sometimes the change is purely cosmetic; sometimes it indicates degradation of actives. We document color at each pull point using visual comparison or instrumental colorimetry.

pH drift: pH can shift as buffers degrade or as reactive species form. A cream that starts at pH 5.5 might drift to 6.2 after 6 months at 40°C. That can affect preservative efficacy, irritation potential, and active stability. We always track pH.

Phase separation: Emulsions can crack, creams can oil-out, and suspensions can settle irreversibly. High-temperature accelerated studies often expose weak emulsifier systems. When we see separation, we note whether it is reversible (e.g., by shaking) or irreversible.

Viscosity change: Thickening agents can break down or cross-link over time. We measure viscosity at each timepoint. Significant thinning or thickening can affect consumer experience and sometimes indicates chemical degradation.

Active ingredient degradation: Retinol, vitamin C, peptides, and other actives can degrade. We run HPLC or other assays to quantify actives at each pull. If the active drops below a specified level (often 90% of initial), the product fails.

Odor development: Rancidity, microbial growth, or oxidation can produce off-odors. We perform organoleptic evaluation at each timepoint. A noticeable off-odor often correlates with chemical or microbial issues.

Preservative failure: If preservative levels drop below efficacy (due to degradation or binding), the product becomes vulnerable to contamination. We monitor preservative concentration when possible and correlate with PET results.

How to Interpret Stability Results

When we complete a stability study, we provide a report with all measured parameters at each timepoint. Brands need to know how to read it.

Establish acceptance criteria upfront. Before the study starts, define what “pass” means for each parameter. For example: pH within ±0.5 of initial; viscosity within ±15%; active within 90–110% of initial. Without criteria, interpretation is subjective.
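As a rough illustration, the example criteria above (pH within ±0.5 of initial, viscosity within ±15%, active within 90–110% of initial) can be written down as an explicit check before the study starts. The `Reading` class and `check_against_initial` function are hypothetical names for this sketch, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    ph: float
    viscosity: float  # any consistent unit, e.g. cP
    active: float     # assay result, any consistent unit


def check_against_initial(initial: Reading, current: Reading) -> dict[str, bool]:
    """Pass/fail per parameter, using the example criteria from the text."""
    return {
        # pH within +/-0.5 units of the initial value
        "ph": abs(current.ph - initial.ph) <= 0.5,
        # viscosity within +/-15% of the initial value
        "viscosity": abs(current.viscosity - initial.viscosity) <= 0.15 * initial.viscosity,
        # active between 90% and 110% of the initial value
        "active": 0.90 * initial.active <= current.active <= 1.10 * initial.active,
    }


if __name__ == "__main__":
    t0 = Reading(ph=5.5, viscosity=20000.0, active=1.00)
    month_6 = Reading(ph=6.2, viscosity=18000.0, active=0.93)
    print(check_against_initial(t0, month_6))
```

With these invented values, viscosity (10% lower) and active (93% of initial) pass, while the pH drift of 0.7 units fails. Writing the rule down in advance removes any debate about whether 0.7 is "close enough".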

Look for trends, not just point-in-time values. A single anomalous data point might be a lab error or sample handling issue. A clear trend (e.g., pH drifting steadily upward) is more concerning.
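One simple way to separate a trend from a one-off value is to fit a straight line through the pull points and look at the slope. The sketch below uses ordinary least squares with invented pH data; `linear_trend` is a hypothetical helper, and a straight line is only an assumption about how the parameter drifts.

```python
def linear_trend(times: list[float], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timepoints must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    months = [0, 1, 2, 3]
    ph = [5.50, 5.58, 5.71, 5.80]  # invented data: steady upward drift
    slope, intercept = linear_trend(months, ph)
    print(f"pH drift: {slope:+.3f} units per month")
```

For the invented data the slope is about +0.10 pH units per month. A slope near zero with one stray point suggests a handling or measurement issue; a consistent nonzero slope suggests real drift.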

Correlate accelerated and real-time data. If you have both, compare them. Does the accelerated study predict the real-time trend? Often there is a rough correlation (e.g., 3 months at 40°C ≈ 12 months at 25°C), though the exact relationship depends on the product and the degradation mechanism.
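A common rule of thumb for this kind of correlation is the Q10 model, which assumes the degradation rate multiplies by a fixed factor (Q10) for every 10°C rise. The sketch below is an illustration of that rule only: the Q10 value is an assumption that varies by product and mechanism, and the function names are our own.

```python
def acceleration_factor(q10: float, t_accel_c: float, t_storage_c: float) -> float:
    """How many times faster aging proceeds at t_accel_c than at t_storage_c."""
    return q10 ** ((t_accel_c - t_storage_c) / 10.0)


def implied_q10(accel_months: float, real_months: float,
                t_accel_c: float, t_storage_c: float) -> float:
    """Q10 value that would make accel_months equivalent to real_months."""
    if t_accel_c == t_storage_c:
        raise ValueError("temperatures must differ")
    return (real_months / accel_months) ** (10.0 / (t_accel_c - t_storage_c))


if __name__ == "__main__":
    factor = acceleration_factor(q10=2.0, t_accel_c=40.0, t_storage_c=25.0)
    print(f"Q10 = 2.0: 3 months at 40 C ~ {3 * factor:.1f} months at 25 C")
    q10 = implied_q10(3, 12, 40.0, 25.0)
    print(f"Q10 implied by '3 months ~ 12 months': {q10:.2f}")
```

With Q10 = 2.0, three months at 40°C corresponds to roughly 8.5 months at 25°C; the "3 months ≈ 12 months" rule corresponds to a Q10 of about 2.5. That spread is why the extrapolation should be treated as an estimate and confirmed with real-time data.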

Consider packaging. Stability is product-in-package. A jar exposes the product to oxygen and finger contamination; a pump or airless packaging reduces both. We test in the final primary packaging whenever possible.

Use the data for label claims. If your real-time study shows the product is stable for 18 months, you can support an 18-month “use by” or “best by” claim. Do not overclaim. We have seen brands assume 24 months when their data only supports 12.
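A conservative way to read a real-time report is to take the latest pull point up to which every pull passed all acceptance criteria. The sketch below encodes that rule; `supported_shelf_life` is a hypothetical helper and the results are invented. It is a simplification, not a statistical shelf-life estimate.

```python
def supported_shelf_life(results: dict[int, bool]) -> int:
    """Latest pull month for which that pull and all earlier pulls passed.

    `results` maps pull month -> True if every parameter met its
    acceptance criteria at that pull. Returns 0 if the first pull fails.
    """
    supported = 0
    for month in sorted(results):
        if not results[month]:
            break  # a later pass does not rescue an earlier failure
        supported = month
    return supported


if __name__ == "__main__":
    # Invented example: everything passes through month 18, month 24 fails.
    real_time = {0: True, 3: True, 6: True, 9: True, 12: True, 18: True, 24: False}
    print(f"data supports a claim of {supported_shelf_life(real_time)} months")
```

In the invented example the data supports 18 months, not 24. The claim stops at the last consecutive passing pull, even if the product was designed for longer.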

Cost and Timeline Overview

Stability testing is an investment. A typical accelerated study (3 months, multiple pull points, full analytical panel) might run in the range of a few thousand dollars, depending on the number of products, timepoints, and tests. Real-time studies cost more because they extend over 12–24 months and require repeated analytical work.

Timelines:

Accelerated (40°C): 3–6 months from start to final report
Intermediate (30°C): 6–12 months
Real-time (25°C): 12–24 months or longer

We recommend requesting a custom quote based on your specific products and claims. Contact us with your product type and desired shelf-life claim and we will send you a detailed quote by email.

When to Test and When to Retest

When to start stability testing:

Before product launch (mandatory for defensible shelf-life claims)
After any formulation change (new preservative, new active, new supplier)
When changing packaging (especially primary packaging)
When extending shelf-life claim beyond what was previously supported

When to retest:

After a significant formulation change
If you receive consumer complaints about quality or performance
If you change suppliers for critical raw materials
Periodically (e.g., every 2–3 years) to confirm ongoing compliance, especially for long shelf-life products

We have worked with brands that skipped stability on a “minor” change and later discovered that the new preservative or excipient caused unexpected degradation. When in doubt, retest.

Frequently Asked Questions

What is accelerated stability testing?

Accelerated stability testing exposes products to elevated temperature (e.g., 40°C) and humidity to simulate long-term aging in a shorter period. It helps identify formulation issues quickly and supports early shelf-life estimates. It does not replace real-time testing for final claims.

What is preservative efficacy testing (PET)?

Preservative efficacy testing (challenge testing) inoculates a product with bacteria, yeast, and mold to determine whether the preservative system can control microbial growth. Pass/fail criteria are defined by USP <51> or ISO 11930. PET is the standard way to demonstrate that a water-based cosmetic is adequately preserved and safe for consumers, and we run it on every water-based product.

How long does stability testing take?

Accelerated studies typically run 3–6 months. Real-time studies run 12–24 months or longer, depending on the claimed shelf life. There is no shortcut for real-time data when supporting long shelf-life claims.

What happens if my product fails stability?

Failure means the product does not meet the acceptance criteria at one or more timepoints. Common fixes include reformulation (preservative system, pH, or actives), a change of packaging, revised storage conditions on the label, or a shorter shelf-life claim. We work with clients to troubleshoot and recommend next steps.

Do I need stability testing for every product?

For cosmetics sold in the US, the FDA does not legally require stability testing for most products, but it is a best practice and is often required by retailers or distributors. For products sold in the EU under the Cosmetic Regulation (EC) No 1223/2009, stability data supports the product information file (PIF) and safety assessment. In our experience, responsible brands test all products before market.

Batch size and sampling: We recommend testing from at least three batches when possible—ideally from different production runs. Single-batch studies can miss batch-to-batch variation. For critical products, we advise clients to include multiple batches in their stability protocol and to pull samples from the middle of the batch (not just the beginning or end) to ensure representativeness.

Light stability: Some ingredients—retinol, vitamin C, certain botanical extracts—are photolabile. If your product is packaged in clear or translucent containers, or if it will be displayed in bright retail environments, light stability testing may be warranted. We can run studies under controlled light exposure (e.g., ICH Q1B) to assess photo-degradation. Many brands overlook this and later see color change or active loss when products sit on shelves.

If you are a cosmetic brand preparing for stability testing or preservative efficacy testing, get in touch to discuss your products and timeline. We provide customized study designs, transparent pricing, and reports you can use for regulatory submissions and retail compliance.

Cosmetic Stability Testing: A Lab Scientist's Complete Guide for Brands