ControlNet for Product Photography in 2026: Which Model to Use and When


gdefoto article

Intro

If you have ever tried to regenerate a background or relight a product shot with plain Stable Diffusion, you know the pain. You take a perfume bottle with a signature edge, push it through img2img, and the model rebuilds the geometry its own way. The edge has drifted, the logo has turned into a hieroglyph, the cap sits a little higher. For a portrait such liberties can be forgiven. For a marketplace catalog, where the buyer compares the product to the manufacturer's real photo, that is a defect.

ControlNet solves exactly this problem. It tells the diffusion model which structure to keep frozen and where it is allowed to improvise. In product photography there is no single universal preprocessor but a zoo of about a dozen types, each with its own strength. Canny is ideal for jewelry with sharp facets, Depth saves the day on sculptural forms, Tile recovers textures during upscale, IPAdapter keeps a unified style across the entire catalog. It is easy to get lost at the start, and most retouchers either get stuck on a single Canny with default weight, or fire up four units at once and get mush as a result.

In this article we will lay out which ControlNet to use for which task in 2026, which parameters to tweak, how to combine two preprocessors at once, and how much VRAM it will all consume. No filler, with concrete numbers for jewelry, watches, packaging, cosmetics, and electronics.

What ControlNet Is and Why Product Photography Needs It

ControlNet is a neural network add-on that runs in parallel with the main diffusion model and holds specific features of the source image. Technically it receives a condition map (contour, depth, normal, color) as input and at each denoising step feeds features derived from this map into the U-Net of the diffusion model. As a result the model can no longer drift away from the assigned geometry, even if the prompt hints at something else.

For portrait generation you can live without ControlNet. The face comes out anatomically correct anyway. For product photography it is different. The object is unique, the form does not generalize, any deviation from the reference is visible. Without ControlNet the network will turn your earring into something that resembles an earring. With ControlNet you get the same earring, but in a new environment, with new light, with a new background.

The key idea is this: ControlNet is a retention tool, not a generation tool. The prompt is still responsible for style, material, and atmosphere. ControlNet is responsible only for keeping the geometry and proportions recognizable.

Installing ControlNet in Automatic1111 in 5 Minutes

In A1111 (relevant for versions 1.10 and newer, tested on fresh 2026 builds) ControlNet is installed through the Extensions tab. Go to Available, press Load from, search for sd-webui-controlnet by Mikubill, and install it. Restart the interface. A ControlNet panel appears under the prompt.

The models do not come with the extension, they have to be pulled separately. For SD 1.5 the base set is at HuggingFace in the lllyasviel/ControlNet-v1-1 repository, files go into models/ControlNet/. For SDXL take versions from lllyasviel/sd_control_collection or xinsir, they are heavier (about 2.5 GB each), but deliver quality unreachable by SD 1.5 on product work.

Minimal package for commercial work: control_v11p_sd15_canny, control_v11f1p_sd15_depth, control_v11p_sd15_lineart, control_v11p_sd15_normalbae, control_v11f1e_sd15_tile, ip-adapter_sd15, reference-only (it is not a model but a preprocessor, built in).

In the extension settings enable Allow other script to control this extension and set Multi-ControlNet to 3 units. That is enough for every product task.
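A quick way to confirm that the minimal package is in place is to scan the models folder. The sketch below is only an illustration: the function name `missing_models` is hypothetical, the path `models/ControlNet` assumes a default install, and the accepted file extensions are an assumption about how you downloaded the weights. Reference-only is skipped because it is built into the extension.

```python
from pathlib import Path

# Model file stems from the minimal package listed above.
REQUIRED_MODELS = [
    "control_v11p_sd15_canny",
    "control_v11f1p_sd15_depth",
    "control_v11p_sd15_lineart",
    "control_v11p_sd15_normalbae",
    "control_v11f1e_sd15_tile",
    "ip-adapter_sd15",
]

# Assumed weight-file extensions; config files such as .yaml do not count.
WEIGHT_SUFFIXES = {".pth", ".safetensors", ".bin", ".ckpt"}


def missing_models(model_dir, required=REQUIRED_MODELS):
    """Return the required model stems that have no weight file in model_dir."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return list(required)
    present = {
        p.stem
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in WEIGHT_SUFFIXES
    }
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Assumed relative path; adjust it to your own A1111 install.
    for name in missing_models("models/ControlNet"):
        print("missing:", name)
```

Run it from the A1111 root folder; an empty output means every model from the list has a weight file.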

Canny: the Workhorse of Jewelry and Packaging

Canny builds a map using the Canny edge detector. The result is a black and white picture with thin lines wherever there are sharp brightness transitions in the source. The model then holds exactly those lines.

What matters for product work: Canny ignores soft tonal transitions and works only with sharp edges. This is ideal for:

  • jewelry with many facets (diamond, multi-faceted pendants)
  • watches (dial with numerals and hands, bezel scale)
  • packaging with printed text and logos
  • electronics with visible body seams and keys

Parameters that are actually worth changing:

| Parameter | Default | For Jewelry | For Packaging |
|---|---|---|---|
| Control Weight | 1.0 | 0.85 | 1.1 |
| Starting Control Step | 0 | 0 | 0 |
| Ending Control Step | 1.0 | 0.85 | 1.0 |
| Low Threshold | 100 | 50 | 100 |
| High Threshold | 200 | 150 | 200 |

Low thresholds (50/150) catch more fine facets, which is critical for diamonds. The higher default pair (100/200) suppresses noise on flat packaging surfaces. If you set Ending Step to 0.85, the model generates the final steps without rigid retention, and the highlights come out alive, not flat. This is a working trick for all metals.
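To see why the threshold pair matters, here is a simplified one-dimensional sketch of the hysteresis stage of the Canny detector. It is not the preprocessor's actual code: the real detector works on a 2D gradient image, and the gradient values below are made up for illustration. A pixel at or above the high threshold is a strong edge; a pixel between the two thresholds survives only if it connects to a strong one.

```python
def hysteresis_1d(gradients, low, high):
    """Return indices kept as edges after double thresholding with hysteresis."""
    strong = [g >= high for g in gradients]
    weak = [low <= g < high for g in gradients]
    keep = list(strong)
    changed = True
    # Grow outward from strong pixels through chains of weak neighbours.
    while changed:
        changed = False
        for i, is_weak in enumerate(weak):
            if is_weak and not keep[i]:
                left = i > 0 and keep[i - 1]
                right = i + 1 < len(keep) and keep[i + 1]
                if left or right:
                    keep[i] = True
                    changed = True
    return [i for i, kept in enumerate(keep) if kept]


# Hypothetical gradient magnitudes along one row of a faceted stone.
row = [10, 60, 160, 120, 40, 210, 90]

print(hysteresis_1d(row, 50, 150))   # jewelry pair: keeps the faint facets too
print(hysteresis_1d(row, 100, 200))  # default pair: keeps only the hardest edge
```

With the 50/150 pair five of the seven positions survive, with 100/200 only one does, which is the difference between a detailed facet map and a clean packaging outline.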

Typical mistake: putting Weight at 1.5 in the hope that geometry will be even more accurate. In reality, above 1.2 the model starts ignoring the prompt and produces almost the source image, without new light and background.
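If you drive A1111 through its web API instead of the UI, the table values can be kept as presets. The sketch below only builds the request body and sends nothing. The key names (`module`, `model`, `weight`, `guidance_start`, `guidance_end`, `threshold_a`, `threshold_b`, and the `alwayson_scripts` wrapper) follow the sd-webui-controlnet API as far as I know it, so check them against your installed version; the helper names and the model string are assumptions, and the model string must match what your UI lists.

```python
# Values copied from the Canny parameter table above.
CANNY_PRESETS = {
    "jewelry":   {"weight": 0.85, "start": 0.0, "end": 0.85, "low": 50,  "high": 150},
    "packaging": {"weight": 1.1,  "start": 0.0, "end": 1.0,  "low": 100, "high": 200},
}

MAX_SAFE_WEIGHT = 1.2  # above this the prompt tends to be ignored, per the text


def canny_unit(product, weight=None, model="control_v11p_sd15_canny"):
    """Build one ControlNet unit dict for the given product type."""
    preset = CANNY_PRESETS[product]
    w = preset["weight"] if weight is None else weight
    if w > MAX_SAFE_WEIGHT:
        raise ValueError(f"weight {w} exceeds {MAX_SAFE_WEIGHT}; the prompt would be ignored")
    return {
        "enabled": True,
        "module": "canny",   # preprocessor
        "model": model,      # assumed name; copy the exact string from your UI
        "weight": w,
        "guidance_start": preset["start"],
        "guidance_end": preset["end"],
        "threshold_a": preset["low"],
        "threshold_b": preset["high"],
    }


def build_payload(prompt, *units):
    """Wrap the units into a txt2img/img2img request body."""
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": list(units)}},
    }


payload = build_payload("diamond ring on black velvet, studio light", canny_unit("jewelry"))
```

The guard on `MAX_SAFE_WEIGHT` turns the typical mistake described above into an explicit error instead of a silently ignored prompt.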

Depth: 3D Form for Sculptural Objects

Depth outputs a depth map where near areas are bright and far ones are dark. The model holds the volumetric shape but allows fantasy with surface and material.

When to take Depth instead of Canny:

  • bottles, perfume flacons (smooth curves without sharp edges)
  • ceramics, vases, figurines
  • bags and footwear (soft form, folds)
  • furniture and accessories
  • cosmetics in tubes and jars

Preprocessors to choose from: depth_midas (classic, fast), depth_zoe (more accurate on complex forms), depth_anything_v2 (the new 2026 standard, best of all in accuracy).

Parameters:

  • Control Weight 0.7-0.9 (Depth likes soft retention)
  • Ending Step 0.7-0.8 (release earlier so the material texture works freely)
  • Preprocessor Resolution 512 for SD 1.5, 768-1024 for SDXL

The main trick: if the object in the source is too close to the camera, midas gets confused with the background. Before the run, cut the object out of the background (in Photoshop, or via rembg) and feed it on a black field. The depth map will be cleaner, retention more accurate.

Lineart: Thin Lines, Engravings, Patterns

Lineart works like Canny but produces more artistic, smooth lines instead of technical outlines. For product work it is useful in three scenarios:

  1. Engravings on metal (watches, cigarette cases, rings with inscriptions)
  2. Complex patterns on textile, ceramics, wallpaper
  3. Illustrative catalog presentation when artistry is desired

Preprocessors: lineart_realistic (for photorealistic work), lineart_anime (only for stylized catalogs), lineart_coarse (for simplified patterns).

For engravings set Weight 1.0 and Ending Step 1.0. Do not release control early, otherwise fine details will be wiped out.

Lineart and Canny are often confused. A simple rule: Canny catches borders (where dark meets light), Lineart catches lines (where a line was drawn or scratched). For a barcode take Canny. For an engraved inscription take Lineart.

Normal Map: Relief and Surface Texture

Normal Map is an RGB map where each channel encodes the direction of the surface normal at a point. Bluish color means a plane facing the camera, reddish and greenish tones indicate tilt.
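The color coding follows from a simple mapping: each component of the unit normal, which lies in the range -1 to 1, is rescaled to 0-255. The sketch below assumes the common convention where Z points toward the camera; axis directions and signs differ between tools, so treat it as an illustration rather than the exact encoding of any particular preprocessor.

```python
import math


def encode_normal(nx, ny, nz):
    """Map a surface normal to an RGB triple (X to red, Y to green, Z to blue)."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("zero-length normal")
    # Normalize, then rescale each component from [-1, 1] to [0, 255].
    return tuple(round((c / length + 1.0) / 2.0 * 255) for c in (nx, ny, nz))


print(encode_normal(0, 0, 1))  # plane facing the camera: bluish
print(encode_normal(1, 0, 0))  # surface turned fully sideways: reddish
print(encode_normal(0, 1, 0))  # surface turned fully up or down: greenish
```

A flat surface facing the camera encodes to (128, 128, 255), which is the familiar pale blue of a normal map.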

For product work Normal Map is irreplaceable where texture matters without rigid geometry:

  • leather (bags, footwear, furniture)
  • fabrics with pronounced texture (velvet, wool, linen)
  • casting, embossing, relief emblems
  • 3D logos on packaging

Preprocessors: normal_bae (standard), normal_midas (older version, sometimes works better on monochrome surfaces).

Parameters:

  • Weight 0.6-0.8
  • Ending Step 0.9
  • Preprocessor Resolution 768 minimum

Normal likes when materials are mentioned in the prompt. If you generate a leather bag and write simply bag, the model may ignore the texture. Write grain leather, fine texture, soft matte finish, and Normal will pull out the nuances.

Tile: Upscale with Detail Recovery

Tile is a special ControlNet that does not hold structure in the usual sense but allows you to regenerate missing details during enlargement. It works like this: you supply an image, break it into tiles, and each tile is processed with a Tile hint.

This is a working tool for the final upscale of a catalog image from 1024 to 4096-8192 pixels. Without Tile, upscale blurs details or breeds artifacts. With Tile, real textures of wood, threads, metal appear.

The bundle for product upscale:

  • Tile preprocessor tile_resample, Weight 0.5-0.7
  • Ending Step 1.0
  • Denoising strength 0.4-0.55 (important: not higher, otherwise the model will drift from the source)
  • SD Upscale or Ultimate SD Upscale script
  • Scale 2x per pass, two passes of 2x are better than one of 4x

Tile is often combined with 4x-UltraSharp or ESRGAN models at the preliminary enlargement stage, while ControlNet then adds natural details on top.
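The "two passes of 2x" advice is easy to turn into a plan of intermediate sizes. This is a minimal sketch with hypothetical names (`upscale_passes`, `TILE_SETTINGS`); the settings simply pick mid-range values from the bundle above.

```python
# Mid-range values from the bundle above; tune per image.
TILE_SETTINGS = {
    "preprocessor": "tile_resample",
    "weight": 0.6,
    "ending_step": 1.0,
    "denoising_strength": 0.45,
}

MAX_DENOISE = 0.55  # higher values let the model drift from the source


def upscale_passes(start_px, target_px, factor=2):
    """List the long-side size after each pass until target_px is reached."""
    if start_px <= 0 or factor <= 1:
        raise ValueError("start_px must be positive and factor greater than 1")
    sizes = []
    size = start_px
    while size < target_px:
        size *= factor
        sizes.append(size)
    return sizes


assert TILE_SETTINGS["denoising_strength"] <= MAX_DENOISE
print(upscale_passes(1024, 4096))  # two passes
print(upscale_passes(1024, 8192))  # three passes
```

Going from 1024 to 4096 takes two passes (2048, then 4096), and 8192 needs a third.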

IPAdapter: Style Matching by Reference

IPAdapter transfers visual style from one image to another. Not geometry, but specifically style: colors, light, mood, overall look. For a catalog this is gold.

Real case: you shot 200 marketplace products across different days under different light. Through IPAdapter you take one reference image (correctly shot, with brand background and light) and bring all others to its style. The output series looks coherent, as if shot in one sitting.

IPAdapter versions in 2026:

| Version | For What | Feature |
|---|---|---|
| ip-adapter_sd15 | SD 1.5 general | Base, fast |
| ip-adapter-plus_sd15 | SD 1.5 precise | Holds style details better |
| ip-adapter_sdxl | SDXL general | Heavier, higher quality |
| ip-adapter-plus_sdxl_vit-h | SDXL premium | The standard for commercial work |
| ip-adapter_faceid | portraits | Not needed for products |

Parameters:

  • Control Weight 0.5-0.8 (above 1.0 kills the prompt)
  • Ending Step 0.7-0.9
  • Type: Style only, to transfer style without geometry (important)

Combine IPAdapter with Canny or Depth: the first holds catalog style, the second holds the form of the specific product. You get a series where each item is recognizable and the overall presentation is unified.

Reference-only: Simplified Alternative to IPAdapter

Reference-only appeared earlier than IPAdapter and still lives in the ControlNet extension. The preprocessor does not use a separate model but injects features from the reference directly into the SD self-attention layers.

When to take Reference instead of IPAdapter:

  • quick prototype, without downloading models
  • work on weak hardware (Reference is lighter)
  • the reference and target object are very similar in form

Preprocessors: reference_only, reference_adain, reference_adain+attn. For products, the most stable is reference_only with Style Fidelity 0.5-0.7.

Reference does not like strong prompts. If the prompt contains many stylistic words, it will conflict with the reference. Keep the prompt short, describe only the object itself.

Softedge: Soft Edges for Delicate Tasks

Softedge produces soft, blurred contours. Used where Canny is too rigid and Depth is not structural enough.

Real scenarios:

  • soft toys and blankets
  • bread, pastries, confectionery (objects with irregular organic form)
  • napkins, fabric folds
  • flowers and bouquets

Preprocessors: softedge_pidinet (more contrast), softedge_hed (softer), softedge_pidisafe (more accurate), softedge_hedsafe (the gentlest).

Weight 0.7-0.9, Ending Step 0.85. The softer the object, the lower the Weight.

Combining Two ControlNets at Once

Real commercial product work almost always uses two ControlNet units. One holds structure, the second holds style or an additional dimension.

Canny plus Depth. The base combo for jewelry and packaging. Canny holds facets and text, Depth adds understanding of volume. Weights: Canny 0.9 plus Depth 0.5. Result: proportions and inscriptions do not drift, while highlights and shadows are realistic.

Canny plus IPAdapter. For catalogs with a unified style. Canny holds the form of a specific product, IPAdapter pulls up the overall look. Weights: Canny 1.0 plus IPAdapter 0.6. You take a reference catalog image and replicate its style across all products.

Lineart plus Normal. For details with engravings and texture. Lineart holds the engraving, Normal brings the surrounding metal to life. Weights: Lineart 1.0 plus Normal 0.6. Ideal for premium watches and cigarette cases.

Depth plus Tile. For upscaling complex shapes. Depth keeps the silhouette from falling apart at large scales, Tile suggests details. Weights: Depth 0.5 plus Tile 0.6.

IPAdapter plus Reference. Double style transfer. One sets the color palette, the second the composition. A rare combo, but works for complex catalogs with dual requirements. Weights: both at 0.5.

Three ControlNets at once is almost never needed. On the third unit the model starts ignoring the prompt. If it feels like you need three, most likely the weights of the first two are wrong.
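The combos above fit naturally into a small lookup table. The sketch below is one possible layout, not a format the extension understands: the dictionary names, the `control_type` key, and the use-case labels are hypothetical, while the weights are the ones given in the text.

```python
# Weights copied from the combos described above.
COMBOS = {
    "canny+depth":         [("canny", 0.9), ("depth", 0.5)],
    "canny+ipadapter":     [("canny", 1.0), ("ip-adapter", 0.6)],
    "lineart+normal":      [("lineart", 1.0), ("normal", 0.6)],
    "depth+tile":          [("depth", 0.5), ("tile", 0.6)],
    "ipadapter+reference": [("ip-adapter", 0.5), ("reference", 0.5)],
}

# Use cases named in the text, mapped to their combo.
USE_CASES = {
    "jewelry": "canny+depth",
    "packaging": "canny+depth",
    "unified catalog style": "canny+ipadapter",
    "premium watches": "lineart+normal",
    "cigarette cases": "lineart+normal",
    "upscaling complex shapes": "depth+tile",
}

MAX_UNITS = 2  # a third unit tends to drown out the prompt


def units_for(use_case):
    """Return the list of unit settings for a use case."""
    combo = COMBOS[USE_CASES[use_case]]
    if len(combo) > MAX_UNITS:
        raise ValueError("more than two units; recheck the weights of the first two")
    return [{"enabled": True, "control_type": name, "weight": w} for name, w in combo]


for unit in units_for("premium watches"):
    print(unit["control_type"], unit["weight"])
```

Keeping the combos as data makes it easy to add a new product type without touching the generation code.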

How Much VRAM Each Combo Consumes

Calculations for SD 1.5 at 768x768 resolution and SDXL at 1024x1024. With batch size 1, without xformers or sdp optimizations.

| Combination | SD 1.5 | SDXL |
|---|---|---|
| SD only, no CN | 4 GB | 8 GB |
| 1 ControlNet (Canny/Depth/etc) | 5.5 GB | 10 GB |
| 2 ControlNet | 7 GB | 12.5 GB |
| 2 CN plus IPAdapter | 7.5 GB | 13.5 GB |
| 2 CN plus Tile (upscale) | 8 GB | 15 GB |
| 3 ControlNet | 9 GB | 16+ GB |

With xformers and the --medvram flag the numbers can be cut by 25-35 percent. On 8 GB cards SD 1.5 runs fine with two ControlNets, while SDXL runs only with one and only with --medvram. On 12 GB, SDXL with two ControlNets runs comfortably. On 16 GB and above you can build any combination without restrictions.

In 2026 the working configuration for commercial product photography is a GPU with at least 12 GB of VRAM. Anything below that requires compromises on resolution or unit count.
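The table and the 25-35 percent saving can be combined into a rough fit check. The sketch below uses the table figures as-is and the conservative 25 percent end of the range; the function names are hypothetical, and "16+" is stored as 16.0, so the result for three SDXL units is a lower bound. Real consumption depends on the sampler, the resolution, and the build, so treat the output as an estimate.

```python
# Figures from the table above, in GB (SD 1.5 at 768x768, SDXL at 1024x1024).
VRAM_GB = {
    "SD only, no CN":      {"SD 1.5": 4.0, "SDXL": 8.0},
    "1 ControlNet":        {"SD 1.5": 5.5, "SDXL": 10.0},
    "2 ControlNet":        {"SD 1.5": 7.0, "SDXL": 12.5},
    "2 CN plus IPAdapter": {"SD 1.5": 7.5, "SDXL": 13.5},
    "2 CN plus Tile":      {"SD 1.5": 8.0, "SDXL": 15.0},
    "3 ControlNet":        {"SD 1.5": 9.0, "SDXL": 16.0},  # table says 16+
}


def estimated_vram(combo, model, optimized=False, saving=0.25):
    """Estimate VRAM in GB; optimized applies the xformers/medvram saving."""
    base = VRAM_GB[combo][model]
    return base * (1.0 - saving) if optimized else base


def fits(combo, model, card_gb, optimized=False):
    """True if the estimate does not exceed the card's VRAM."""
    return estimated_vram(combo, model, optimized) <= card_gb


print(fits("1 ControlNet", "SDXL", 8))                  # without optimizations
print(fits("1 ControlNet", "SDXL", 8, optimized=True))  # with them
print(estimated_vram("2 ControlNet", "SDXL", optimized=True))
```

For an 8 GB card the check reproduces the rule above: one SDXL ControlNet does not fit at 10 GB, but does at the optimized estimate of 7.5 GB.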

Typical Setup Mistakes

Default weights on all units. When two ControlNets sit at Weight 1.0, they fight each other. One pulls the blanket, the other pulls back. The prompt is ignored. Rule: the sum of weights of all ControlNets should not significantly exceed 1.2-1.5.
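The weight-sum rule is simple enough to check before every run. This is a minimal sketch with a hypothetical function name; it uses 1.5, the upper end of the range above, as the limit, and since the rule is a soft guideline a result slightly over the limit is a prompt to review the setup, not a hard failure.

```python
def weight_budget(weights, limit=1.5):
    """Return (total, within_limit) for the Control Weights of all active units."""
    total = sum(weights)
    return total, total <= limit


print(weight_budget([1.0, 1.0]))  # two units left at the default weight
print(weight_budget([0.9, 0.5]))  # Canny plus Depth with tuned weights
```

Two units at the default 1.0 add up to 2.0 and fail the check, while a tuned pair such as 0.9 plus 0.5 passes.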

Wrong preprocessor for the source. You feed an already prepared Canny map as input but leave the preprocessor as canny. The extension tries to process the ready map again, the result is garbage. If you have a prepared map on hand, set the preprocessor to none.

Preprocessor resolution lower than generation resolution. If you generate 1024x1024 but the preprocessor is set to 512, ControlNet works on a coarsened map and loses fine details. Set Preprocessor Resolution equal to or close to the larger side of the canvas.

Ignoring Ending Step. Most retouchers leave Ending Step at 1.0 and wonder why the highlights come out dead. Drop it to 0.8-0.85, and the model will finish the final steps freely and the materials will come alive.
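Ending Step is a fraction of the sampling schedule, so it helps to translate it into actual step counts. The sketch below is an approximation: the function name is hypothetical and the extension's exact rounding may differ by a step.

```python
import math


def controlled_steps(total_steps, start=0.0, end=1.0):
    """Return (first_step, last_step, free_steps) for a ControlNet unit.

    free_steps is the number of final steps the model runs without control.
    """
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("expected 0 <= start <= end <= 1")
    first = math.ceil(total_steps * start)
    last = math.floor(total_steps * end)
    return first, last, total_steps - last


print(controlled_steps(30, 0.0, 0.85))  # the last few steps run free
print(controlled_steps(30, 0.0, 1.0))   # control never lets go
```

At 30 sampling steps an Ending Step of 0.85 leaves the last 5 steps uncontrolled, and those are the steps in which highlights and fine material detail are refined.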

Strong retention on a weak prompt. ControlNet is not magic. If the prompt is described in three words, no ControlNet combo will make it beautiful for you. The structure will hold, but the quality of light, material, and atmosphere depends on the prompt.

Using one combo across the entire catalog. A watch and a blanket are not made with the same setup. Each product type gets its own ControlNet combo. A serious studio keeps a preset library covering 10-15 typical scenarios.

What Comes Next

Mastering ControlNet is the first step toward commercial AI work in product photography. Next come LoRA, ComfyUI pipelines, inpainting, regional prompts. Assembling all this into a working process takes self-taught retouchers 8-14 months, and most give up: forums are closed, videos go stale in 3 months, every mistake costs hours.

If you want to walk this path in 3-4 months with structure, feedback, and real commercial tasks, look at the AI PRO course by gdefoto. Full cycle: from installing SD and ControlNet for product work, through LoRA and IPAdapter, to delivering a marketplace catalog turnkey. Instructors are practicing retouchers, every work is reviewed personally, there is a closed graduate chat and a constantly updated preset library. Details and enrollment for the next cohort: Enroll in AI PRO

ControlNet is a tool, not a goal. The goal is commercial retouching of the new generation, where a series is shot ten times faster and looks better than the classic approach. That is where the market is going.