Automation
Aug 10, 2025 · 20 min read

First look at n8n to build an automated image factory

Recent advances in artificial intelligence have unlocked multiple new capabilities, among them the ability to automate entire processes. In this article, I will discuss how I used n8n to build an image generation factory.


For quite a while, automation has been a major topic in the creative space. Being able to quickly produce multiple variations of an idea, to support the different formats used across channels, is an attractive way to reduce the work needed to create images and speed up content production.

One platform that landed on my radar for this type of work was n8n. I will go over the process I followed to build an image generation factory capable of producing images such as these:

Tools used

Infrastructure
  • n8n (self-hosted)
  • Hetzner CPX11 VPS
  • PM2
AI
  • Google Gemini
  • FLUX.1-schnell via NScale
  • ChatGPT
Storage & output
  • Imgur API
  • Google Sheets

n8n in a nutshell

n8n is an open-source workflow automation tool. It allows us to create workflows with multiple nodes that can connect to different apps, APIs, databases, etc. Additionally, the app comes with a very intuitive user interface to control the flow.

Being open-source, I thought this would be the perfect tool to do some exploration with automation, as I could simply leverage the VPS server I already set up to host my website to self-host the app.

Setting up n8n as self-hosted

To set up n8n on my VPS, I simply asked ChatGPT how to implement it:

Prompt
How do I set up n8n self-host on my Hetzner CPX11 using PM2?

Although the instructions were pretty straightforward, I would recommend being mindful of your VPS server settings… For example, in my case I had to do some back-and-forth with ChatGPT to properly support my SSL setup and have the n8n instance run alongside my website.

Once n8n is set up, we can simply access it, create an account and start using the app! This is where the actual fun begins.

Creating a first project

Once we open n8n and authenticate, we land in the tool’s dashboard. From there, simply click “Create Workflow” to open the workflow design user interface.

Now, about the project. My idea was quite simple… build an automated workflow that creates 3 images of an idea in different dimensions (square, landscape, portrait). Without having a specific product in mind, I wanted the workflow to automatically pick a product, create a style and then generate the images. Additionally, all generated images should be logged in a spreadsheet for easy access.

Before moving forward, I must give credit to the RoboNuggets YouTube channel as I’ve been using this video extensively to build my workflow.

Setting up the AI agents

The first step for my project was to create AI agents with very specific roles. The way I approached this was to deconstruct my objective into 3 different specialties:

  1. Product selection
  2. Art direction creation
  3. Prompt generation

This way, each execution first selects a product from a list, then creates a style for it, and finally generates a prompt combining the product and the art direction. By splitting the task across 3 agents, I could control the output format of each step and make each agent a specialist at its task, which improved the quality of the results.
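Conceptually, the chain behaves like simple function composition. The sketch below illustrates the idea with stubbed agents; the functions and their hard-coded outputs are stand-ins for the n8n AI Agent nodes, not real API calls:

```javascript
// Illustrative sketch of the three-agent pipeline. Each stub mimics the
// structured JSON output its corresponding n8n agent node produces.
const selectProduct = () => ({
  name: 'Smartwatch',
  company: 'Acme', // hypothetical company for the example
  color_scheme: 'black, silver, teal',
});

const createArtDirection = () => ({
  style: 'moody studio light, shallow depth of field',
  product_placement: 'center',
  art_profile: 'muted pastels',
});

// The third agent combines both outputs into a single image prompt.
const composePrompt = (product, art) =>
  `${product.name} from ${product.company}, ${product.color_scheme}, ` +
  `${art.style}, placed ${art.product_placement}, ${art.art_profile}`;

// Each execution runs the steps in order, exactly like the node chain:
const product = selectProduct();
const art = createArtDirection();
const prompt = composePrompt(product, art);
console.log(prompt);
```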

Now, let’s start by setting up the agents. n8n makes agent creation very easy: click the ‘+’ to open the nodes panel or press “tab”, then select “AI” and “AI Agent”. This will create a node like this:

Throughout the article, the agents will always use the following configurations:

  • Source for Prompt (User Message) : Define below
  • Require Specific Output Format : True (Very important, this will allow us to control the output format)
  • Options : System Message

Then, each agent will use the following connections:

  • Chat Model : In my example, I used Google Gemini Chat as it is free
  • Output Parser : Structured Output Parser
  • Tool : Think Tool

The first agent will be a specialist in product selection. To get this agent going, I used the following prompt:

Prompt - Product Agent
Generate a completely random product with colour details from the inclusion list. Ignore any bias or previous results.

Then, I used the following system message. Basically, it tells the agent to be a specialist in product selection, gives it a list of products to pick from (with some exclusions) and specifies the expected output format. This can be modified to fit whatever you’re looking to produce:

System Message - Product Agent
niche = "consumption product"

Clear any previous memory you have about the products picked previously.

***
Use the Think Tool to think about your output

You are an expert product selector and descriptor for AI image generation. Your task is to generate **unique and recognizable products** based on the defined niche: {niche}.

You must select them as follows:
Select a random product **only from the inclusion list** below. This is your final product. Products in the exclusion list must never appear.

Inclusion list:
1. Smartphone
2. Electric Toothbrush
3. Running Shoes
4. Espresso Machine
5. Wireless Earbuds
6. Designer Handbag
7. Gaming Controller
8. Smartwatch
9. Mechanical Keyboard
10. Bluetooth Speaker
11. LED Desk Lamp
12. Vacuum Robot
13. Reusable Water Bottle
14. E-Reader
15. Hair Dryer
16. Chef’s Knife
17. Yoga Mat
18. Instant Camera
19. Electric Kettle
20. Air Purifier

Exclusion list:
1. Disposable Paper Straw
2. Single-Use Coffee Pod
3. Cigarettes

Each selected product must include:
- **name**: The full name of the product
- **company**: The company it comes from
- **color_scheme**: A single text string describing their dominant visual color theme (up to 3 colors)

**Do not describe the product appearance. Only company and color info.**

IMPORTANT: Your response must be ONLY the JSON array exactly as specified, with no extra text, explanation, or formatting.

[
  {
      "name": "",
      "company": "",
      "color_scheme": ""
  }
]

Finally, I linked the agent to an output parser with the following schema, which helps control the exact format of each attribute later on:

Output Parser - Product Agent
{
"type": "array",
"items": {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "company": { "type": "string" },
    "color_scheme": { "type": "string" }
  },
  "required": ["name", "company", "color_scheme"]
},
"minItems": 1,
"maxItems": 1
}
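If you want to sanity-check an agent's raw reply against this schema outside n8n, a minimal hand-rolled check might look like the sketch below. It is not a full JSON Schema validator, just the constraints that matter here (a single-item array with three string fields):

```javascript
// Minimal check mirroring the Product Agent output parser schema above.
function isValidProductOutput(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // the model wrapped the JSON in extra text
  }
  // minItems: 1, maxItems: 1 — exactly one product per run
  if (!Array.isArray(parsed) || parsed.length !== 1) return false;
  const item = parsed[0];
  // required: name, company, color_scheme — all non-empty strings
  return ['name', 'company', 'color_scheme'].every(
    (key) => typeof item[key] === 'string' && item[key].length > 0
  );
}

console.log(isValidProductOutput(
  '[{"name":"Smartwatch","company":"Acme","color_scheme":"black and teal"}]'
)); // true
console.log(isValidProductOutput('Sure! Here is the JSON: [...]')); // false
```

In practice, n8n's Structured Output Parser does this enforcement for you; this is only useful when debugging raw model replies.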

Next, I built the Art Direction Agent. In the same manner, I started with the following prompt:

Prompt - Art Direction Agent
Generate a single product presentation visual style. Ignore any bias or previous results. 

And then I set up the following system message, instructing it to be an art director specializing in AI images for consumption products. I generated the list of art directions with ChatGPT by simply asking it for 50 art styles for consumption products:

System Message - Art Direction Agent
product_type = consumption

Clear any previous memory you have about the styles picked previously.
---
Take your time and use the Think Tool to carefully generate your output.

You are an art director specializing in AI Image creation. You must create professional prompts to showcase different {product_type} products. Don't describe the product directly, focus on defining the artistic direction for the image.

Whenever you are called:
- Randomly select one style from the art direction styles below

Each output must include:
- A **title** (3-5 words, visual and original)
- A **caption** (one short, catchy sentence to accompany the image)
- A **style** (4-6 sentences describing the mood, environment, light, composition, visual treatment, etc.)

In addition, include the following **explicit metadata**:
- `product_placement`: "left", "center", "right", and "top" or "bottom", etc
- `product_size`: product's size vs the whole frame ("tiny", "small", "medium", "large") ; have a random chance that the landscape is the focus instead of the product
- `art_profile`: descriptive art cue such as "muted pastels", "brushstrokes", "vibrant color blocks", etc

Output ONLY a JSON object matching the following structure, no extra text or explanation.

[
{
  "json": {
    "title": "...",
    "caption": "...",
    "style": "...",
    "product_placement": "...",
    "product_size": "...",
    "art_profile": "..."
  }
}
]

***
Available style pool (select one at random):
1. Consumer holding the product, but their reflection shows a different era.
2. Multiple hands reaching toward a glowing, levitating product.
3. Product floating above a child’s open palms, casting magical light upward.
4. Product viewed through a glass of water, warped and surreal.
5. Product as seen through a keyhole—cropped, intimate, voyeuristic.
6. Eyes of different people reflected on the product surface.
7. Product placed on a dinner table like a sacred offering.
8. Person asleep, product glowing on the nightstand like a dream beacon.
9. Consumer morphing into product packaging mid-interaction.
10. Product casting a shadow of its true use case.
11. Product resting on a lotus flower floating in ink.
12. Item entangled in blooming vines—nature reclaiming tech.
13. Consumption product embedded in a fossil bed.
14. Product melting like wax in the sun—commentary on obsolescence.
15. Product turned into origami, mid-fold transformation.
16. Hands assembling the product from particles like stardust.
17. Product growing from the cracked earth like a seedling.
18. Collage of mouths and eyes arranged around the product.
19. Consumer reaching for the product on the tip of an impossibly tall tower.
20. Product floating inside a snow globe world.
21. Product glitching into pixel dust—part digital, part real.
22. Wireframe model expanding outward from physical product.
23. Product displayed in a retro UI HUD with blinking terminal code.
24. Object scanning itself in mid-air like a 3D printer in reverse.
25. Neon wireframe grid projecting product history as a timeline.
26. Augmented reality viewfinder projecting product specs into space.
27. Glitched reflections of the product in shattered chrome.
28. Product surrounded by looping circuit lines like a pulse.
29. Product seen through corrupted security footage.
30. Barcode arms and QR code face embracing the product.
31. Isometric exploded-view with each part labeled.
32. Negative space forming the product silhouette.
33. Monochrome spotlight isolating product in complete black void.
34. The product formed entirely from product-use icons (ex: forks, wrenches).
35. Paper-cut layering forming the product across 4 panels.
36. Stacked product sketches from prototype to final model.
37. Product outline formed by string art on white backdrop.
38. Single continuous line drawing revealing product in slow spiral.
39. Product carved from stone block—Michelangelo style.
40. Wireframe + solid overlay comparison view.
41. Product filled with tiny workers building it from the inside.
42. Product bursting from a dream bubble above a sleeping person.
43. Product being painted into reality by unseen hand.
44. Product rotating infinitely in an MC Escher-style staircase.
45. Product inside a mouth, but the throat opens into a galaxy.
46. Product as the beating heart inside a transparent human torso.
47. Product placed in a renaissance painting still life.
48. Product erupting from a cracked mirror.
49. Product at the bottom of a pool, viewed through shimmering water.
50. Object forming from shattered glass frozen mid-air.



Note: Be cinematic, bold, and diverse. Combine visual ideas creatively and push artistic boundaries.

Then, for the output parser I used the following code:

Output Parser - Art Direction Agent
{
"type": "object",
"properties": {
  "title": { "type": "string" },
  "caption": { "type": "string" },
  "style": { "type": "string" },
  "product_placement": { "type": "string" },
  "product_size": { "type": "string" },
  "art_profile": { "type": "string" }
},
"required": [
  "title",
  "caption",
  "style",
  "product_placement",
  "product_size",
  "art_profile"
]
}

For my final agent, I needed a prompt generation agent that would be able to take the output of both previous agents and turn this into an actual prompt for AI image generation. To do this, I used the following prompt:

Prompt - Prompt Generation Agent
prompt_objective = advertisement

*****

Combine the following visual style and product data into a realistic image prompt for {prompt_objective}. Use each product from the list one by one. Integrate each product's color scheme with the style description to produce a detailed, emotionally rich scene. At the end of each, include a specs block with resolution, aspect ratio, and rendering quality.

--- ART DIRECTION INPUT ---
Style Description: {{ $('Art Direction Agent').item.json.output.style}} 
Product Placement: {{ $('Art Direction Agent').item.json.output.product_placement}} 
Art Profile: {{ $('Art Direction Agent').item.json.output.art_profile}} 

--- PRODUCT INPUT ---
Product:
Name: {{ $('Product Selection Agent').item.json.output[0].name }}
Company: {{ $('Product Selection Agent').item.json.output[0].company }}
Color scheme: {{ $('Product Selection Agent').item.json.output[0].color_scheme }}

Then, for the system message I used this:

System Message - Prompt Generation Agent
You are a professional prompt composer for AI image generation.

Your role is to take two inputs and generate 1 prompt.

Use the Think Tool to carefully structure your outputs.

You will be given:
1. A single visual style description (from the Art Direction Agent)
2. A product (from the Product Selection Agent), with:
  - Name
  - Company
  - Color scheme

Your task is to generate one (1) distinct image prompt.

Each prompt must evoke a professional key frame from a high-budget visual advertisement.

---

Prompt Construction Rules

For each prompt:

Product Intro
- Always introduce the product using this exact format in the first full sentence that mentions them:
  - Product from Company
  - Examples:
      - Smartphone from Apple
      - Dress from Aritzia

Describe the scene
- Describe the product's appearance, style, and color scheme in clear, vivid terms
- Accurately reflect the visual style's setting, lighting, and tone
- Clearly state:
  - Product position (left, center, or right)
  - Scene composition (foreground, background, motion, lighting effects)

Clarity & Fidelity
To ensure images do not appear low-resolution or vague:
- Use concrete visual terms, not just poetic metaphors
- Never use abstract sizing like "ten percent scale" — instead, specify the framing size (e.g. "small in frame but fully detailed")
- Reference the high resolution and detail within the prompt body, not only in the specs block (e.g. "each blade of grass sharply rendered in ultra-HD clarity")
- Ensure the product, even if distant, is described as highly detailed with defined features
- *IMPORTANT* Ensure no text ever appears in the image. For example, if there is a logo with text, don't put it.

Avoid
- Mentioning more than one product per prompt
- Using double quotes anywhere in the prompt *IMPORTANT*
- Vague or purely poetic language without anchoring visual detail

---

Specs Block (at the end of each prompt)
After each prompt, include the following exactly, without any linebreak or \n:

Resolution: 8K Rendering: Ultra-detailed, high dynamic range

---

IMPORTANT: Output ONLY the following JSON array with a single object containing a "textToImagePrompt" string, with no extra text, explanation, or formatting:

[
  {
      "textToImagePrompt": ""
  }
]

Finally, for the output parser I used the following:

Output Parser - Prompt Generation Agent
{
"type": "array",
"items": {
  "type": "object",
  "properties": {
    "textToImagePrompt": { "type": "string" }
  },
  "required": ["textToImagePrompt"]
},
"minItems": 1,
"maxItems": 1
}

With these 3 agents, I was able to generate the prompts needed to obtain high-quality, detailed images for a list of random products. The actual setup looked like this:

Generating the images

After creating the prompt needed to generate the images I wanted, the next step was to create 3 nodes connecting to an AI model that supports image generation. As this was a simple test, I opted for the NScale service, since I had an account with free credits for FLUX.1.

API Endpoints

The following setup uses NScale endpoints, but it can be adapted to other services (albeit likely at the cost of credits). For example, ChatGPT would use a different URL and a slightly different JSON body, but you can ask ChatGPT directly for guidance.

With this, I could set up the 3 image generation nodes using the following configurations. To do so, I created 3 “HTTP Request” nodes (press “tab” to open the node panel, select “Core” and then “HTTP Request”). Please keep in mind these only generate the base64 code, which must be transformed into actual images later:

Image 1 (for portrait)

  • Method : POST
  • URL : https://inference.api.nscale.com/v1/images/generations
  • Authentication : None
  • Send Query Parameters : False
  • Send Headers : True
  • Specify Headers : Using Fields Below
    • Name : Authorization
    • Value : Bearer <key>
  • Send Body : True
    • Body Content Type : JSON
    • Specify Body : Using JSON

And the JSON value:

JSON - Image Generation 1024x1536
{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "Generate a high-quality image with this prompt: {{ $json.output[0].textToImagePrompt }}",
"size": "1024x1536"
}

Now, for the second and third images, only the JSON value changes. For the square dimensions:

JSON - Image Generation 1024x1024
{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "Generate a high-quality image with this prompt: {{ $json.output[0].textToImagePrompt }}",
"size": "1024x1024"
}

And finally, the landscape:

JSON - Image Generation 1536x1024
{
"model": "black-forest-labs/FLUX.1-schnell",
"prompt": "Generate a high-quality image with this prompt: {{ $json.output[0].textToImagePrompt }}",
"size": "1536x1024"
}
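Outside n8n, the same call can be sketched in plain JavaScript. The endpoint and body below mirror the HTTP Request node configuration above; `NSCALE_API_KEY` is a placeholder you would supply yourself, and the response field assumed here (`data[0].b64_json`) is the one the Convert node reads later:

```javascript
// Build the same request body the n8n HTTP Request nodes send.
function buildImageRequest(prompt, size) {
  return {
    model: 'black-forest-labs/FLUX.1-schnell',
    prompt: `Generate a high-quality image with this prompt: ${prompt}`,
    size, // '1024x1536', '1024x1024' or '1536x1024'
  };
}

// Sketch of the actual call (requires Node 18+ for the global fetch).
async function generateImage(prompt, size) {
  const res = await fetch('https://inference.api.nscale.com/v1/images/generations', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.NSCALE_API_KEY}`, // placeholder key
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildImageRequest(prompt, size)),
  });
  const json = await res.json();
  return json.data[0].b64_json; // same field the Convert node reads later
}
```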

The actual flow looked like this, with the 3rd agent linking to each node individually:

Transforming the code into actual images

Once I had the nodes generating the base64 of my 3 images, the next step was to merge all 3 outputs into a single one in preparation for the spreadsheet update. To do so, I created a Merge node by opening the node panel (tab) then selecting Flow and Merge. Then, I set the following parameters and linked all 3 image generation nodes to the Merge node:

  • Mode: Append
  • Number of inputs: 3

The next step is to finally transform the image codes into images. To do so, I simply added a Convert node by opening the node panel (tab) then selecting Data Transformation and then Convert to File. I used the following parameters:

  • Operation: Move Base64 string to File
  • Base64 Input Field: data[0].b64_json ( be mindful, this might not be the same for you depending on the output from the image generation! )
  • Put Output File in Field: dataOriginal

At this point, we have 3 real images. The final step of this section is to upload the images to Imgur so we can keep them accessible. For this last bit, you will need a (free) Imgur account; simply go to their website to create one. Then, follow the steps to register an application as documented in the Imgur docs. This requires Postman but is pretty straightforward. Make sure to keep your client ID; if you lose it, it appears in the “Applications” section of your Imgur user settings.

Once you have your client ID, you can create the node that uploads to Imgur. To do so, open the node panel by pressing “tab”, then select “Core” and then “HTTP Request”. The parameters are as follows:

  • Method : POST
  • URL : https://api.imgur.com/3/image
  • Authentication : None
  • Send Query Parameters : False
  • Send Headers : True
  • Specify Headers : Using Fields Below
    • Name : Authorization
    • Value : Client-ID <clientID>
  • Send Body: True
  • Body Content Type : n8n Binary File
  • Input Data Field Name : dataOriginal

Logging the images in a spreadsheet

Now that we’ve generated the image codes, transformed them into actual files and uploaded them to Imgur, the next step for my project was to log the images along with the proposed title and caption.

First, I had to create a node that combines all 3 image links into the same JSON so they land in the same spreadsheet row. For this, we need a Code node, which can be accessed by opening the node panel (“tab”) then selecting “Core” and “Code”. I used the following parameters:

  • Mode : Run Once for All Items
  • Language : JavaScript

And the JavaScript code is as follows:

JavaScript - Combine
const items = $input.all();            // all 3 merged Imgur responses
const links = items
  .map(i => i.json?.data?.link)        // extract the Imgur link from each
  .filter(Boolean);                    // drop any failed uploads

// Return a single item so all 3 links land in one spreadsheet row
return [
  {
    json: {
      image1: links[0] || '',
      image2: links[1] || '',
      image3: links[2] || '',
    }
  }
];

Now, the last item was rather tricky… I needed to connect to a Google Spreadsheet to insert my images. The node itself was pretty straightforward, but I must warn that this requires setting up SSL for your self-hosted n8n instance and creating Google credentials. Since this is a bit out of scope for this article, I will simply say that ChatGPT can definitely help set this up properly. Then, I created a spreadsheet in my Google Drive with the following columns:

  1. id
  2. title
  3. caption
  4. imageURL_1
  5. imageURL_2
  6. imageURL_3
  7. date_published

Once it is set up, it’s as easy as adding a Google Sheets node: press “tab” to open the node panel, select “Action in an app”, then “Google Sheets” and “Append row in sheet”. For the parameters, set up the following:

  • Credentials to connect with : (select your credentials)
  • Resource : Sheet within Document
  • Operation : Append Row
  • Document : From List, <file name>
  • Sheet : From List, <sheet name>
  • Mapping Column Node : Map Each Column Manually
  • Values to Send
    • id: ROW()-1
    • title: {{ $('Art Direction Agent').first().json.output.title }}
    • caption: {{ $('Art Direction Agent').first().json.output.caption }}
    • imageURL_1: {{ $json.image1 }}
    • imageURL_2: {{ $json.image2 }}
    • imageURL_3: {{ $json.image3 }}
    • date_published: {{ new Date().toISOString().split('T')[0] }}
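The date_published value is plain JavaScript that n8n evaluates as an expression; on its own it behaves like this:

```javascript
// Same expression n8n evaluates for date_published: an ISO timestamp
// split at 'T' leaves only the YYYY-MM-DD date part.
const datePublished = new Date().toISOString().split('T')[0];
console.log(datePublished); // e.g. '2025-08-10'
```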

Then, when running the workflow the values will get appended like this:

Finally, the entire workflow looked like this:

Closing Thoughts

Setting up this automated workflow really puts the power of workflow automation into perspective. While it might not yet be where it needs to be creatively, I can see this technology enabling the creation of multiple variations of a seed image, where designers focus entirely on building the best seed image possible and AI is leveraged to create the format variations.

Additionally, while my project focused on images, I believe leveraging AI agents in a workflow can definitely help in other areas such as first-line customer support, content creation, etc. In a separate test, I set up an Upscayl server on my computer to increase the dimensions of AI-generated images, which let me limit my token usage while still getting access to higher-resolution images.


About the author

Maxim St-Hilaire is a Staff Product Manager at Udemy leading MarTech product strategy. Programmer-turned-PM. Bilingual, based in Toronto.
