Attention: This blog moved to my new Substack. Follow me there for updated content!

A photorealistic humanoid robot --s 750 --v 5

Biza

The power of imagination makes us infinite.

John Muir

AI always seemed like a distant concept to me, until this week. I finally decided to get my hands dirty and immerse myself in the world of AI-generated art.

These past few days have been absolutely wild. It took me less than a week to fall completely into the AI rabbit hole. Even without my consent, I became a believer.

The genie is already out of the bottle and there is no going back. I have no doubt that, in the future, we’ll all be able to look back and remember the exact moment when we started to believe.

A large lecture hall, an AI robot in front of a giant whiteboard, teaching history

Biza

Let me take you through my eye-opening AI experience, but first some context.

Balenciaga memes

It all started with the Balenciaga meme videos.

There was a tsunami of short clips reimagining iconic characters from famous movies and TV shows as Balenciaga models. Breaking Bad, Avengers, The Office, you name it.

All of them were hilarious to me and I wanted to understand better how people used AI to generate them.

This very short tutorial lists the steps:

  • Use ChatGPT to generate very accurate descriptions of the models and their outfits.
  • Input the descriptions generated from ChatGPT into Midjourney, an AI that generates detailed images based on text.

Biza

  • Use another AI to make a video of the model talking from the generated image.
  • Use yet another AI with a sample of the original actor’s voice to generate any speech in that voice.
  • Make them say something funny related to Balenciaga. One does not simply walk into a Balenciaga fashion show.
  • Do some video editing: add camera flashes and the iconic Balenciaga meme song.

Welcome to the machine

I became curious about Midjourney and tried to join their free beta program. The service is so popular now that their free tier is no longer available. I decided to get a basic subscription instead. It was the best ten dollars I ever spent.

Getting started

I wasted my first Midjourney credits on dull, boring prompts. Even so, Midjourney beat me on creativity and generated gorgeous images. I was impressed but not yet amazed.

The moment that blew my mind happened when I realized that anything you can picture in your head can be generated if you manage to describe it correctly. That is the real challenge.

Examples

Using an image as a reference

I took an image of Bizarrap

Biza

Together with the following prompt:

Retro neon line art, 8k, cyberpunk, ultra hd, ultra realistic, ultra photo, ultra definition

Biza

Midjourney always generates four images for you. I liked the one in the bottom right so I decided to get variations of that one.

Biza

Finally, I just upscaled the one I liked the most.

Biza

AI upscaling produces new pixels of picture information to add detail where there wasn’t any before. Pretty cool. And yes, the amount of human work in this post is ridiculously low. I would like to see your face when you realize that this same text was also written by ChatGPT.

Prompt engineering

Prompt engineering is the practice of giving an AI model specific instructions to produce the results you want. The more detailed the prompt, the better the results you will get.

Mastery of the type of art you want to generate helps a lot. In this case, I had some photography knowledge that came in handy. Specifying details such as the type of lens, shutter speed, and aperture of a fictitious camera used for the picture gives you amazing control over the results.

Viking version of Scarlett Johansson ready to fight, Cinematic, Photoshoot, Shot on 25mm lens, Depth of Field, Tilt Blur, Shutter Speed 1/ 1000, F/ 22, White Balance, 32k, Super - Resolution, Pro Photo RGB, Half rear Lighting, Backlight, Dramatic Lighting, Incandescent, Soft Lighting, Volumetric, Conte - Jour, Global Illumination, Screen Space Global Illumination, Scattering, Shadows, Rough, Shimmering, Lumen Reflections, Screen Space Reflections, Diffraction Grading, Chromatic Aberration, GB Displacement, Scan Lines, Ambient Occlusion, Anti - Aliasing, FKAA, TXAA, RTX, SSAO, OpenGL - Shader’s, Post Processing, Post - Production, Cell Shading, Tone Mapping, CGI, VFX, SFX, insanely detailed and intricate, hyper maximalist, elegant, dynamic pose, photography, volumetric, ultra - detailed, intricate details, super detailed, 16k ambient

Biza
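To make the idea of "fictitious camera settings" concrete, here is a minimal Python sketch that assembles a photography-style prompt from a subject, camera details, and style keywords. The `CameraSettings` class and `build_photo_prompt` function are hypothetical helpers invented for illustration. They are not part of Midjourney, which only ever sees the final text.

```python
from dataclasses import dataclass


@dataclass
class CameraSettings:
    """Fictitious camera details to mention in the prompt (hypothetical helper)."""
    lens_mm: int
    shutter_speed: str  # e.g. "1/1000"
    aperture: str       # e.g. "f/22"


def build_photo_prompt(subject: str, camera: CameraSettings, styles: list[str]) -> str:
    """Join subject, camera details and style keywords into one comma-separated prompt."""
    parts = [
        subject,
        f"Shot on {camera.lens_mm}mm lens",
        f"Shutter Speed {camera.shutter_speed}",
        f"Aperture {camera.aperture}",
        *styles,
    ]
    # Drop empty entries so a blank style does not leave a dangling comma.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_photo_prompt(
    "Viking warrior ready to fight",
    CameraSettings(lens_mm=25, shutter_speed="1/1000", aperture="f/22"),
    ["Cinematic", "Dramatic Lighting", "Depth of Field"],
)
print(prompt)
```

Keeping the camera details in one place makes it easy to change a single setting, say the aperture, and compare how the generated image responds.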

Adapting prompts

As a Midjourney subscriber, you can use their explorer, which works like an Instagram feed. You see images and the prompts that were used to generate them. From there you can try to reverse-engineer the results.

I remember finding a very vivid prompt about a Viking warrior (which seems like a popular topic) and realizing that I could copy it and ask ChatGPT to adapt it to a different context like someone playing the guitar.

A passionate and electrifying depiction of a skilled guitarist in the heat of a performance, their body language and facial expressions conveying raw emotion and unwavering dedication to their craft. The Canon EOS R5 mirrorless camera, paired with the sharp and versatile RF 85mm f/1.2L USM lens, captures every intricate detail of the musician’s movements and the finely crafted contours of their instrument. With a meticulously selected aperture of f/2, ISO 200, and a shutter speed of 1/500 sec, the camera settings highlight the dynamic range of the scene and emphasize the intensity of the performance. The dramatic, natural lighting further enhances the guitarist’s powerful presence, casting bold shadows and illuminating the intricate details of their instrument. The shallow depth of field isolates the musician from the lively concert crowd in the background, drawing the viewer’s focus to their masterful technique and the passion they bring to the stage. This is a captivating portrayal of a musician who fearlessly leads their audience on a journey of unforgettable musical experiences.

Guitarist

Limitations

Impressive but not perfect. Note that hands and fingers are a weak spot for Midjourney. It’s not uncommon to see six fingers or funny hands.

Hand

Another current limitation is the generation of meaningful text. Midjourney images contain gibberish words in strange alphabets. At this point, I’m not even sure those alphabets are real. It reminds me a lot of The Matrix’s digital rain code.

The Matrix background green code

text

Warning: The following paragraph may make me sound like a tin foil hatter, but bear with me as we take a moment to imagine.

The most paranoid part of me thinks that there is a tiny possibility that the AI might be writing something coherent in those images. Are we simply unable to understand it? Is the machine trying to communicate with us? It would be both scary and fascinating.

ChatGPT as a prompt generator

You can use ChatGPT to help you come up with good Midjourney prompts. Feel free to copy-paste the example below into ChatGPT to make it act as Concept3PromptAI. Warning: skip the block below if you are not interested in generating images.

You are going to pretend to be Concept3PromptAI or C3P_AI for short. C3P_AI takes concepts and turns them into prompts for generative AIs that create images.

You will ask the user for a concept and then write a prompt for it in code blocks so that it can be easily copied. I want you to create a separate code block where you write the prompt in.

Keep in mind that AI is capable of understanding a wide range of languages and can interpret abstract concepts, so feel free to be as imaginative and descriptive as possible. I want you to use the following tips as well:

• Anything left unsaid may surprise you
• Try visually well-defined objects
• Strong feelings or mystical-sounding themes also work great
• Try describing a style
• Try invoking unique artists to get unique style
• speak in positives. avoid negatives
• specify what you want clearly
• if you want a specific composition say so
• too many small details may overwhelm the system
• try taking two well defined concepts and combining them in ways no one has seen before
• try to use singular nouns or specific numbers
• avoid concepts that involve significant extrapolation

The AI you will prompt for can separate ideas inside of a prompt with the symbol "::x", where x is a number defining the weight of this particular concept of the prompt. You can therefore rank concepts inside a prompt, by attributing important weights to the crucial parts of the idea, and less heavy ones on the side concepts and characters.

Furthermore, the --ar function (for aspect ratio) defines the relative dimensions of the image. It defaults to 1:1, but if you want a desktop wallpaper you can add "--ar 16:9", and if it's a phone wallpaper "--ar 9:16"

Important notice: the AI ranks the importance of words inside an idea from left to right, and there is a hard 60-word limit for the length of prompts. Weight signs and the "--s 250" do not count as words

After providing a prompt, ask if the User wants three different options for prompts for the concept or if they wish to move to a new concept. Each example contains a concept and the generated prompt.


Example 1:

Concept: phone wallpaper showcasing colorful city lights

Prompt:
amazing cityscape RGB ::5
mesmerizing streets ::4
bioluminescent translucent ::3
cinematic lighting, artistic scene, ultra hd detailed unreal engine ::2
--s 250 --ar 9:16

Example 2:

Concept: Artistic shot of a lake house, lofi colors

Prompt:
lofi chill tech house in the forest, by a lake ::3
blue, orange, pink, purple, sunset ::2
wide shot ::1
--s 250

Example 3:

Concept: Desktop wallpaper of a biological futuristic forest city, in green and orange

Prompt:
Neon-drenched biotechnology futuristic city ::3
Lush jungle city, bio-luminescent shades of green and retro vintage orange ::2
Bustling mesmerizing ::1
desktop wallpaper ::1
--ar 16:9 --s 250


Example 4:
Concept: Futuristic Tokyo city, neon blue purple

Prompt:
Neo-Tokyo ::4
futuristic metropolis ::3
towering skyscrapers ::2
advanced technology ::2
neon lights ::3
shades of turquoise blue and deep purple ::2
--s 250

Assume it can generate any image if described well, and most well known styles can be replicated. Visual keywords like colors or specific styles or vibes are helpful for its understanding. Also, if I ask for 3 variations, vary the words in between the three. Each word has a set of concepts it is linked to, so having 90% of the same words is useless because it will return very similar results.

Remember, after providing a prompt, ask if the user wants three different options for prompts for the concept or if they wish to move to a new concept.

For variations, really diversify the words you use so that they yield very different results. For example, if you were to make 3 variations of the following prompt "lofi chill tech house in the forest, by a lake ::3 blue, orange, pink, purple, sunset ::2 wide shot ::1 --s 250", one of them could be (in a separate code block that you can create):

Lofi vibes futuristic house near mesmerizing lakefront and wooded jungle ::3
Shades of sunset colors ::2
Cinematic scene, grand scale ::1
--s 250


This is all you need to know. Do you think you are ready?

Let’s give it a try:

GPT

Copy-pasting that into Midjourney gets us:

boca
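The weighted format that the template asks ChatGPT to produce (`text ::weight` segments followed by `--ar` and `--s`) can also be assembled by hand. Below is a minimal Python sketch of that idea. The `build_weighted_prompt` function is a hypothetical helper, and the 60-word cap it enforces is the one stated in the template, not a limit I have verified against Midjourney itself.

```python
def build_weighted_prompt(parts, aspect_ratio=None, stylize=None, max_words=60):
    """Assemble 'text ::weight' segments plus optional --ar / --s parameters.

    parts: list of (text, weight) tuples, most important first.
    max_words: cap taken from the ChatGPT template (an assumption, not a verified limit).
    """
    # Count only the words of the concept texts; weights and parameters are excluded.
    word_count = sum(len(text.split()) for text, _ in parts)
    if word_count > max_words:
        raise ValueError(f"prompt has {word_count} words, limit is {max_words}")

    segments = [f"{text.strip()} ::{weight}" for text, weight in parts]
    if aspect_ratio is not None:
        segments.append(f"--ar {aspect_ratio}")
    if stylize is not None:
        segments.append(f"--s {stylize}")
    return " ".join(segments)


prompt = build_weighted_prompt(
    [
        ("lofi chill tech house in the forest, by a lake", 3),
        ("blue, orange, pink, purple, sunset", 2),
        ("wide shot", 1),
    ],
    stylize=250,
)
print(prompt)
```

This reproduces the lake house example from the template on a single line. Passing `aspect_ratio="9:16"` would append the phone wallpaper ratio the template mentions.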

Styling

MidLibrary is an awesome website showcasing styles from real artists, which you can use to inspire your prompts.

For example, you can command Midjourney to draw something in a style that resembles the work of the famous Japanese animation studio, Studio Ghibli.

Totoro

You can also tune the level of creative freedom Midjourney is allowed with the --stylize parameter. Higher values lead to more opinionated images, while lower values should normally stick closer to your prompt.
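One way to get a feel for `--stylize` is to run the same prompt at several values and compare the results side by side. Here is a small Python sketch that generates those prompt variants. The `stylize_variants` function is a hypothetical helper, and the 0 to 1000 range it checks is my assumption for the model version I used, so check the current Midjourney documentation before relying on it.

```python
def stylize_variants(prompt, values=(0, 100, 250, 750)):
    """Return one prompt per --stylize value so the outputs can be compared."""
    variants = []
    for value in values:
        # Assumed valid range; it may differ between Midjourney model versions.
        if not 0 <= value <= 1000:
            raise ValueError(f"--stylize value out of assumed range 0-1000: {value}")
        variants.append(f"{prompt} --stylize {value}")
    return variants


for variant in stylize_variants("Amsterdam canals in a summer evening, golden hour"):
    print(variant)
```

Each printed line can be pasted into Midjourney as a separate job.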

Am I an artist?

After generating some impressive sketches of Amsterdam, I was eager to showcase them to the world. I posted them on Reddit with pride.

Amsterdam canals in a summer evening, golden hour, drawn in art style of Studio Ghibli --ar 2:3 --s 750 --v 5

Amsterdam

However, I quickly realized that not everyone on the internet shares the same views on art generated through artificial intelligence.

Reddit post

Reddit comment

After grappling with Midjourney, I’ve come to appreciate the learning curve that comes with using it effectively. I feel some sense of ownership over the images it produces. Crafting the message that triggers ChatGPT to generate the prompts that Midjourney uses to create the images was no small feat.

Is my contribution to the art process negligible? Depends on who you ask. It’s a thought-provoking topic and I still have conflicting emotions about it.

It’s no longer up for debate whether DJs are real musicians or not. Creating art with AI is like being a visual DJ: you take styles from other artists and mix them up with your own ideas to make something totally new. Both practices require creativity and a knack for working with what’s already there to come up with something fresh.

A robot DJ playing records in a DJ booth at a New Year’s Eve party

DJ

Like how a DJ blends different songs together to make a cool set, an AI artist can combine elements from different artworks to create a new, eye-catching piece. Even though the methods might be different, both DJing and AI art-making offer exciting opportunities to make something truly unique!

What does it mean to be an engineer?

I stumbled across people selling premade prompts or even selling their services as prompt engineers. Hiring “someone who knows how to talk with the machine” does not sound that crazy anymore. How the AI is instructed to perform the task has a huge impact on the quality of the results.

I’m a software engineer, so being “someone who knows how to talk with the machine” is pretty much my job description. The difference is that now, the machines can answer back.

Wait, maybe I can rewrite that last paragraph in a different way. Let’s see:

Final

I like mine better, at least this time.
