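How I Built a Discord Bot That Generates AI Images with Stable Diffusion

Monday, November 13, 2023

Overview

I built the thing in the title with TypeScript + Discord.js + Stable Diffusion.
- I had recently put together a simple bot for our company Slack, so I rode that momentum into "surely Discord will work too!", and here we are.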

Repository

https://github.com/n1gaur1/generate-image-bot
- Get the following from the Discord Developer Portal beforehand and copy them into .env.
  - DISCORD_BOT_TOKEN
  - DISCORD_APP_ID
- Get the following from your Discord server and copy it into .env.
  - DISCORD_GUILD_ID
- Get the following from imgur and copy them into .env.
  - IMGUR_CLIENT_ID
  - IMGUR_CLIENT_SECRET
  - IMGUR_ACCESS_TOKEN
  - IMGUR_REFRESH_TOKEN
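
The post does not show getEnv itself. As a minimal sketch, assuming the variables listed above and the camelCase names that the code in this post destructures (discordBotToken, discordAppID, discordGuildID, imgurClientID, imgurClientSecret), it could look like the following. The readEnv and requireVar helpers and the fail-fast behavior are assumptions for illustration, not the repository's actual code.

```typescript
// Hypothetical sketch of an env loader; the real lib/env.ts in the repository may differ.
type EnvSource = Record<string, string | undefined>;

interface Env {
  discordBotToken: string;
  discordAppID: string;
  discordGuildID: string;
  imgurClientID: string;
  imgurClientSecret: string;
}

// Read one variable and fail fast when it is missing or empty.
const requireVar = (source: EnvSource, key: string): string => {
  const value = source[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing environment variable: ${key}`);
  }
  return value;
};

// The source is a parameter so the function can be exercised without the real environment.
const readEnv = (source: EnvSource): Env => ({
  discordBotToken: requireVar(source, 'DISCORD_BOT_TOKEN'),
  discordAppID: requireVar(source, 'DISCORD_APP_ID'),
  discordGuildID: requireVar(source, 'DISCORD_GUILD_ID'),
  imgurClientID: requireVar(source, 'IMGUR_CLIENT_ID'),
  imgurClientSecret: requireVar(source, 'IMGUR_CLIENT_SECRET'),
});
```

In the bot this would be called as readEnv(process.env) once the .env file has been loaded, for example with the dotenv package. Failing at startup makes a forgotten variable obvious before the bot tries to log in.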

index.ts

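I tried to keep the code as small as possible.
1. getEnv
  - A function that just reads the environment variables.
2. makeDiscordBot
  - A function that just creates the bot client.
3. readyDiscordBot
  - A function that just prints to the console once the bot has logged in.
4. syncSlashCommands
  - A function that just syncs the slash commands.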

import { makeDiscordBot, readyDiscordBot, syncSlashCommands } from './lib/discord';
import { getEnv } from './lib/env';
import { slashCommandHandler } from './handler/slashCommandHandler';

const { discordBotToken, discordAppID, discordGuildID } = getEnv();
const bot = makeDiscordBot();

// Get the bot ready
readyDiscordBot(bot);

// Sync the slash commands
syncSlashCommands(bot, discordAppID, discordGuildID, discordBotToken);

// Detect slash commands
bot.on('interactionCreate', async (interaction) => {
  if (interaction.isCommand()) {
    await slashCommandHandler(interaction);
  }
});

bot.login(discordBotToken);

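Defining the slash commands

Discord.js provides an API called SlashCommandBuilder, so I use it. This time I prepare two slash commands.
1. /generateimage command
  - Generates an image using the Stable Diffusion API.
    - The generated image is uploaded to imgur and also saved on the local PC running the script.
  - Accepts prompt, negativePrompt, width, height, and steps as arguments. (The definition shown here registers only prompt and negative_prompt.)
    - If you feel like it, you can expose finer-grained options.
2. /regenerateimage command
  - Re-runs the generateimage command with the same prompts as last time, without any arguments.
    - Unless you specify a seed, image generation gives a different result every time, and copying the prompt to run it again was a hassle, so I implemented this.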

import { SlashCommandBuilder } from 'discord.js';

export const buildSlashCommands = ():
ReturnType<typeof SlashCommandBuilder.prototype.toJSON>[] => {
  return [
    new SlashCommandBuilder()
      .setName('generateimage')
      .setDescription('AIで画像生成します。')
      .addStringOption((option) =>
        option
          .setName("prompt")
          .setDescription("呪文1")
          .setRequired(true)
      )
      .addStringOption((option) =>
        option
          .setName("negative_prompt")
          .setDescription("呪文2")
          .setRequired(true)
      )
      .toJSON(),
    new SlashCommandBuilder()
      .setName('regenerateimage')
      .setDescription('前回と同じプロンプトでAIが画像生成します。')
      .toJSON(),
  ];
};
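
The handler later in this post also reads width, height, and steps, which the definition above does not register. As a sketch of what registering them involves, the following builds the raw JSON that Discord's application command API accepts, without discord.js, so the shape is visible. In Discord's API option type 3 is a string and type 4 is an integer; the integerOption helper, the min/max bounds, and the description texts are illustrative assumptions. With discord.js the equivalent would be chaining addIntegerOption on the builder.

```typescript
// Discord application command option types (values defined by Discord's API).
const STRING_OPTION = 3;
const INTEGER_OPTION = 4;

interface CommandOption {
  type: number;
  name: string;
  description: string;
  required: boolean;
  min_value?: number;
  max_value?: number;
}

interface CommandPayload {
  name: string;
  description: string;
  options: CommandOption[];
}

// Hypothetical helper: an optional integer option with illustrative bounds.
const integerOption = (
  name: string,
  description: string,
  min: number,
  max: number,
): CommandOption => ({
  type: INTEGER_OPTION,
  name,
  description,
  required: false,
  min_value: min,
  max_value: max,
});

const generateImageCommand: CommandPayload = {
  name: 'generateimage',
  description: 'Generate an image with AI.',
  options: [
    // Discord requires required options to be listed before optional ones.
    { type: STRING_OPTION, name: 'prompt', description: 'Prompt', required: true },
    { type: STRING_OPTION, name: 'negative_prompt', description: 'Negative prompt', required: true },
    integerOption('width', 'Image width in pixels', 64, 2048),
    integerOption('height', 'Image height in pixels', 64, 2048),
    integerOption('steps', 'Number of sampling steps', 1, 150),
  ],
};
```

Keeping the size and step options optional fits the handler's fallback to the previous values.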

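Registering the slash commands

This function registers, in one go, the commands defined by buildSlashCommands with Discord.
It needs the application ID, guild ID, and bot token obtained via getEnv, passed in as clientId, guildId, and token.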

import { Client, REST, Routes } from 'discord.js';
// buildSlashCommands is the function from the previous section; import it from wherever it is defined.

export const syncSlashCommands = async (
  bot: Client<boolean>,
  clientId: string,
  guildId: string,
  token: string,
) => {
  try {
    const slashCommands = buildSlashCommands();

    const rest = new REST({ version: '9' }).setToken(token);
    console.log('Started refreshing application (/) commands.');

    // Overwrite the guild's command list with the current definitions.
    await rest.put(Routes.applicationGuildCommands(clientId, guildId), {
      body: slashCommands,
    });

    console.log('Successfully reloaded application (/) commands.');
  } catch (error) {
    console.error(error);
  }
};
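
For reference, the call above amounts to a single HTTP PUT that overwrites the guild's command list. Below is a rough sketch of that request, assuming API version 9 as in the code above. buildSyncRequest is a hypothetical helper that only assembles the request and does not send it.

```typescript
interface SyncRequest {
  method: 'PUT';
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: describes the request without performing it.
const buildSyncRequest = (
  clientId: string,
  guildId: string,
  token: string,
  commands: unknown[],
): SyncRequest => ({
  method: 'PUT',
  // Same path that Routes.applicationGuildCommands(clientId, guildId) produces.
  url: `https://discord.com/api/v9/applications/${clientId}/guilds/${guildId}/commands`,
  headers: {
    // Bot tokens are sent with the "Bot" authorization scheme.
    Authorization: `Bot ${token}`,
    'Content-Type': 'application/json',
  },
  // The body is the full array of command definitions.
  body: JSON.stringify(commands),
});
```

Because the PUT replaces the whole list, a command dropped from buildSlashCommands also disappears from the guild on the next sync.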

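Implementing the slash commands

This is the core part.
1. slashCommandHandler
  - When a slash command is entered, this branches the processing according to the command name.
2. generateImageHandler
  - Beforehand, open the Stable Diffusion Web UI launch batch file or shell script in a text editor and add --api to the COMMANDLINE_ARGS entry.
  - It generates an image with Stable Diffusion from five inputs: prompt, negativePrompt, width, height, and steps.
  - The generated image is uploaded to imgur and saved on the PC running this script.
  - Raising steps for quality, or making width and height too large, increases the time until generation completes.
3. getReGenerateImageHandler
  - A generateImageHandler that can be run without arguments.
  - generateImageHandler keeps the previous values of prompt, negativePrompt, width, height, and steps keyed by user ID, and runs with them.
    - If there are no previous values, it returns an error.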
import { CommandInteraction } from 'discord.js';
import sdwebui from 'node-sd-webui';
import fs from 'fs';
import { ImgurClient } from 'imgur';
import { getEnv } from '../lib/env';

interface Prompts {
  prompt: string;
  negativePrompt: string;
  width: number;
  height: number;
  steps: number;
}
const promts = new Map<string, Prompts>();
const STABLE_DIFFUSION_URL = 'http://127.0.0.1:7860';

const { imgurClientID, imgurClientSecret } = getEnv();
const imgurClient = new ImgurClient({
  clientId: imgurClientID,
  clientSecret: imgurClientSecret,
});

export const slashCommandHandler = async (interaction: CommandInteraction) => {
  const { commandName } = interaction;

  switch (commandName) {
    case 'generateimage':
      await generateImageHandler(interaction);
      break;
    case 'regenerateimage':
      await getReGenerateImageHandler(interaction);
      break;
    default:
      break;
  }
};

const generateImageHandler = async (interaction: CommandInteraction) => {
  // Use the option when it is given; otherwise fall back to this user's previous value.
  const previous = promts.get(interaction.user.id);
  const prompt = (interaction.options.get('prompt')?.value ?? previous?.prompt) as string;
  const negativePrompt = (interaction.options.get('negative_prompt')?.value ?? previous?.negativePrompt) as string;
  const width = (interaction.options.get('width')?.value ?? previous?.width) as number;
  const height = (interaction.options.get('height')?.value ?? previous?.height) as number;
  const steps = (interaction.options.get('steps')?.value ?? previous?.steps) as number;

  promts.set(interaction.user.id,{
    prompt: prompt,
    negativePrompt: negativePrompt,
    width: width,
    height: height,
    steps: steps,
  });
  
  const client = sdwebui({ apiUrl: STABLE_DIFFUSION_URL });

  try {
    await interaction.deferReply();
    const response = await client.txt2img({
      prompt: prompt,
      negativePrompt: negativePrompt,
      width: width,
      height: height,
      steps: steps,
      seed: -1,
      batchSize: 1,
      hires: {
        steps: 0,
        denoisingStrength: 0.7,
        upscaler: "Latent",
        upscaleBy: 2,
        resizeWidthTo: 1024,
        resizeHeigthTo: 1024,
      }
    });
    console.log('parameters', response.parameters);

    const info = JSON.parse(response.info);
    console.log('info', info);
    const fileName = `image-${info.job_timestamp}.png`;

    response.images.forEach((image) => {
      fs.writeFileSync(`./out/${fileName}`, image, 'base64');
    });

    const base64data = fs.readFileSync(`./out/${fileName}`, { encoding: "base64" })
    const imgurResponse = await imgurClient.upload({
      image: base64data,
      type: 'base64'
    });
    console.log(imgurResponse.data);

    await interaction.editReply(
      `${interaction.user.displayName}さんが画像生成しました。\n- ${prompt}\n- ${negativePrompt}\n- ${imgurResponse.data.link}`
    );
  } catch (error) {
    await interaction.editReply(
      `ごめんなさい！${interaction.user.displayName}さん！画像生成に失敗したよ🥺`
    );
    console.error(error);
  }
};

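const getReGenerateImageHandler = async (interaction: CommandInteraction) => {
  // Check this user's own history, not whether anyone at all has generated before.
  if (promts.has(interaction.user.id)) {
    await generateImageHandler(interaction);
  } else {
    await interaction.reply(`前回のプロンプトが見つからなかったよ🥺`);
  }
};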
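
The fallback logic in generateImageHandler can be isolated from Discord entirely. Here is a minimal sketch, assuming a hypothetical resolveParams helper: values given with the command come first, the user's stored values second, and the result is saved for the next run. Returning undefined when a value is missing from both is a design choice of this sketch, not something the original code does.

```typescript
interface StoredParams {
  prompt: string;
  negativePrompt: string;
  width: number;
  height: number;
  steps: number;
}

// Hypothetical helper: merge the given options with the user's previous values.
const resolveParams = (
  store: Map<string, StoredParams>,
  userId: string,
  given: Partial<StoredParams>,
): StoredParams | undefined => {
  const previous = store.get(userId);
  const prompt = given.prompt ?? previous?.prompt;
  const negativePrompt = given.negativePrompt ?? previous?.negativePrompt;
  const width = given.width ?? previous?.width;
  const height = given.height ?? previous?.height;
  const steps = given.steps ?? previous?.steps;

  // Nothing to fall back on: let the caller report an error instead of generating.
  if (
    prompt === undefined ||
    negativePrompt === undefined ||
    width === undefined ||
    height === undefined ||
    steps === undefined
  ) {
    return undefined;
  }

  const resolved: StoredParams = { prompt, negativePrompt, width, height, steps };
  // Remember the values per user so a later call without arguments can reuse them.
  store.set(userId, resolved);
  return resolved;
};
```

Keying the store by user ID keeps one member's /regenerateimage from picking up someone else's prompts.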

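Running it

It worked. Victory.

Closing thoughts

Compared with Slack, Discord felt slightly more restrictive.
This is Stable Diffusion basics, but if you load a LoRA (a small set of additionally trained weights) on top of a base model called a checkpoint, it becomes more likely to generate people that match what the LoRA was trained on. I'd like to try all sorts of things.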
