Generative artificial intelligence

Théâtre D'opéra Spatial is an image made with the help of generative artificial intelligence.
Above: An image classifier, an example of an artificial neural network trained with a discriminative objective. Below: A text-to-image model, an example of a network trained with a generative objective.
Stable Diffusion, text prompt: a photograph of an astronaut riding a horse.
AI-generated music from the Riffusion Inference Server, text prompt: bossa nova with electric guitar.
Video generated by Sora with the text prompt: Borneo wildlife on the Kinabatangan River.
Architecture of a generative AI agent.

Generative artificial intelligence (generative AI, GenAI,[1] or GAI) is a subset of artificial intelligence that uses large generative models to produce text, images, videos, or other forms of data.[2][3][4] These models learn the underlying patterns and structures of their training data and use them to produce new data[5][6] based on input, which often takes the form of a text prompt.[7][8]
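The idea of learning patterns from training data and then producing new data from a prompt can be illustrated with a deliberately tiny model. The following is a minimal sketch, assuming a word-level bigram model in place of the large neural networks the article describes; the function names, the three-sentence corpus, and the prompt are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus):
    """Record, for every word, the words that follow it in the training text."""
    followers = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            followers[current].append(following)
    return followers


def generate(followers, prompt, max_words=8, seed=0):
    """Extend the prompt by repeatedly sampling a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    words = prompt.split()
    while len(words) < max_words and followers.get(words[-1]):
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)


# Hypothetical toy training data; real systems train on vastly larger corpora.
corpus = [
    "the astronaut rides a horse",
    "the horse crosses a river",
    "a river runs past the astronaut",
]
model = train_bigram_model(corpus)
sample = generate(model, "the astronaut")
print(sample)
```

The output can be a word sequence that never appears verbatim in the corpus, yet every adjacent word pair in it was observed during training. Large generative models rely on far richer statistical structure, but the train-then-sample pattern is analogous.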

Improvements in transformer-based deep neural networks, particularly large language models (LLMs), enabled an AI boom of generative AI systems in the early 2020s. These include chatbots such as ChatGPT, Copilot, Gemini, and LLaMA; text-to-image generation systems such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video generators such as Sora.[9][10][11][12] Companies such as OpenAI, Anthropic, Microsoft, Google, and Baidu, as well as numerous smaller firms, have developed generative AI models.[7][13][14]

Generative AI has applications across a wide range of industries, including software development, healthcare, finance, entertainment, customer service,[15] sales and marketing,[16] art, writing,[17] fashion,[18] and product design.[19] However, concerns have been raised about the potential misuse of generative AI, such as cybercrime, the use of fake news or deepfakes to deceive or manipulate people, and the mass replacement of human jobs.[20][21] Intellectual property concerns also exist around generative models that are trained on and emulate copyrighted works of art.[22]

References

  1. ^ Newsom, Gavin; Weber, Shirley N. (2023-09-05). "Executive Order N-12-23" (PDF). Executive Department, State of California. Archived (PDF) from the original on 2024-02-21. Retrieved 2023-09-07.
  2. ^ Pinaya, Walter H. L.; Graham, Mark S.; Kerfoot, Eric; Tudosiu, Petru-Daniel; Dafflon, Jessica; Fernandez, Virginia; Sanchez, Pedro; Wolleb, Julia; da Costa, Pedro F.; Patel, Ashay (2023). "Generative AI for Medical Imaging: extending the MONAI Framework". arXiv:2307.15208 [eess.IV].
  3. ^ "What is ChatGPT, DALL-E, and generative AI?". McKinsey. Retrieved 2024-12-14.
  4. ^ "What is generative AI?". IBM. 2024-03-22.
  5. ^ Pasick, Adam (2023-03-27). "Artificial Intelligence Glossary: Neural Networks and Other Terms Explained". The New York Times (in American English). ISSN 0362-4331. Archived from the original on 2023-09-01. Retrieved 2023-04-22.
  6. ^ Karpathy, Andrej; Abbeel, Pieter; Brockman, Greg; Chen, Peter; Cheung, Vicki; Duan, Yan; Goodfellow, Ian; Kingma, Durk; Ho, Jonathan; Houthooft, Rein; Salimans, Tim; Schulman, John; Sutskever, Ilya; Zaremba, Wojciech (2016-06-16). "Generative models". OpenAI. Archived from the original on 2023-11-17. Retrieved 2023-03-15.
  7. ^ a b Griffith, Erin; Metz, Cade (2023-01-27). "Anthropic Said to Be Closing In on $300 Million in New A.I. Funding". The New York Times. Archived from the original on 2023-12-09. Retrieved 2023-03-14.
  8. ^ Lanxon, Nate; Bass, Dina; Davalos, Jackie (2023-03-10). "A Cheat Sheet to AI Buzzwords and Their Meanings". Bloomberg News. Archived from the original on 2023-11-17. Retrieved 2023-03-14.
  9. ^ Metz, Cade (2023-03-14). "OpenAI Plans to Up the Ante in Tech's A.I. Race". The New York Times (in American English). ISSN 0362-4331. Archived from the original on 2023-03-31. Retrieved 2023-03-31.
  10. ^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv (2022-01-20). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
  11. ^ Roose, Kevin (2022-10-21). "A Coming-Out Party for Generative A.I., Silicon Valley's New Craze". The New York Times. Archived from the original on 2023-02-15. Retrieved 2023-03-14.
  12. ^ Metz, Cade (2024-02-15). "OpenAI Unveils A.I. That Instantly Generates Eye-Popping Videos". The New York Times (in American English). ISSN 0362-4331. Archived from the original on 2024-02-15. Retrieved 2024-02-16.
  13. ^ "The race of the AI labs heats up". The Economist. 2023-01-30. Archived from the original on 2023-11-17. Retrieved 2023-03-14.
  14. ^ Yang, June; Gokturk, Burak (2023-03-14). "Google Cloud brings generative AI to developers, businesses, and governments". Archived from the original on 2023-11-17. Retrieved 2023-03-15.
  15. ^ Brynjolfsson, Erik; Li, Danielle; Raymond, Lindsey R. (April 2023), Generative AI at Work (Working Paper), Working Paper Series, doi:10.3386/w31161, archived from the original on 2024-03-28, retrieved 2024-01-21
  16. ^ "Don't fear an AI-induced jobs apocalypse just yet". The Economist. 2023-03-06. Archived from the original on 2023-11-17. Retrieved 2023-03-14.
  17. ^ Coyle, Jake (2023-09-27). "In Hollywood writers' battle against AI, humans win (for now)". AP News. Associated Press. Archived from the original on 2024-04-03. Retrieved 2024-01-26.
  18. ^ Harreis, H.; Koullias, T.; Roberts, Roger. "Generative AI: Unlocking the future of fashion". Archived from the original on 2023-11-17. Retrieved 2023-03-14.
  19. ^ "How Generative AI Can Augment Human Creativity". Harvard Business Review. 2023-06-16. ISSN 0017-8012. Archived from the original on 2023-06-20. Retrieved 2023-06-20.
  20. ^ Hendrix, Justin (2023-05-16). "Transcript: Senate Judiciary Subcommittee Hearing on Oversight of AI". techpolicy.press. Archived from the original on 2023-11-17. Retrieved 2023-05-19.
  21. ^ Simon, Felix M.; Altay, Sacha; Mercier, Hugo (2023-10-18). "Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown". Harvard Kennedy School Misinformation Review (in American English). doi:10.37016/mr-2020-127. S2CID 264113883. Archived from the original on 2023-11-17. Retrieved 2023-11-16.
  22. ^ "New AI systems collide with copyright law". BBC News. 2023-08-01. Retrieved 2024-09-28.

Media used on this page

Astronaut Riding a Horse (SD3.5).webp
A synthograph of an astronaut riding a horse created in HuggingFace Space with Stable Diffusion 3.5 Large. Prompt is a photograph of an astronaut riding a horse. This artwork was created with text-to-image (txt2img) process.
AI-generated audio featuring bossa nova music with electric guitar.ogg
Author/Creator: Benlisquare, License: CC BY-SA 4.0

Demonstration of an algorithmically-generated audio track featuring bossa nova music accompanied by electric guitar, created using Riffusion, an open-source fine-tuned derivative of the Stable Diffusion image-generation diffusion model that has been retrained to generate images of audio spectrograms, which can then be converted into audio files.

An audio spectrogram is a visual representation of an audio clip's frequency content, and images of spectrograms can be converted into audio via the inverse short-time Fourier transform, using the Griffin-Lim algorithm to approximate phase during audio reconstruction. While the Stable Diffusion AI model was originally intended to generate visual images from a textual prompt, Riffusion has been retrained from Stable Diffusion v1.5 to instead generate spectrogram images from text prompts describing musical motifs, fine-tuned through the use of Nvidia A10G enterprise datacenter GPUs.
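The phase-estimation step mentioned above can be sketched in a few dozen lines. This is a minimal illustration of Griffin-Lim under stated assumptions, not the script used for this recording: it uses only the Python standard library, a naive O(N^2) DFT, a very short test tone, and hypothetical function names and parameters (n_fft=16, hop=4). It also skips the image-to-magnitude decoding that a real spectrogram picture would require. Practical code would normally rely on an FFT library instead.

```python
import cmath
import math
import random


def hann(n_fft):
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame, sign):
    # Naive DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse.
    n_fft = len(frame)
    return [
        sum(frame[n] * cmath.exp(sign * 2j * math.pi * k * n / n_fft)
            for n in range(n_fft))
        for k in range(n_fft)
    ]


def stft(signal, n_fft, hop):
    # Short-time Fourier transform: one windowed DFT per hop.
    window = hann(n_fft)
    return [
        dft([signal[start + n] * window[n] for n in range(n_fft)], -1)
        for start in range(0, len(signal) - n_fft + 1, hop)
    ]


def istft(frames, n_fft, hop, length):
    # Least-squares inverse: windowed overlap-add divided by the summed squared window.
    window = hann(n_fft)
    out = [0.0] * length
    norm = [0.0] * length
    for m, spectrum in enumerate(frames):
        samples = dft(spectrum, +1)
        for n in range(n_fft):
            out[m * hop + n] += window[n] * samples[n].real / n_fft
            norm[m * hop + n] += window[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def magnitude_error(frames, target):
    # Distance between the magnitudes of `frames` and the target magnitudes.
    return math.sqrt(sum((abs(c) - t) ** 2
                         for f, tf in zip(frames, target)
                         for c, t in zip(f, tf)))


def griffin_lim(target, n_fft, hop, length, n_iter=30, seed=0):
    rng = random.Random(seed)
    # Start from the known magnitudes combined with random phases.
    estimate = [[t * cmath.exp(2j * math.pi * rng.random()) for t in frame]
                for frame in target]
    errors = []
    for _ in range(n_iter):
        signal = istft(estimate, n_fft, hop, length)
        reanalysed = stft(signal, n_fft, hop)
        errors.append(magnitude_error(reanalysed, target))
        # Keep the re-analysed phase, but impose the target magnitude again.
        estimate = [
            [t * (c / abs(c)) if abs(c) > 1e-12 else complex(t)
             for c, t in zip(f, tf)]
            for f, tf in zip(reanalysed, target)
        ]
    return signal, errors


# Hypothetical test input: a short sine tone whose phase is thrown away.
n_fft, hop, length = 16, 4, 128
tone = [math.sin(2 * math.pi * 5 * n / length) for n in range(length)]
target = [[abs(c) for c in frame] for frame in stft(tone, n_fft, hop)]
audio, errors = griffin_lim(target, n_fft, hop, length)
print(f"magnitude error: first iteration {errors[0]:.4f}, last {errors[-1]:.4f}")
```

Each iteration alternates between two constraints: the result must be the STFT of some real signal, and its magnitudes must match the spectrogram. Because the inverse used here is the least-squares one, the magnitude error should not increase from one iteration to the next.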

Procedure/Methodology

The spectrograms were generated using the Riffusion Inference Server running the riffusion-model-v1 diffusion model, paired with the Riffusion App UI frontend. The following values were used:

  • Prompt: "bossa nova with electric guitar"
  • Seed Image: OG Beat
  • Denoising: 0.75

This resulted in the output spectrogram image:

Spectrogram image

Spectrograms were then converted to WAV audio using this Python script:

Audio converted from spectrogram
Riffusion generates 512×512 resolution images, each of which represents a 5-second chunk of looping audio; for the convenience of the reader, the three generated spectrogram images have been merged together in GIMP along the x-axis (which represents time), and the audio files have been merged together in Audacity and then converted to OGG Vorbis.
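The arithmetic behind the merged files can be made explicit. The sketch below assumes only the figures given above (512-pixel-wide images, 5 seconds per image, three images) and a linear mapping between pixel column and time; the constant and function names are hypothetical.

```python
CLIP_SECONDS = 5       # each spectrogram image covers 5 seconds of audio
IMAGE_WIDTH_PX = 512   # width of one generated spectrogram image


def merged_size(n_clips):
    """Width in pixels and duration in seconds after joining clips along the x-axis."""
    return n_clips * IMAGE_WIDTH_PX, n_clips * CLIP_SECONDS


def column_to_seconds(column):
    """Time offset of a pixel column, assuming time grows linearly with x."""
    return column * CLIP_SECONDS / IMAGE_WIDTH_PX


width_px, duration_s = merged_size(3)
print(width_px, duration_s, column_to_seconds(width_px))
```

Under these assumptions the merged spectrogram is 1536 pixels wide and covers 15 seconds, so each second of audio corresponds to about 102 pixel columns.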
GenAI Agent.png
Author/Creator: Marxav, License: CC BY-SA 4.0
Architecture of a generative AI agent that uses a Large Language Model (LLM) and additional optional modules (data, tools, other models).
Discriminative vs Generative Neural Networks.png
Author/Creator: Lwneal, License: CC0
Above: Schematic example of a discriminative neural network performing image recognition. Below: Example of a generative neural network performing text-to-image generation.
Borneo wildlife on the Kinabatangan River.webm
A video generated from a text prompt using OpenAI's Sora. The prompt is as follows: "Borneo wildlife on the Kinabatangan River"