In the old days, AI was created by humans teaching computers how to do things. To play chess, for example, a program was told by humans that a pawn is worth 1 point while a queen is worth 9, and it was given a hand-assembled database of opening positions. After that, it simply exploited the computer's ability to test millions of combinations. There was little in these systems that people couldn't do or understand in principle, because people had taught them everything in the first place.
The next phase (again using chess as the example) was teaching the computer strictly the rules, i.e., only how the pieces move and nothing at all about how to play well, and then letting it learn good play on its own by playing millions of games against itself.
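The self-play idea can be sketched at toy scale with a much simpler game than chess. This is nothing like how real chess engines learn (they use deep networks and tree search); it's just a minimal illustration of my own, using the take-away game Nim and a small table of move values that both "players" share and update as they play each other:

```python
import random

N = 10           # stones at the start of each game
MOVES = (1, 2, 3)  # a player may remove 1, 2, or 3 stones; last stone wins

# Q[(stones, move)] -> learned estimate of how good `move` is at that point.
# Nobody tells the program strategy; this table starts empty.
Q = {}

def best_move(stones):
    """The move the program currently believes is best."""
    return max((m for m in MOVES if m <= stones),
               key=lambda m: Q.get((stones, m), 0.0))

def play_episode(eps=0.2, alpha=0.1):
    """One game of self-play; both sides share and update the same table."""
    stones = N
    history = []  # (stones, move) for every ply of the game
    while stones > 0:
        if random.random() < eps:   # sometimes try a random move (explore)
            move = random.choice([m for m in MOVES if m <= stones])
        else:                       # otherwise play the current best guess
            move = best_move(stones)
        history.append((stones, move))
        stones -= move
    # Whoever took the last stone won: nudge the winner's moves up and the
    # loser's moves down, walking backwards through the game.
    reward = 1.0
    for state_move in reversed(history):
        old = Q.get(state_move, 0.0)
        Q[state_move] = old + alpha * (reward - old)
        reward = -reward            # flip perspective at each ply

random.seed(0)
for _ in range(50000):
    play_episode()

# Optimal play in this game is to leave the opponent a multiple of 4 stones;
# the program was never told that, yet its table tends to discover it.
print("from 5 stones, take:", best_move(5))
print("from 7 stones, take:", best_move(7))
```

The only inputs are the rules (which moves are legal, who wins); everything the program "knows" about strategy is whatever ended up in the table after fifty thousand games against itself.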
Similarly, the latest "generative" AI isn't explicitly taught anything by people. It's just shown millions of documents, and it learns on its own that, for example, the word "Merry" is frequently followed by "Christmas," and so on. So when it's given the prompt "please write a poem about Christmas," it simply adds one word after another based on the statistical strength of the connections between words. To the shock, delight, and horror of people, if you allow it to form enough connections between words (we're talking billions), its statistics-based output starts to make rather good sense!
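The word-follows-word idea can be shown in a few lines. Real language models are vastly more sophisticated (they weigh whole contexts, not just the previous word), but here's a bare-bones sketch that counts which words follow which in a tiny made-up corpus and then generates text from those counts alone:

```python
import random
from collections import defaultdict

# A tiny stand-in for "millions of documents".
corpus = """We wish you a Merry Christmas and a happy new year .
Merry Christmas to all , and to all a good night .
We wish you a happy holiday season ."""

# Count how often each word follows each other word.
follows = defaultdict(lambda: defaultdict(int))
words = corpus.split()
for a, b in zip(words, words[1:]):
    follows[a][b] += 1

def next_word(word):
    """Pick a successor in proportion to how often it followed `word`."""
    choices, weights = zip(*follows[word].items())
    return random.choices(choices, weights=weights)[0]

random.seed(42)
out = ["Merry"]
for _ in range(8):
    out.append(next_word(out[-1]))
print(" ".join(out))
```

Nobody told the program that "Merry" goes with "Christmas"; that connection fell out of the counts, and generation is just repeatedly asking "statistically, what tends to come next?"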
Generative artwork is a little different because images are two-dimensional rather than linear like text. Very roughly, it works like this: You show a computer millions of cat photos scraped from the internet, and (via magic neural net stuff) the computer learns what a cat looks like. Then you give the computer an image of pure noise as a starting point, but you tell it, "that's a picture of a cat, now please get rid of the noise." So it starts incrementally deciding, "this little group of random noisy pixels looks just a tiny bit like it could be an eye, and if so, those over there could be the other eye…" And it makes a small refinement that steps the noise just a tiny bit toward a cat. After dozens or hundreds of such steps, an original picture of a cat emerges from pure noise.
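That noise-to-picture refinement can be faked in miniature. In a real diffusion model, a trained neural network guesses the noise at each step; here I cheat outrageously and let a stand-in "model" nudge pixels toward one known target image, purely to show how many tiny steps turn static into a picture:

```python
import random

WIDTH, HEIGHT = 8, 8

# The "cat" our fake model has learned: a bright blob on a black background.
# (A real model encodes what cats look like in general, not one fixed image.)
target = [[255 if 2 <= x <= 5 and 2 <= y <= 5 else 0
           for x in range(WIDTH)] for y in range(HEIGHT)]

random.seed(0)
# Start from pure noise: every pixel a random brightness.
image = [[random.randint(0, 255) for _ in range(WIDTH)] for _ in range(HEIGHT)]

def denoise_step(img, strength=0.1):
    """Move every pixel a small fraction of the way toward the model's guess."""
    return [[px + strength * (target[y][x] - px)
             for x, px in enumerate(row)] for y, row in enumerate(img)]

for step in range(50):
    image = denoise_step(image)

# After many small steps, the noise has collapsed onto the learned picture.
error = max(abs(image[y][x] - target[y][x])
            for y in range(HEIGHT) for x in range(WIDTH))
print(f"max pixel error after 50 steps: {error:.1f}")
```

No single step does much; each one moves every pixel only 10% of the way toward what the "model" expects, which is exactly the dozens-of-tiny-refinements flavor of the real thing.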
Now everyone is wondering, "So computers can think now? No! Surely it's just following one word after another, not from knowledge but from statistics! That's not thinking!" But the deeper question is: isn't that what we are all doing too, at least most of the time? Words don't have meanings in isolation. The information is in the connections between words, and now both computers and we know those connections. That's why I don't really have to stop and think about every individual word as I'm typing. The words flow as they do because I know automatically, from past experience reading thousands of books and documents, that this next word follows all those I wrote before, just as computers now do!
This is going to be a wild ride!