Current Events > So how do those AI written stories work?

Bad_Mojo
07/09/18 8:55:21 AM
#1:


For example, there are a few Harry Potter ones out there. Right now I'm watching a guy on YouTube read movie reviews written by them -

https://www.youtube.com/watch?v=nd3vWm8E4yk


But how do they write the programs? A simple explanation, please, lol

Do they just plug the entire story of Harry Potter into its AI, and then tell it to make up a story based on all the characters in the story? Maybe it's just plugging in character names, locations, etc.? But then how do the reviews work?

Just curious
---
Bad_Mojo
07/09/18 11:57:22 PM
#2:


DarkTransient
07/09/18 11:59:21 PM
#3:


A wizard does it.
Bad_Mojo
07/10/18 12:08:21 AM
#4:


DarkTransient posted...
A wizard does it.


Fucking wizards, they do everything
---
IllegalAlien
07/10/18 12:15:49 AM
#5:


There are a few ways to do this. The simplest is to create a "language model", which can be fit on the text corpus.

To simplify a bit: essentially you declare the expressivity of the "n-gram" model, where n is the length of the sequence. For a "unigram" model, we simply count the relative frequency of each word w.r.t. the total number of words in the document (Maximum Likelihood Estimation, or MLE). For a "trigram" model we count sequences of three words, like "I like eggs", etc.

Then each n-gram has an associated probability distribution. You sample the first word, then condition each following word on the ones before it. In the unigram case each sample is IID. In the trigram case the probability factors as p(x_3 | x_2, x_1). If the arithmetic doesn't work out (for example, the first word doesn't have two previous words), you just condition on NULL or some special start symbol; equivalently, you use a unigram for the first word, then a bigram, and then you can start doing trigrams.
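
Here's a rough Python sketch of the counting-and-sampling idea for a bigram (n = 2) model; the corpus and names are made up, just to show the mechanics:

import random
from collections import defaultdict, Counter

# Toy corpus; in practice this would be the full text of the books.
corpus = "I like to like posts from people I like duh".split()

# Bigram MLE counts: how often each word follows the previous one.
counts = defaultdict(Counter)
prev = "<s>"  # special start symbol standing in for "no previous word"
for word in corpus:
    counts[prev][word] += 1
    prev = word

def sample_next(prev_word):
    # Sample from p(word | prev_word); counts act as unnormalized probabilities.
    options = counts[prev_word]
    return random.choices(list(options), weights=list(options.values()))[0]

# Generate a short "sentence" one word at a time.
word, sentence = "<s>", []
for _ in range(8):
    if not counts[word]:  # dead end: this word never had a successor
        break
    word = sample_next(word)
    sentence.append(word)
print(" ".join(sentence))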
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
SpinKirby
07/10/18 12:16:22 AM
#6:


IllegalAlien
07/10/18 12:17:09 AM
#7:


A more sophisticated method uses deep learning to model the language. At a high level it's not much different; the same ideas are simply implemented using a Recurrent Neural Network architecture.
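
If you're curious, a bare-bones version of that in Keras looks roughly like this (vocab size, sequence length and layer sizes are placeholders, not any particular published model); the network learns p(next word | previous words) instead of counting it:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 5000  # number of distinct words in the corpus (placeholder)
seq_length = 10    # how many previous words each training example contains (placeholder)

model = Sequential([
    Embedding(vocab_size, 50),                 # map word indices to dense vectors
    LSTM(100),                                 # recurrent layer reads the word sequence
    Dense(vocab_size, activation="softmax"),   # probability distribution over the next word
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
# model.fit(...) is then run on (previous seq_length words, next word) pairs
# sliced out of the corpus, which plays the same role as counting n-grams.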
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
Bad_Mojo
07/10/18 12:18:19 AM
#8:


IllegalAlien posted...
There are a few ways to do this. The simplest is to create a "language model", which can be fit on the text corpus.

To simplify a bit: essentially you declare the expressivity of the "n-gram" model, where n is the length of the sequence. For a "unigram" model, we simply count the relative frequency of each word w.r.t. the total number of words in the document (Maximum Likelihood Estimation, or MLE). For a "trigram" model we count sequences of three words, like "I like eggs", etc.

Then each n-gram has an associated probability distribution. You sample the first word, then condition each following word on the ones before it. In the unigram case each sample is IID. In the trigram case the probability factors as p(x_3 | x_2, x_1). If the arithmetic doesn't work out (for example, the first word doesn't have two previous words), you just condition on NULL or some special start symbol; equivalently, you use a unigram for the first word, then a bigram, and then you can start doing trigrams.


I didn't understand a bit of that, but thank you for taking the time
---
Zeeak4444
07/10/18 12:18:27 AM
#9:


IllegalAlien posted...
There are a few ways to do this. The simplest is to create a "language model", which can be fit on the text corpus.

To simplify a bit: essentially you declare the expressivity of the "n-gram" model, where n is the length of the sequence. For a "unigram" model, we simply count the relative frequency of each word w.r.t. the total number of words in the document (Maximum Likelihood Estimation, or MLE). For a "trigram" model we count sequences of three words, like "I like eggs", etc.

Then each n-gram has an associated probability distribution. You sample the first word, then condition each following word on the ones before it. In the unigram case each sample is IID. In the trigram case the probability factors as p(x_3 | x_2, x_1). If the arithmetic doesn't work out (for example, the first word doesn't have two previous words), you just condition on NULL or some special start symbol; equivalently, you use a unigram for the first word, then a bigram, and then you can start doing trigrams.


My brain melted.
---
Typical gameFAQers are "Complainers that always complain about those who complain about real legitimate complaints."-Joker_X
SpinKirby
07/10/18 12:19:56 AM
#10:


IllegalAlien posted...
A more sophisticated method uses deep learning to model the language. At a high level it's not much different; the same ideas are simply implemented using a Recurrent Neural Network architecture.


It's fake until I see an open-source version.
---
IllegalAlien
07/10/18 12:23:04 AM
#11:


Let's do an example. Say we're fitting a unigram language model on the corpus "I like to like posts from people I like duh".

Then the model is:
p("I") = 2/10
p("like") = 3/10
p(everything else) = 1/10

To generate a sentence, sample a single word at a time. This might come out as: "like like posts duh I like", which has probability .3 * .3 * .1 * .1 * .2 * .3.

---

Obviously this is pretty naive, but this is the basic idea behind it all :)
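
If anyone wants to try it, the whole example fits in a few lines of Python (variable names are just for illustration):

import random
from collections import Counter

corpus = "I like to like posts from people I like duh".split()

# Unigram MLE: relative frequency of each word.
counts = Counter(corpus)
probs = {w: c / len(corpus) for w, c in counts.items()}
print(probs["I"], probs["like"])  # 0.2 0.3

# Sample six words independently (IID) to "generate" a sentence.
words = list(probs)
sentence = random.choices(words, weights=[probs[w] for w in words], k=6)
print(" ".join(sentence))

# Probability of one particular sentence = product of its word probabilities.
p = 1.0
for w in "like like posts duh I like".split():
    p *= probs[w]
print(p)  # .3 * .3 * .1 * .1 * .2 * .3 = 5.4e-05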
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
Bad_Mojo
07/10/18 12:29:05 AM
#12:


IllegalAlien
07/10/18 12:37:36 AM
#13:


Haha it's not often someone posts about something I'm knowledgeable about on these boards
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
SoundNetwork
07/10/18 12:38:20 AM
#14:


Lots of human help
---
SpinKirby
07/10/18 12:38:35 AM
#15:


IllegalAlien posted...
Let's do an example. Say we're fitting a unigram language model on the corpus "I like to like posts from people I like duh".

Then the model is:
p("I") = 2/10
p("like") = 3/10
p(everything else) = 1/10

To generate a sentence, sample a single word at a time. This might come out as: "like like posts duh I like", which has probability .3 * .3 * .1 * .1 * .2 * .3.

---

Obviously this is pretty naive, but this is the basic idea behind it all :)


I know almost nothing about coding, but I've coded enough to know a little above nothing.

And that little nothing isn't sufficient at all, but it's just good enough to plant the seed of doubt.

I doubt these posts. I doubt them until I see the code for myself, which I will not be able to understand.
Your explanation is sound, but I doubt it's what is being used.

It's staged.
---
IllegalAlien
07/10/18 12:41:57 AM
#16:


SpinKirby posted...
IllegalAlien posted...
Let's do an example. Say we're fitting a unigram language model on the corpus "I like to like posts from people I like duh".

Then the model is:
p("I") = 2/10
p("like") = 3/10
p(everything else) = 1/10

To generate a sentence, sample a single word at a time. This might come out as: "like like posts duh I like", which has probability .3 * .3 * .1 * .1 * .2 * .3.

---

Obviously this is pretty naive, but this is the basic idea behind it all :)


I know almost nothing about coding, but I've coded enough to know a little above nothing.

And that little nothing isn't sufficient at all, but it's just good enough to plant the seed of doubt.

I doubt these posts. I doubt them until I see the code for myself, which I will not be able to understand.
Your explanation is sound, but I doubt it's what is being used.

It's staged.

Haha some of them might be staged, especially on YouTube or whatever, for views/likes. It depends on how large the corpus is and how many assumptions you make. The current frontier of language modelling is genuinely impressive, but those systems are deep neural networks, and relatively advanced ones at that.
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
IllegalAlien
07/10/18 12:44:06 AM
#17:


I would make an educated guess that a trigram language model fit on something like all the Harry Potter books would spit out a few good sentences, if not cohesive paragraphs. This is without deep learning, so it's the old-school, passé way to do it.

A fun project might be to write a web scraper/logger for CE, scrape all posts/titles, then create a chatbot alt and see how well it does ;)
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
Bad_Mojo
07/10/18 12:44:10 AM
#18:


IllegalAlien
07/10/18 12:46:41 AM
#19:


Here's a tutorial: https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/

He gets some compelling sentences from Plato or something: "preparation for dialectic should be presented to the name of idle spendthrifts of whom the other is the manifold and the unjust and is the best and the other which delighted to be the opening of the soul of the soul and the embroiderer will have to be said at"

If you download the Anaconda package for Python, you can run the tutorial's code and it will work.
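
The generation step on top of a model like that is short. Roughly (this is a paraphrase, not the tutorial's exact code; model and tokenizer are assumed to come from training it the way the article describes):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate(model, tokenizer, seq_length, seed_text, n_words):
    # Repeatedly predict the next word and append it to the running text.
    text = seed_text
    for _ in range(n_words):
        encoded = tokenizer.texts_to_sequences([text])[0]
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating="pre")
        probs = model.predict(encoded)[0]            # distribution over the vocabulary
        next_index = int(np.argmax(probs))           # greedy pick; you could sample instead
        text += " " + tokenizer.index_word.get(next_index, "")
    return text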
---
"Never argue with an idiot, they drag you down to their level, then beat you with experience."
SpinKirby
07/10/18 12:46:51 AM
#20:


IllegalAlien posted...
SpinKirby posted...
IllegalAlien posted...
Let's do an example. Say we're fitting a unigram language model on the corpus "I like to like posts from people I like duh".

Then the model is:
p("I") = 2/10
p("like") = 3/10
p(everything else) = 1/10

To generate a sentence, sample a single word at a time. This might come out as: "like like posts duh I like", which has probability .3 * .3 * .1 * .1 * .2 * .3.

---

Obviously this is pretty naive, but this is the basic idea behind it all :)


I know almost nothing about coding, but I've coded enough to know a little above nothing.

And that little nothing isn't sufficient at all, but it's just good enough to plant the seed of doubt.

I doubt these posts. I doubt them until I see the code for myself, which I will not be able to understand.
Your explanation is sound, but I doubt it's what is being used.

It's staged.

Haha some of them might be staged, especially on YouTube or whatever, for views/likes. It depends on how large the corpus is and how many assumptions you make. The current frontier of language modelling is genuinely impressive, but those systems are deep neural networks, and relatively advanced ones at that.


If we can come to an agreement, I'd say it could easily be AI-generated, with a little human help to boost the humor.

There are definitely some sections where the 'humor' is all too bizarre in a convenient sense, rather than bizarre in a purely nonsensical, non-comedic fashion, if that makes sense.
---
gunplagirl
07/10/18 12:51:17 AM
#21:


They're fake. You'd end up with gibberish like

Scene interior the at Joey sit table at camera focuses door

So basically, you'd get Donald Trump's latest speech about records
---
Pew pew!
Anteaterking
07/10/18 12:58:34 AM
#22:


n-grams like IllegalAlien described are best "modeled" by what your predictive text messaging does. It tends to only look one or two words back, so you get these long, meandering sentences that only make sense very locally.

You can also use pretty simple neural networks to get text that fits natural speech a bit better, but you have to introduce some error into the model or you're likely to overfit (it'll do really great at mimicking passages exactly, but won't be able to make much original content).

In either case, you need a large corpus to get meaningful results.
---
kirbymuncher
07/10/18 1:07:27 AM
#23:


pretty sure most of the ones you see posted as jokes and stuff are faked; it takes a lot of text to really get anything good out, and people will often claim they're from stuff that just doesn't have that volume of material
---
THIS IS WHAT I HATE A BOUT EVREY WEBSITE!! THERES SO MUCH PEOPLE READING AND POSTING STUIPED STUFF
Frolex
07/10/18 1:13:56 AM
#24:


The actual answer to the TC's question is that they're written by actual humans who are both too lacking in creativity to come up with their own jokes and too unfunny to make a meme whose humor isn't 100% contingent on making the reader believe the joke was written by a program instead of a human throwing together a pile of z A n y non sequiturs and references.
---
gunplagirl
07/10/18 1:16:33 AM
#25:


Frolex posted...
The actual answer to the TC's question is that they're written by actual humans who are both too lacking in creativity to come up with their own jokes and too unfunny to make a meme whose humor isn't 100% contingent on making the reader believe the joke was written by a program instead of a human throwing together a pile of z A n y non sequiturs and references.


True. There's a comedian who has made like 4-5 by now.
---
Pew pew!