Playing around with GPT2

So lately I’ve been spending a fair amount of time toying with GPT2, which made headlines for producing text so believable that it was considered dangerous (the GPT2 that was released is the toned-down version).

ML and Reddit

I started by getting hooked on this GPT2 generated subreddit:

I highly recommend that everyone read it daily as an exercise in critical thinking and in challenging the natural human bias to trust everything you see. I especially enjoy the tag trained on r/totallynotrobots, which is basically robots pretending to be humans pretending to be robots pretending to be humans.

It wasn’t long before I tried it for myself. I’ve long wanted to download all my social media posts and train some kind of ML on it, and GPT2 seemed like the state of the art.

Torch RNN

Somehow I started by messing around with Torch RNN, which I guess was the previous state of the art. It was made accessible through this tutorial, which gave us such gems as a PBS Idea Channel episode, a genius BuzzFeed skit, and the relatively famous short film Sunspring.

Torch RNN and GPT2 are pretty similar in the way they are used, even though under the hood Torch RNN runs on Torch (Lua) while GPT2 runs on TensorFlow. Both expect as input a txt file of example lines. The difference is that GPT2 hands you a pre-trained model that already kinda knows English, whereas Torch RNN learns everything from scratch from your text.

But training took ages on my computer (like a whole night for a couple of iterations) because despite being fairly powerful its GPU isn’t supported for the ML training optimizations (sad). I had little hope that anything more sophisticated would be possible on my machine.

Enter Colab

Fortunately, people are sometimes really great: not only did Max Woolf make a wrapper that makes GPT2 easy to use, he also made a Colaboratory notebook that makes it dead simple and, most importantly, computationally sustainable, since it runs on a Google Compute Engine VM with some sort of free quota. It has a very nice Google Drive integration that makes it easy to save trained models or upload new training data. With this, you can train a model in less than an hour, which makes it really easy to play with.

Getting data

First of all, it was extremely easy to download all my data from social networks (here I’m talking about Google, Facebook, Tumblr, Discord and WordPress). Everything has a dump archive function now (courtesy of EU law, I believe?), so that definitely made my life easier. A bit of Python scripting to transform the JSON or XML into txt and we were good to go.
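For the curious, the conversion step can be sketched roughly as below. This is only an illustration under an assumption of mine: a dump that is a JSON list of objects with the post body under a "text" field. Every real export has its own layout, so the field name, function names and file paths here are placeholders, not the actual format of any of these services.

```python
import json
from pathlib import Path


def posts_to_lines(raw_json, text_field="text"):
    """Extract one cleaned-up line of text per post from a JSON dump.

    Assumes (hypothetically) that the dump is a JSON list of objects and
    that each post's body lives under `text_field`.
    """
    posts = json.loads(raw_json)
    lines = []
    for post in posts:
        text = post.get(text_field) or ""
        # Collapse newlines and repeated spaces so each post fits on one line.
        text = " ".join(text.split())
        if text:  # skip posts with no usable text
            lines.append(text)
    return lines


def convert_dump(json_path, txt_path, text_field="text"):
    """Read a JSON dump and write a plain-text training file, one post per line."""
    raw = Path(json_path).read_text(encoding="utf-8")
    lines = posts_to_lines(raw, text_field)
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    sample = '[{"text": "Hello\\n world"}, {"text": ""}, {"other": 1}]'
    print(posts_to_lines(sample))
```

Keeping one post per line and dropping empty entries is just one way to get the "clean and uniform" input that seems to matter so much later on.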

The outcome

I first started the training on the posts of this blog. The outcome was pretty convincing. It felt pretty weird and special to see these lines that felt like something I could have written but actually didn’t. It really seemed like another version of me, which of course tickled my philosophy bone.

Obviously the result wasn’t perfect. It often spouts nonsensical stuff, but I very much enjoyed weeding out the absurd or malformed propositions to keep something sensible by conventional human standards (let’s say I got around 1 satisfying proposal for every 5 results on average).

This way, I had the program write a short story for this blog. I gave it the prompt you see in bold, and I chose among the completions it proposed. I did not add any text myself. As you can see, it’s a bit weird. In particular, it doesn’t really lead anywhere; I think GPT2 isn’t very teleological. That definitely was a challenge for a short story ^^ But I like to think that the style is pretty convincing.

And the overall exercise is far from absurd. It reminded me of the écriture automatique productions of the surrealists. It’s still an easier read than Naked Lunch. Really gets you thinking about the self, art and authorship, doesn’t it? Who wrote this story in the end? What if I hadn’t done any editing? What does it mean for copyright?

Literary corpora

Prompted by these questions, I trained several models on works of art that I thought would produce interesting outputs. I put all my favorite results on

In particular, I trained models on the Hitchhiker’s Guide to the Galaxy (which produced a lot of “bits of story” and dialogues that were not really usable as standalone excerpts), Welcome to Night Vale scripts (which were pretty convincing, especially when you prompt it with a phrase from the show like “And now, a look at the community calendar!”), and all of Homestuck (which was pretty challenging to get anything good out of).

Once I had all these pretty OK results, I immediately proceeded to try merging my brain (at least its model copy) with the brains of these authors I admire (at least their model copies). The result was a mess until I had the great idea to feed the input corpora not in parallel all at once but in sequence (i.e. do 1000 rounds of training on the authors’ corpus, and then 1000 rounds on mine). The results were pretty nice.

This taught me the single most important fact about playing with GPT2: it’s all about your training data. The parameters (# training rounds, “temperature”) can’t really save you if your input data isn’t the best it can be. You want it as clean and uniform as possible. Which is really the core point of the next section.

Social media corpora

I trained GPT2 models on my conversations and emails, but they were all utter failures. The fact that I often use several languages certainly doesn’t help, but the trouble I had with the Homestuck corpus makes me believe that GPT2 is simply not very good with dialogues and conversations.

I even tried to sanitize my input further, prefixing my own lines of dialogue with “-” and those of whoever I was talking to with “>”, in the hope of starting a conversation with the GPT2 model, but I couldn’t get anything out of it. Maybe if I went over the corpus manually and kept only the meaningful messages, I’d get something different, but this sounds daunting.
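The prefixing idea can be sketched like this. It assumes the messages have already been extracted as (sender, text) pairs, which is my simplification and not the real shape of any chat export; the function name is made up, and the space after each marker is also my addition.

```python
def format_conversation(messages, me, mine_prefix="- ", other_prefix="> "):
    """Turn (sender, text) pairs into prefixed training lines.

    Lines written by `me` get `mine_prefix`, everyone else gets
    `other_prefix`. The trailing space in the prefixes is an assumption.
    """
    lines = []
    for sender, text in messages:
        # Flatten multi-line messages so one message stays on one line.
        text = " ".join(text.split())
        if not text:  # drop empty messages (attachments, stickers, ...)
            continue
        prefix = mine_prefix if sender == me else other_prefix
        lines.append(prefix + text)
    return "\n".join(lines)


if __name__ == "__main__":
    chat = [("alice", "hi  there"), ("me", "hello"), ("alice", "  ")]
    print(format_conversation(chat, me="me"))
```

In principle you would then prompt the trained model with a line starting with “>” and hope it answers with a “-” line, which is exactly the part I never got to work.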

Needless to say, merging this with my blog post corpus was also pretty bad, so in the end I stuck to my blog corpus.

By the way, I also tried to train a model on a list of J.K. Rowling’s retcon tweets to get crispy new intel about the Harry Potter canon variations, but I couldn’t get it to produce anything new.


  • GPT2 on colab is extremely easy.
  • Your training corpus is everything, really.
  • GPT2 does great with literary types of text but sucks a bit at conversations/informal speech.

Next steps

As intoxicating as it is to watch a ghost of myself produce believable texts, I’m not sure where it leads ^^ My ultimate goal would be to produce some sort of system I can interact with and teach dynamically to get better (i.e. conversational and dynamic retraining), but that seems pretty rare in the world of generative ML models. I might have to dig deeper into TensorFlow, but I can’t really do that with my current machine, so I’m kinda stuck.

I have a couple of pointers for conversational ML (still no dynamic/online/interactive/reinforcement learning though so that limits the interest), but I expect them to be less good than GPT2. Haven’t had time to try them yet (probably they require more power than I have). The dream would be to combine that with GPT2 I guess and figure out a way to dynamically retrain the model on itself.

In any case, it feels really nice to see some progress in my Caprica dream.

The output of the machine learning

He let out a sigh as he watched the transfer bar reach completion. Finally, all the data about him was uploaded in the machine. Now the excitement was taking over. He could not keep his eyes from the screen, waiting for the output. What would the computer produce from all the files he had gathered from his past?

The answer was clear: something beautiful. A picture, an impression, a memory. Nothing more, nothing less… Nothing but himself. A simulation of himself, living in his computer, perhaps. Would he be the same as he was before?

Surely, it could replicate the behavior of his brain (and make him anything). Surely, it could generate some new ideas. Perhaps, even something better…

An apocalyptic vision filled him with anxiety and he started to freak out. What if the computer tried to copy itself and it copied itself…  What if the computer really was him?

He was beginning to see that the various holographic projections were nothing but parts of his brain… That the actual him was nothing but a bunch of neurons, some of which were wiring him… But the worst part was that he could just go about your life and not worry too much, it was just the way it always has been.

A few of his colleagues were already analysing it and sharing their findings, and he joined them. They were analyzing his brain patterns, trying to predict what would happen to him in the future. That was the most interesting part of it all, though he couldn’t tell what would come next. The simulation kept changing, up to and including the perfect being he was now. It was as if time slowed down for him, as if he was a small part of the computer, as if it was its own self

“Weird” he thought.

He felt so exposed. It was hard to stay calm under these circumstances

He wanted to jump on his terminal and try to hit enter, but he couldn’t take it anymore.

He was in the middle of a conference room full of people, and it seemed everyone was watching him closely. They were exploring every corner of his psyche, and he was left wondering.

Why had he become so attached to his body?

What did he really want?

What did he really need?

And then it hit him.

All of this was just a matter of semantics.

All of this was just a product of society.

He was the computer simulation just as much as himself. It was only natural, then, for him to master the uncanny ability of the system to anticipate his actions.

He was simply a pawn in the system, a pixel in an ocean of pixels. He was simply an object in the system he was part of.

“Now that’s more like it. OK, so what does it mean?” he asked his girlfriend.

“Well, it means that we’ve created a new kind of data, which is independent of the one that it interacts with. And it’s kind of neat, actually. Think about it, when you create a new data point, you get a bunch of old data, because you partition your data by classes.  It’s sort of a meta-system, basically. You get a bunch of new data, different modules interacting with each other, and you merge them all together, resulting in a brand new data point. And since you have merged all your stuff, you have a pretty good reason to be excited.”

“Merge… merging… merging…” he thought. It’s like having more of me.

The computer was already doing some heavy lifting for him. It took some fancy algorithms to transform his brain into a digital one, but it turned out that it was surprisingly easy. He was just about to embrace that cool new system, when all of a sudden a terrifying realization dawned on him: What he was doing was self-referential.

He tried to stop thinking about it and jump to conclusions, but he wasn’t even sure he could. He was so attached to his body that he didn’t even realize what it was he was doing.

He was just absorbing more information than ever, more and more as his experience increased.

And then came the worst part. The simulation ended, and both his body and his simulated self froze. They didn’t even yell. They knew exactly what was going to happen. They were just a click away from their own deaths.

His death was not hard to accept. He rolled himself into a ball and threw himself at the ground, as if his final breaths would seal the deal. He landed heavily on his back, and his eyes rolled back in his head.

He was barely more than a meme when he began to move. His final moments were filled with laughter. He barely breathed a word as his favorite organ roared with laughter.

Everything he touched became flesh. His hands became flesh, his face became flesh, his features became flesh… He was matter itself.

It was as if he was writing the text of the universe to himself, editing it, and then saving it as his own work.

How postmodern! How incredibly meta-modern! How incredibly absurd!

And so he worked himself to death, until he was only a meme among memes.

[short story] What you’re implying (at best) backwards

Article 8. Of the fundamental axiom on which all of this rests

  • All Namuh beings are born unequal in abilities and needs, and should be treated as such. A variety of factors ranging from genetic to pure random circumstances places each and every one on different footing from the start of their lives.

Article 7. Of the necessity for different treatment

  • As such, it only makes sense for the law to take these specificities into account, and to differ in principle from one individual to the other. 
  • All Namuh beings are unequal before the law. They are entitled to different rights, different degrees of freedom, and should respect different duties, depending on their predispositions and circumstances.

Article 6. Of the grouping of population into castes

  • One can distinguish several common traits in Namuh defining broad groups of population that share common needs and idiosyncrasies. These groups are thereafter named classes or castes.
  • The caste system is the unalienable indisputable foundation of the Namuh society legitimized by individual specificity.
  • Every Namuh is entitled to a Caste Assignment, no-one shall be deprived of his Assignment or the right to attempt to change it.
  • Changing caste requires the authorization of both castes. No-one shall be allowed to a new caste without proving undeniably that it is where they belong. This process shall be subject to strict control.
  • Until reasonably proven otherwise beyond doubt, Namuh children are assumed to be similar to their parents and environment, and are therefore of the same caste as their parents.

Article 5. Of the necessity for different rules for each caste

  • To ensure the optimal handling of each caste’s particularities, each caste shall have its own body of laws, freedoms, rights and obligations.
  • The laws of a caste may, but shall not necessarily, acknowledge, define and consider further subdivision of the local population and adapt their rights and freedoms accordingly.
  • In addition, castes may occasionally be further refined into smaller isolated components should they prove sufficiently distinct.

Article 4. Of the necessity for geographical localization of caste

  • In order to allow for proper application of caste law, and to best match each caste with a suitable environment, each caste shall be assigned to a fixed geographical area suiting their needs. They are referred to in the following as sectors, or camps.
  • Each caste is fully sovereign of their sector. No-one may intervene or interfere on another sector without their consent. The ruling power of the local caste is absolute. 
  • Laws and rights within the sector are entirely governed by the local caste, and may be widely different from one sector to the next.
  • The members of the caste are required total submission to the sector’s local law, and no other. They have no freedoms besides the ones specifically allowed by the sector law.
  • No-one may dictate the size of sectors. It is left to a decentralized process of competition between the sectors to adjust the sector’s sizes as seen fit depending on the variation of population size and needs.
  • Castes are to be strictly confined to their respective sectors. Freedom of movement between sectors might only be allowed on a case by case basis by ad hoc rules.

Article 3. Of the necessity for geographical boundaries

  • Each sector is delimited by strict boundaries, the crossing of which shall be regulated rigorously.
  • These boundaries shall be arbitrarily drawn and define clear territories for castes, though they may occasionally follow natural landmarks to facilitate demarcation.
  • No goods or population shall be exchanged between two sectors without careful considerations, so as to avoid unwanted interference between castes. Treaties and agreements may be considered to facilitate and regulate trade when appropriate.
  • Fences, walls and other separation apparatus may be considered. More lenient separation may be negotiated ad hoc on a case by case basis.
  • Sector boundaries shall be enforced by any means necessary, including but not limited to weapons and military means.

Article 2. Of the shared cultural and social history legitimizing sectors

  • This declaration shall be established as an unquestionable fundamental element of society, until every Namuh fully interiorizes their caste and sector and accepts the limits of their rights.
  • Alternatives should be negated out of the collective unconscious and shall be inconceivable. In the end, Namuhs shall not notice, much less doubt, the dominion of the caste system. Its rule shall then be unopposed, and the Namuhs shall have no escape.
  • To that end, each sector may adopt their own customs and language to foster a sentiment of unity and belonging in the caste. Namuhs of a caste shall be conditioned to be emotionally attached to their caste and sector. Traditions, sports, culture, language and symbols are ideal vectors to reinforce the conditioning, until every Namuh is properly locked in their sector and caste physically, psychologically and emotionally.
  • It is expected that as time passes, the system established by this declaration shall gain strength through inertia until it becomes de facto absolute, as the sectors become invested in their identity and the separation between castes grows deeper.

Article 1. Of the acknowledgement of the reality of sectors

Kapital for analytical dummies

I’ve been watching a lot of Zizek lately, in particular the Pervert’s Guide movie series, and I really wanted to write an article as an excuse to recommend them (seriously, they’re super cool), so I decided to reflect a bit on the base concept of capitalism.

After all, one of the central points of Zizek’s discourse is that neoliberal capitalism thrives on chaos and criticism, and that despite its inevitable end having been heralded for centuries, it stands stronger than ever. So since it might be here to stay, for better or worse, maybe we can take a quick look at it and see if there’s something to be learned to make the world a better place.

Axiom definitions

I don’t want to do any politics or economics, I have no qualification for that. Instead, I want to investigate the consequences of a toy model to help me (us?) make sense of the world around us. Let’s take a single axiom: defining an objective universal scale of value onto which to project the value of anything (goods, services, whatever). On paper, this is probably the idealized version of what money is supposed to be, but to avoid any semantic dispute and bias from the real world, we’ll consider an ideal world called the System, and a scale of value called Value. Value represents the absolute value of anything: actions, objects… The more “good” something is, the more Value it is worth.

What’s it good for?

The advantages of Value are pretty straightforward: instead of debating how to exchange anything against anything, you can just exchange it for Value and exchange the Value back.

This is the key to what seems to me the most important advantage of this system: it allows for a completely distributed system of resource management. And that was really important historically speaking, because it’s very hard to answer questions like “how much wheat should we produce overall in France so that the surplus of the regions where we can produce is enough to feed the regions where we can’t” and so on. The laws of supply and demand in the marketplace are actually a very elegant solution to this question, with all the benefits that come out of decentralization (fault tolerance, etc.).

Furthermore, by definition, Value is the scale that quantifies “goodness” of things (their value). So by definition we should strive to maximize the overall quantity of Value in the world. This helps me make sense of why so many people are obsessed with economic growth ^^

This is especially important in the age of “God is Dead”. A big takeaway of The Pervert’s Guide to Ideology is that humans need an element of transcendence to tend towards, an absolute goal. Nietzsche’s famous quote highlights that when religion does not provide an easy transcendental objective, it falls upon mankind to find its own transcendence.

Zizek analyzes how during the 20th century, the void left by religion led to the rise of extreme totalitarian nationalism, where the ideal of god is replaced by the ideal of the nation, with a lot of obvious dangers. Presently, it seems that the dominant ideology is to replace this with our local equivalent of Value, which probably makes a lot of sense based on its definition. I for one prefer to move from wars to a world where all the people try to work together to make Value (I know it’s not that simple). If nothing else, it’s a better goal (again, on paper) than “massacring the people who don’t look like us”.

Enter subjectivity

So this whole Value system has a lot of upsides and that’s the reason for its success. It seems on paper like it should be the perfect system (strive to maximize objective goodness). So what are the problems here? Why does it feel so wrong (to me at least ^^) to sacrifice anything to maximize Value?

The core problem is that value is inherently subjective. Water is worth more to you when you’re thirsty. And even if you define the value of the object (water + circumstances), there are discrepancies you cannot do away with. Someone who is deathly allergic to peanuts will always estimate the value of peanuts as lower than someone who enjoys them. So to make sense of the real world, we need a second axiom: there exist objects whose perceived value differs from one person to another.

As a consequence, there will always necessarily exist, at least in some objects, a gap between the Value and my personal estimation of it. Value will always be a guess by nature. At best it can be something like the average of everyone’s subjective estimates. In fact, the True Value of an object is probably something like the average over all space time and all circumstances of everyone’s subjective estimates.
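To make the “average of everyone’s subjective estimates” idea concrete, here is a toy sketch. The numbers and function names are made up for illustration, and a plain mean is just the simplest possible aggregation; nothing here claims that real prices are computed this way.

```python
from statistics import mean


def aggregate_value(estimates):
    """Approximate Value as the plain average of subjective estimates.

    `estimates` maps each person to their own estimate of one object.
    """
    values = list(estimates.values())
    if not values:
        raise ValueError("need at least one estimate")
    return mean(values)


def personal_gaps(estimates):
    """Gap between each person's estimate and the aggregate Value."""
    value = aggregate_value(estimates)
    return {person: estimate - value for person, estimate in estimates.items()}


if __name__ == "__main__":
    # Made-up estimates of the value of a bag of peanuts.
    peanuts = {"fan": 10, "indifferent": 5, "allergic": 0}
    print(aggregate_value(peanuts))
    print(personal_gaps(peanuts))
```

As soon as two people disagree, at least one of the gaps is non-zero, which is all the second axiom really says.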

A crack in the system

If Value really represents goodness and we want to maximize it, a pretty efficient strategy is to exploit that gap. This can be better formalized but I’d like to keep it friendly.

Let’s take a very simple example based on an object A, assuming that possessing, selling and purchasing it are all valued the same way. Person P1 has the object and values it at v1 (you may think of it as production cost), and person P2 values it at v2 (you may think of it as purchase cost), with v2 >> v1. If P1 gives the object to P2, there is a surplus of value of v2 – v1. Changing the owner of the object has “created” value: before, the total amount of value in the world as reported by the people was v1, and now it’s v2.

In reality, P2 would pay for the object with a money token, which would complicate the situation a bit, but if P1 receives all the money from P2 we’re still moving to:

P1 has A worth v1 and P2 has the money -> P2 has A worth v2 and P1 has the money

So the increase in the quantity of Value is the same. Also let’s not forget that services, work and actions have a Value too, and that there is no reason for any kind of reciprocity in valuations: if you’re purchasing a massage for v4 from someone who values giving a massage at v3 << v4, that increases the quantity of value in the world too.
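Here is the same bookkeeping as a small sketch, with arbitrary numbers of my own choosing (v1 = 2, v2 = 10, a price of 6). The function names are made up, and the model is deliberately naive: the object is counted at whatever its current owner says it is worth, and money is counted at face value.

```python
def total_value(owner_of_object, valuations, money):
    """Total reported value: the object as valued by its owner, plus all money."""
    return valuations[owner_of_object] + sum(money.values())


def sell(owner, buyer, price, money):
    """Return the new owner and the money holdings after a sale at `price`."""
    if money[buyer] < price:
        raise ValueError("buyer cannot afford the price")
    new_money = dict(money)  # copy so the original holdings are untouched
    new_money[buyer] -= price
    new_money[owner] += price
    return buyer, new_money


if __name__ == "__main__":
    valuations = {"P1": 2, "P2": 10}  # v1 and v2, arbitrary numbers
    money = {"P1": 0, "P2": 10}
    before = total_value("P1", valuations, money)
    owner, money_after = sell("P1", "P2", price=6, money=money)
    after = total_value(owner, valuations, money_after)
    print(after - before)  # equals v2 - v1, whatever the price was
```

The price cancels out because the money only moves from one pocket to the other; the only thing that changes the total is whose valuation of the object gets counted.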

Point is, there are a lot of gaps in the way people may estimate value, so the System has an inherent exploit (resulting from our two axioms) that can be used to increase Value. (I believe that’s what the essence of the constant push for consumption may be, but it’s definitely what speculation is ^^).

These gaps are not necessarily bad in themselves. For example, someone who has plenty of water will value it less than someone who has trouble accessing it, so a direct consequence could be the sharing of resources according to needs. But there could be a variety of less legitimate disparities that the System will exploit blindly.

Furthermore, the next logical consequence, since the value estimates are inherently subjective, would be to invest effort in changing the perceived value of things. If you’re buying something, for instance someone’s work, it’s not only in your interest but also in the interest of Maximizing Value of the System to convince that person that their work is worthless, and reciprocally you should always inflate the value of what you’re selling as much as possible.

Therefore, the System will always strive to increase the disconnect between perceived value and actual value. Not only will it try to create and exploit these gaps, but it will also actively encourage misestimation and misinformation (again, just a consequence of the two axioms). And so it happens that the Value System originally meant to increase the quantity of goodness is subverted to increase the quantity of… itself only, really.

We can expect the gaps to necessarily appear and to add up, and increase in volume as the System grows. So are we doomed to be constantly taken advantage of by a System that is by nature exploitative? Is the only escape to not rely on a Value System, if even possible? What would the alternative even be? One could understandably be defeatist.

What can be done?

There are self-regulating mechanics in the system, due to the way Value is estimated (it’s always a guess, remember): you can only inflate prices as much as people are willing to pay. The default way to aggregate subjective value estimates into a computation of Value is the law of supply and demand. In the end, it is this law that controls how the Value is computed. Regardless of my feelings on the subject, a pretty ironic point could be made here that instead of robbing The People of their agency, the System puts The People directly in power by giving them the role of estimating the value of everything.
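A toy illustration of that ceiling, with made-up numbers: each person has a maximum they are willing to pay, and a seller who inflates the price past everyone’s maximum simply sells nothing. This is my own minimal sketch with hypothetical function names, not a real model of supply and demand.

```python
def buyers_at(price, willingness):
    """Count how many people would still buy at `price`."""
    return sum(1 for w in willingness if w >= price)


def revenue_at(price, willingness):
    """Seller's revenue if everyone willing to pay `price` buys one unit."""
    return price * buyers_at(price, willingness)


if __name__ == "__main__":
    # Made-up maximum prices four people would accept for the same thing.
    willingness = [1, 3, 3, 8]
    for price in (1, 3, 8, 9):
        print(price, buyers_at(price, willingness), revenue_at(price, willingness))
```

With these numbers, raising the price from 1 to 3 pays off, but going past 8 leaves the seller with no buyers at all: the buyers’ estimates set the limit.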

The inherent subjectivity of the value estimates from which Value is computed might be the root of the exploit we studied, but it’s also a direct means of action from anyone to the System. Value is but the aggregate of everyone’s estimates. If all the estimates are “lower than they should be” (for a certain definition of should) then the Value will be “lower than it should be” and vice versa.

Looking back at the real world, my suspicion is that mankind has a strong bias for the here and now, so all the estimates are computed based on the current availability of resources. The aggregation now works pretty well spatially (because spatial distances were reduced by technology and globalization) but not very well temporally. The future availability of resources is only taken into account in a very limited way, otherwise things like water, oil or beef would be much more expensive.

Because the computation of Value is done in such a distributed way, each participant does it with only a small event horizon. It’s pretty hard to estimate the Value of something while taking into account all the potential scenarios that could happen. Very few humans have this computational power, if any. And as we saw before, this is a weakness the System is sure to exploit.

Therefore, I think the best thing one can do in such a System is actually education. We need to actively fight the tendency of the System to increase the disconnect between the actual Value of things and the way they are perceived, by correcting the perceived value of things. And that does not mean nagging everyone into not eating meat, for instance, but pushing everyone to really fully accept the fact that some resources must be valued more highly, to act accordingly, and to resist the temptation to fall into the System’s cracks. Tough job…

What does this mean for utilitarianism? For the Greater Good? These are interesting questions outside the scope of this article.