Why? Because you could make a better one. For free. While weird flexing your coding muscles. So win-win-win.
Still don’t believe me? Ok, let’s think about it. What does Pastebin.com actually do?
- it’s essentially a web application where you can paste all sorts of text and they store it and give you a permanent web link to it. People usually share those;
- it’s also an API, allowing people to make other applications capable of using the site, programmatically; to be able to create ‘pasties’ from their own apps and computer programs
See? Not that big of a deal. But it was a _great_ idea when they first came up with it. Just like Dropbox, for example, which is the same thing. For files. Apple had that name for one of their directories first. But Drew Houston made a webapp out of that idea. Also a multi-billion-dollar company, but I digress...
I mean sure, you could go into the nitty gritty details of it all, like what other features does it have. But we’re not here to sell pro subscriptions to Pastebin. No affiliation. Not selling anything but fun in good faith and some coding practice.
We’re here to learn how to write the code for something that does, at a basic level, the same thing. Minus all the bells and whistles that the official site has. They got an army of coders, and there’s only one of you. Or me. So let’s get going…
Table of contents
1. Programming language to write it in
2. Basic concepts and features we’ll implement
3. Setting up our environment
4. The logic and code that goes with it
5. Putting it all together: publishing the webapp and open sourcing its code
1. The programming language of choice.
Let’s go with PHP. For … reasons. It’s just as easy in Python, Ruby, Node, Go, .NET, whatever.
I already have PHP on this box, and MySQL as a database, so I’m not going to go into too much detail on how to set those up.
2. Basic concepts and features we’re implementing
- text pasting into a big field on a webpage. There’s <textarea> for that, no? Then save/store that in a more permanent place, like a file, or a database
- a generic view-item webpage; it gets fed a number of some sort, like a unique identifier for each of these ‘pasties’, in the browser’s address bar; it then retrieves the relevant saved information and displays it
- (bonus) maybe a ‘discovery’ feature. Like listing all the saved pastes. From all users
- (bonus) maybe even the possibility of making any of these pastes expire. Like a 5 minute thing to begin with, or maybe several expiry timers.
Alright, enough features, or the scope creep will prevent us from finishing anything. This is just a Minimum Viable Product. No bells. Nor whistles.
3. Setting up our environment
My box runs a Linux flavor, so all the examples are easiest to replicate on any Linux flavor. Or on any Mac. Can’t… or rather won’t talk Windows-land. But I’m pretty sure it’s just as trivial to set up.
Because this ‘project’ is part of a bigger, 100-project exercise I’m doing, this particular endeavor is #6 for me. As such, the path I’m working in is:
<webroot>/6/
Here’s the directory listing that I’m working with:
. .. .config.inc .config.inc.example README.md cleanup.log cleanup.php index.php paste.php pasteviewer.php
4. The Logic and The Code that makes this work
4.1 .config.inc
I’ve included a ‘hidden’ example file with the .example extension, so let’s copy that to the same name, without that extension:
cp .config.inc.example .config.inc
Next, edit it and fill in the right details in there:
<?php
$mysql_schema="<your database/schema here>";
$mysql_table="<your table under the above schema in the db>";
$mysql_user="<your db user>"; // grant this user select, insert and delete privileges on the table
$mysql_password="<your db password for that user>";
?>
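The table itself isn’t shown anywhere in this post, so here’s a rough sketch of one that would satisfy the queries further down, which touch id, paste, shortlink and date_created. Since the insert in paste.php only supplies paste and shortlink, the other two columns need defaults. Treat the column types and sizes, the index names, the helper name buildCreateTableSql and the table name pastes as assumptions; adjust to taste.

```php
<?php
// Hypothetical one-off setup helper: builds the CREATE TABLE statement for the
// paste table. Column names come from the queries in this post; the types,
// sizes and index names are assumptions.
function buildCreateTableSql(string $table): string
{
    // Table names cannot be bound as query parameters, so allow only a
    // conservative character set before interpolating.
    if (!preg_match('/\A[A-Za-z0-9_]+\z/', $table)) {
        throw new InvalidArgumentException("Unsafe table name: $table");
    }

    return "CREATE TABLE IF NOT EXISTS `$table` (
        id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        paste        TEXT NOT NULL,
        shortlink    CHAR(5) NOT NULL,
        date_created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        KEY idx_shortlink (shortlink),
        KEY idx_date_created (date_created)
    )";
}

// Usage: print the statement, then paste it into the mysql client.
echo buildCreateTableSql('pastes'), PHP_EOL;
```

The two extra indexes are there because the viewer looks pastes up by shortlink and the cleanup job filters on date_created.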
4.2 index.php
Aside from a load of HTML fluff in there, whose purpose is to make this sub-project kinda look and feel like the rest of them, there’s not much going on. There’s a basic HTML form, with a few fields and a destination to post the data to when the form button is clicked.
<form action="paste.php" method=POST>
  <table>
    <tr>
      <td>Paste<br>(max 1024 chars)</td>
      <td><textarea name="pasteContent" id="pasteContent" rows="8" cols="128" placeholder="Paste your stuff here:" maxlength="1024" wrap="hard"></textarea></td>
      <td><table>
        <tr><td>latest pastes</td><td>[age]</td></tr>
Note the paste.php reference in the form’s action attribute, on the first line. We’ll see later what that does.
And then, in this same file, there’s the actual PHP code. (Yes, you can combine HTML and PHP in the same file. So long as it has the .php extension, the interpreter leaves whatever HTML tags are in there alone, so in the end the whole thing becomes HTML.) Here’s the functionality of this ‘index’ page:
- discovery/listing of pastes posted in the last 5 minutes (because all expire in 5 minutes, for the sanity of my machine, so it won’t get filled with nonsense that never gets cleaned)
- the form where you paste stuff
- submit button
And it looks like this:
Let’s look at that ‘discovery’ bit of code. Essentially, it queries the database and pulls up the 5 most recently saved items in there:
<?php
include ".config.inc";

$mysqli = new mysqli('127.0.0.1', "$mysql_user", "$mysql_password", "$mysql_schema");
if ($mysqli->connect_errno) {
    echo "Error: " . $mysqli->connect_error . "\n";
    exit;
}

$result = $mysqli->query("select id, date_created as age, shortlink from $mysql_table order by id desc limit 5;");
if ($result->num_rows > 0) {
    // output data of each row
    while ($row = $result->fetch_assoc()) {
        echo "<tr><td><a href=\"pasteviewer.php?id=".$row["shortlink"]."\">".$row["shortlink"]."</a></td>";
        echo "<td>". $row["age"] . "</td></tr>";
    }
} else {
    echo "0 results";
}
$mysqli->close();
?>
Now, there are some glaring, big-time no-nos in the way this code is written, but it does work. I’m not going to go into a lot of detail on why it’s filled with bad practices, because its purpose is to be simple enough for people to read easily, not to comply with the industry best practices of the day.
So, it’s simple:
- connect to the database server (as it happens, it’s on the same box, but it can be a different one), and if that fails, output the actual error (now, don’t do this in ‘production’; you wanna omit explicit error messages, for security, if not for other reasons as well)
- execute the query that basically says ‘give me the columns id, date_created (renamed to age) and shortlink from the relevant table, sort the results by id in descending order and, by the way, I only want 5 of them’. Because the results are sorted from the newest to the oldest, it’ll return up to 5 of the newest results. We’ll see later that when visiting my working example, most times there will be 0 results. That’s by design. I’m also wiping whatever gets written to the table once it’s older than 5 minutes, so I don’t end up in a disk full situation.
- iterate over the result set in a while loop, and with the help of some spurious HTML tags, organize them in a table of sorts, to kinda line up (again, don’t do this in production; in the year 2018 or later, there are better ways to align content than HTML table tags, but that’s for another time); also make links out of the text, ’cause we’re fancy, and want to give the user a nice experience, in that they’ll be able to click on whatever paste IDs are on this ‘homepage’ and it’ll open the content of said pastie. Cool, huh?
- if there are no results, then output that in plain English, and finally close the database connection, so that _that_ particular resource doesn’t get squandered unnecessarily.
Note that there’s a reference to pasteviewer.php in the line where it ‘echoes’ each row onto the page, as an HTML link. With a parameter, ?id=, then the value of shortlink, whatever that may be as stored in the database.
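If you’d rather not echo database values straight into the page, the row-printing part of that loop could be pulled into a small helper that escapes everything on the way out. This is only a sketch: the function name renderPasteRow and the sample row are made up for illustration, and the row is shaped like what fetch_assoc() returns for the query above.

```php
<?php
// Hypothetical helper: turns one result row into a table row, escaping every
// value that ends up in the HTML.
function renderPasteRow(array $row): string
{
    // rawurlencode() protects the query string, htmlspecialchars() the markup.
    $href  = 'pasteviewer.php?id=' . rawurlencode($row['shortlink']);
    $label = htmlspecialchars($row['shortlink'], ENT_QUOTES, 'UTF-8');
    $age   = htmlspecialchars($row['age'], ENT_QUOTES, 'UTF-8');

    return '<tr><td><a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . $label . '</a></td><td>' . $age . '</td></tr>';
}

// Made-up sample row.
echo renderPasteRow(['shortlink' => 'a1b2c', 'age' => '2018-06-01 12:00:00']), PHP_EOL;
// prints: <tr><td><a href="pasteviewer.php?id=a1b2c">a1b2c</a></td><td>2018-06-01 12:00:00</td></tr>
```

Inside the while loop, that would boil down to `echo renderPasteRow($row);`.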
4.3 paste.php
Remember that <form action=paste.php...> shenanigans in the index.php file? Well, since this file is the destination piece of code where that form sends the data filled in the big text field when the Submit button is clicked, the logic is pretty simple:
- connect to the database
- whatever text/content was in the big field, get that and sanitize it a little bit for both security and... sanity; you know, empty field when clicking submit, or nasty code in there that we need to make ‘uninterpretable’ but without changing how it looks, etc.
- grab the exact time, so we have a timestamp of when a user submitted a pasted text to us
- generate a quasi-unique ‘serial’ number for this entry, so we can retrieve it later. This same almost-unique identifier will become part of the URL where the contents of the pastie are always (within 5 minutes, remember?) retrievable. We call that a permalink. Permanent for 5 minutes, until the wiping mechanism kicks in, and Kansas is going bye-bye
- stick all of this in a preconfigured database, with the right fields, that will accept the input
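The ‘sanity’ half of that sanitizing step can be sketched as a tiny function that either hands back the text or refuses it. The name validatePaste is hypothetical, the 1024 limit just mirrors the maxlength on the textarea, and counting bytes instead of characters is a simplification on my part.

```php
<?php
// Hypothetical helper: returns the paste unchanged, or null if it should be rejected.
function validatePaste($raw, int $maxBytes = 1024): ?string
{
    if (!is_string($raw)) {
        return null;    // e.g. pasteContent[]=x arrives as an array
    }
    if (trim($raw) === '') {
        return null;    // nothing but whitespace
    }
    if (strlen($raw) > $maxBytes) {
        return null;    // the browser-side maxlength is easy to bypass
    }
    return $raw;
}

var_dump(validatePaste('hello'));   // string(5) "hello"
var_dump(validatePaste("  \n "));   // NULL
var_dump(validatePaste('0'));       // string(1) "0"
```

That last case is there on purpose: PHP’s empty() treats the string "0" as empty, so a check built on empty() would turn away a paste consisting of a single zero.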
Anyway, talk is cheap, here’s the simplest, most noob-ish code:
<?php
include ".config.inc"; // provides the $mysql_* settings used below

$mysqli = new mysqli('127.0.0.1', "$mysql_user", "$mysql_password", "$mysql_schema");
if ($mysqli->connect_errno) {
    echo "Error: " . $mysqli->connect_error . "\n";
    exit;
}

$paste = $_REQUEST['pasteContent'] ?? '';
if (empty($paste)) {
    echo "You supplied no text in the paste field.<br>";
    echo "Please go <a href=\"./index.php\">BACK</a> and type or paste something in the field and hit Submit";
    exit;
}

// escape for SQL first, then turn HTML special characters into harmless entities
$paste = htmlspecialchars($mysqli->real_escape_string($paste));
// md5() gives 32 hex characters; take 5 of them from a random offset
$rand = substr(md5(microtime()), rand(0, 26), 5);

if ($mysqli->query("INSERT into $mysql_table (paste, shortlink) VALUES ('$paste','$rand')")) {
    printf("%d Row inserted.<br>", $mysqli->affected_rows);
}
echo "<hr><strong>". $paste ."</strong><hr>";
echo "Permalink is <a href=\"https://100.evervee.me/6/pasteviewer.php?id=$rand\">https://100.evervee.me/6/pasteviewer.php?id=$rand</a>";
echo "<br><a href=\"./index.php\">BACK</a>";
$mysqli->close();
?>
See, I’m proudly adhering to the KISS rule… You know, Keep It Simple, Stupid!
Not to mention super insecure and all, but whatever, if it helps even one person, I’m happy 🙂
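One cheap upgrade, if you feel like it: the shortlink doesn’t have to come from hashing the clock. Below is a sketch of an alternative that asks the operating system for random bytes instead. The function name generateShortlink is made up, and random_bytes() needs PHP 7 or newer.

```php
<?php
// Hypothetical replacement for the md5(microtime()) one-liner.
// random_bytes() draws from the operating system's secure random source.
function generateShortlink(int $length = 5): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1');
    }
    // Each byte becomes two hex characters, so round up and trim.
    $bytes = random_bytes(intdiv($length + 1, 2));
    return substr(bin2hex($bytes), 0, $length);
}

echo generateShortlink(), PHP_EOL;   // five hex characters, different on each run
```

Five hex characters give 16^5 = 1,048,576 possible values, so collisions are unlikely with a table that empties itself every few minutes, but still possible. A unique index on shortlink plus a retry would cover that.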
4.4 pasteviewer.php
The purpose of this is two-fold:
a) it’s a standalone page that, when called with a parameter in the browser URL, like ?id=<whatever>, will pull up any relevant data for that ID, if it exists, of course
b) it’s a page that is expected to be called/linked-to from other pages, rather than via URL manipulation, and it works exactly the same way as in a) just above.
Simple logic in here as well, this ain’t NASA:
- make sure to note that pesky parameter, which is the ID of a pastie
- connect to the database if you can and ask it to pull up the relevant text associated with that ID and display it with no formatting whatsoever in a blank webpage; just the content. Nothing more, nothing less. If there are no results, then display that, in plain English
Don’t trust me? Alright, here’s the code for this:
<?php
include ".config.inc";

$mysqli = new mysqli('127.0.0.1', "$mysql_user", "$mysql_password", "$mysql_schema");
if ($mysqli->connect_errno) {
    echo "Error: " . $mysqli->connect_error . "\n";
    exit;
}

// escape the id before it goes into the query, so a crafted ?id= can't rewrite the SQL
$shortlink = $mysqli->real_escape_string($_REQUEST['id'] ?? '');
$result = $mysqli->query("select * from $mysql_table where shortlink='$shortlink'");
if ($result->num_rows > 0) {
    // output data of each row
    while ($row = $result->fetch_assoc()) {
        echo $row["paste"];
    }
} else {
    echo "0 results";
}
$mysqli->close();
?>
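For anyone who wants to tighten this up a notch, here’s a sketch of the same lookup done with a format check and a prepared statement, which keeps the id out of the SQL text altogether. The function names isValidShortlink and findPaste are hypothetical, and the 5-hex-character rule assumes the shortlinks are generated the way paste.php does it.

```php
<?php
// Hypothetical check: shortlinks made by paste.php are 5 lowercase hex characters.
function isValidShortlink($id): bool
{
    return is_string($id) && preg_match('/\A[0-9a-f]{5}\z/', $id) === 1;
}

// Hypothetical lookup using a prepared statement; returns null when nothing matches.
// $table comes from .config.inc, never from the user, since identifiers cannot be bound.
function findPaste(mysqli $db, string $table, string $shortlink): ?string
{
    $stmt = $db->prepare("SELECT paste FROM `$table` WHERE shortlink = ? LIMIT 1");
    if ($stmt === false) {
        return null;
    }
    $stmt->bind_param('s', $shortlink);
    $stmt->execute();
    $stmt->bind_result($paste);
    $found = $stmt->fetch();   // true when a row was fetched, null when there is none
    $stmt->close();
    return $found === true ? $paste : null;
}

var_dump(isValidShortlink('a1b2c'));         // bool(true)
var_dump(isValidShortlink("' OR '1'='1"));   // bool(false)
```

In pasteviewer.php you’d call isValidShortlink() on the incoming id first, and only bother the database via findPaste() if it passes.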
(bonus) 4.5 how the cleanup cronjob works
The cronjob to clean up the db looks something like this:
* * * * * mysql -e "use <schema>; delete from <table> where date_created < (now() - interval 5 minute);"
Every minute it looks for everything that got saved more than 5 minutes ago, and nukes it outta orbit. No backups, no nothing. #YoloProgramming
But like I said, I do that out of necessity, since it’s just a learning endeavor. And not an actual website that keeps data forever.
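To make the cutoff logic concrete, here’s the same condition as the SQL, date_created < (now() - interval 5 minute), mirrored in PHP. It’s a sketch only: the name isExpired is made up, the timestamps are invented, and the ttl parameter is there just to hint at how several expiry timers could work.

```php
<?php
// Hypothetical PHP mirror of the SQL condition in the cronjob.
// A paste is expired once it is strictly older than the time-to-live.
function isExpired(DateTimeImmutable $created, DateTimeImmutable $now, int $ttlSeconds = 300): bool
{
    $cutoff = $now->modify("-{$ttlSeconds} seconds");
    return $created < $cutoff;
}

$now = new DateTimeImmutable('2018-06-01 12:10:00');   // made-up clock reading
var_dump(isExpired(new DateTimeImmutable('2018-06-01 12:04:59'), $now));   // bool(true)
var_dump(isExpired(new DateTimeImmutable('2018-06-01 12:07:00'), $now));   // bool(false)
```

Since the job runs once a minute, a paste can hang around for up to a minute past its 5-minute mark before it actually disappears.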
5. Putting it all together, and seeing it in action
The thing ‘lives’ on this same very website, but in a different, standalone, sandboxed section of it, like a subdomain. Called 100. Like 300, but for ??, not Sparta 🙂
So whether you click that link that gets created after you submit your first pastie, or compose the URL by hand (pasteviewer.php?id= followed by the shortlink), the result is the same: it displays the content that got submitted.
This is to show that escaping those HTML tags (and the quotes headed for the SQL) helps a lot in NOT getting owned via either SQL injection or cross-site scripting. That said, I’m sure it’s still vulnerable to loads of other stuff; we shall see if folks got time to nuke it, since it sort of resets every 5 minutes…
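For the curious, here’s roughly what htmlspecialchars() does to a hostile paste. The sample string is made up, and I’m passing ENT_QUOTES and the charset explicitly, which paste.php doesn’t bother to do.

```php
<?php
// A made-up hostile paste, run through the same function paste.php uses.
$evil = '<script>alert("owned")</script>';
echo htmlspecialchars($evil, ENT_QUOTES, 'UTF-8'), PHP_EOL;
// prints: &lt;script&gt;alert(&quot;owned&quot;)&lt;/script&gt;
```

The browser shows that as the literal text of the script tag instead of running it.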
The code is on GitHub somewhere, but it mostly lives privately on Bitbucket, which has the most up-to-date version, with changed credentials.