Learning Programming

April 13, 2018 by Richard DW Redcroft

Learning to program

If you've read my blog you might find it odd that I don't know how to program. Well, it's always something I've wanted to be able to do, but I just never got into it. There's usually a bit of software that already exists to do what I'd like it to do, and if I ever got spare time to mess around I spent that time inside Maya rather than trying to code. But recently, since switching to Houdini, I've really been enjoying VEX and the possibilities it brings to my visual FX workflow - ANYTHING is possible now. VEX is an expression language loosely based upon C, with concepts taken from C++ and the RenderMan shading language. While I am still learning how to get the most out of VEX, I've not found it a particularly daunting task, and it didn't take me long to start putting my own things together. Now, given the choice of doing something visually (VOPs) or via VEX, I choose VEX every time.

Doing VEX has led me to try my hand at programming in Python, and the reason for that is twofold. Firstly, it's used widely in the VFX industry (it's built into Maya, Houdini, Nuke etc). Secondly, it's also quite ubiquitous on Linux, with bindings for both Qt and GTK. While not as low level as C, it should allow me to make any type of application/program I want. Once I've got to grips with Python I plan to learn a lower level language such as C. Then I won't be beholden to anyone!!! In all seriousness, being able to program in C should allow me to write shaders using RSL/OSL, both of which are C-like. But let's crawl before we run.

Basic python

I started out making simple Python applications, starting with everyone's first program, Hello World! (The following is a loose Python progression; if you wish to learn Python don't listen to me, I have no idea what I'm doing.)

import sys

print("Hello World!\n")

I know, amazing! Such an incredibly useful application. My life is now complete.

OK, let's kick things up a notch!

import sys


def hello_world(count):
    str_hello = "Hello World!"
    for x in range(0, count):
        print(str_hello)
    print("\n")


if __name__ == "__main__":
    count = 5
    i = 0

    while(i < count):
        hello_world(i)
        i += 1

See, now we're cooking on gas! OK, in all seriousness, a program that prints this:

~/python $ python3 hello.py 


Hello World!


Hello World!
Hello World!


Hello World!
Hello World!
Hello World!


Hello World!
Hello World!
Hello World!
Hello World!

isn't all that useful, but we've covered the basics of creating a function that takes an input, does something with it, and can be reused multiple times. Now let's kick things up even further and use the Requests module!

import sys
import requests
from bs4 import BeautifulSoup


def search(search_term):
    url = "https://www.google.co.uk/search"
    params = {"q": search_term, "num": 3}

    r = requests.get(url, params=params)

    results = []
    if (r.status_code == 200):
        soup = BeautifulSoup(r.text, "html.parser")
        for items in soup.find_all('h3', {"class": "r"}):
            url = items.find('a')['href'].split("q=")[1].split("&sa")[0]
            title = items.find('a').text.strip()
            output = ("%s : %s" % (title, url))
            results.append(output)

    return(results)


if __name__ == "__main__":
    void = search("voidlinux")

    for item in void:
        print(item)

This results in the following:

~/python $ python3 search.py 
Void Linux : https://www.voidlinux.eu/
Void Linux - Wikipedia : https://en.wikipedia.org/wiki/Void_Linux
Why Void Linux? - Troubleshooters.Com : http://www.troubleshooters.com/linux/void/whyvoid.htm

That's a big jump from the previous examples, but it's actually quite simple. First we have to import two modules, Requests and Beautiful Soup. Requests allows us to send HTTP requests and receive the responses (aptly named), while Beautiful Soup is an HTML and XML parser.

Now let's look at the search function:

def search(search_term):
    url = "https://www.google.co.uk/search"
    params = {"q": search_term, "num": 3}

The search function takes an input variable called 'search_term', and we set up two initial variables called url and params. The url is the base URL Google uses for search; the parameters appended to it determine what is actually searched for and which results come back. We set up a dictionary called params, which has just two parameters at the moment, q and num. q is Google's shorthand for query, and num is the number of results to return.
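To see how the params dictionary ends up in the URL, here is a minimal sketch using the standard library's urllib.parse.urlencode. It is a stand-in for illustration only: Requests does its own encoding internally, and build_search_url is a hypothetical helper name, not part of the script above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    # Hypothetical helper: join a base URL and a params dictionary
    # into a single URL with a query string.
    return base_url + "?" + urlencode(params)


full_url = build_search_url(
    "https://www.google.co.uk/search",
    {"q": "voidlinux", "num": 3},
)
print(full_url)
# prints: https://www.google.co.uk/search?q=voidlinux&num=3
```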

    r = requests.get(url, params=params)

This is the magic: this single line sends an HTTP GET request to the url, passing the params dictionary we set up as its query parameters. The response is stored in a variable called r. r has many attributes, such as the encoding, the content, the status code etc. We are going to determine whether we got a successful reply by looking at r.status_code. This is the HTTP response code we got from Google. For a list of codes and their meanings, check Wikipedia. A successful response should be code 200, so let's filter out anything that isn't that using an if statement.
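As a rough illustration of what those codes mean, the standard library's http.HTTPStatus enum can turn a number into its standard phrase. This sketch is an aside rather than part of the script, and describe_status is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code):
    # Hypothetical helper: look up the standard phrase for an HTTP status code.
    return "%s %s" % (code, HTTPStatus(code).phrase)


for code in (200, 404, 500):
    print(describe_status(code))
# prints:
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```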

    results = []
    if (r.status_code == 200):
        soup = BeautifulSoup(r.text, "html.parser")
        for items in soup.find_all('h3', {"class": "r"}):
            url = items.find('a')['href'].split("q=")[1].split("&sa")[0]
            title = items.find('a').text.strip()
            output = ("%s : %s" % (title, url))
            results.append(output)

Here we initialize a list (although we could have created a dictionary, as we are returning related pieces) in which we will store our results. We have to create it outside of our loop or we would not be able to add to it with each iteration; it would be overwritten. Now that we've initialized our list, we use Beautiful Soup to parse the response text using html.parser and store the result in a new variable. Then we search for all instances of <h3 class="r"> that the search returned. For each of those instances we get the url and the title. Both are stored in the <a> element, with the link being the href attribute and the title being the text. The default result looks like:

/url?q=https://www.voidlinux.eu/&sa=U&ved=0ahUKEwi95MDw4LfaAhUZM8AKHRKHDuQQFggWMAA&usg=AOvVaw2Ak3Krcmi6wgBiG9h665lg

I think Google adds this additional crap around our links so that when a user clicks a result Google can log that the user has followed it. We just want to return the actual URL, so we cut off the beginning piece and the end piece. We do this with the .split() method, which splits our string at every instance of its argument and returns the pieces as a list.

print(items.find('a')['href'].split("q="))

['/url?', 'https://www.voidlinux.eu/&sa=U&ved=0ahUKEwi6h8_V6LfaAhXhKcAKHbsiC6AQFggWMAA&usg=AOvVaw3o0cFbLV_LNxkq77L77RFU']

As you can see we have two pieces, everything before 'q=' and everything after. We want the second piece, so we tell it to return that with [1] (indexes start at 0). Then we do the same thing again, this time splitting at '&sa' and returning the first piece with [0].

Then to get the title:
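The two splits can be tried on their own with the example link from above. The second half of this sketch is an alternative shown as an assumption, not what the script uses: the standard library's urllib.parse reads the q parameter directly, so it doesn't rely on '&sa' always following the link.

```python
from urllib.parse import parse_qs, urlsplit

href = ("/url?q=https://www.voidlinux.eu/&sa=U"
        "&ved=0ahUKEwi95MDw4LfaAhUZM8AKHRKHDuQQFggWMAA"
        "&usg=AOvVaw2Ak3Krcmi6wgBiG9h665lg")

# The approach from the script: keep what follows "q=", then drop
# everything from "&sa" onwards.
by_split = href.split("q=")[1].split("&sa")[0]

# Alternative sketch: parse the query string and read the q parameter.
by_parse = parse_qs(urlsplit(href).query)["q"][0]

print(by_split)
print(by_parse)
# both print: https://www.voidlinux.eu/
```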

print(items.find('a'))
<a href="/url?q=https://www.voidlinux.eu/&amp;sa=U&amp;ved=0ahUKEwid9KOc6bfaAhXJAMAKHZ5nDx8QFggWMAA&amp;usg=AOvVaw0XyFtGhO-eLx8eaWeBnhhF"><b>Void Linux</b></a>
print(items.find('a').text)
Void Linux

We add .strip() without any argument to remove any leading and trailing whitespace, e.g.

a = '    A    '
print(a)
    A    
print(a.strip())
A

Then we format our title and url, append the result to our results list and return it:

    output = ("%s : %s" % (title, url))
    results.append(output)

return(results)
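The % formatting used here works fine; on Python 3.6 and later an f-string gives the same result. A small sketch, using sample values taken from the output earlier in the post:

```python
title = "Void Linux"
url = "https://www.voidlinux.eu/"

# %-style formatting, as used in the script.
old_style = "%s : %s" % (title, url)

# Equivalent f-string (Python 3.6+).
new_style = f"{title} : {url}"

print(old_style)
print(old_style == new_style)
# prints:
# Void Linux : https://www.voidlinux.eu/
# True
```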

We then call this function inside our main block and store the results in a new variable called void, and then we create a for loop. This loop goes over every item in our void list and prints it.

if __name__ == "__main__":
    void = search("voidlinux")

    for item in void:
        print(item)

Copyright © 2017 - Richard Redcroft