How to Get Kids’ Hands Dirty With Data? (Part 2)

My last post was about places students can get publicly available data, and I promised to talk about the tools we can use to access it. This post is about the tools that are out there now.

Probably the gold standard right now for getting at real data and doing stuff with it is the Wolfram Language (WL). There is a lot to recommend it, but the biggest thing is the enormous amount of data WL puts at your fingertips. Another is the wide range of data types the language can access and work with, including things like colors, countries, shapes, images and all kinds of other things I probably haven’t tried yet. WL is entirely comfortable with an array that looks like this:

[Image: Wolfram list]

The Wolfram Language also allows a fantastic range of visual outputs, from 2D and 3D graphs and colored maps to graphs of functions with sliders you can manipulate, interfaces with buttons and probably a bunch of other things I haven’t tried yet. Furthermore, it’s extremely easy to use, so almost anyone can look through the documentation and start typing things in to see what happens. And it connects to a lot of other APIs, like Facebook’s.

But some of these strengths are also weaknesses when your goal is to teach coding. Wolfram’s radical leniency with data types will teach your students terrible coding habits that will lead to real problems in other languages. The wide range of different kinds of output the language is capable of makes it really hard to predict what kind of output you’re going to get from a program. And though I haven’t tried it, Wolfram doesn’t seem to be set up for long complex functions. There’s not even a local development environment where a person can edit and save their programs. As a result, I think that WL is a terrific tool for teaching pretty much any subject except for programming. The one exception is if you’re preparing students to learn a functional language like Haskell, since WL also leans heavily on a functional style, calling functions in a similar way and rarely relying on things that most people expect to see in a programming language, like loops.

The two most popular tools for data manipulation in the scientific world seem to me to be Python and R. Python is a great learning language because it’s really easy to do a lot of things and there are countless modules that can do just about anything. But in spite of its ease relative to other programming languages, it still takes a while before a student can do much. You have to spend about a semester’s worth of writing console programs before you will be ready to do any kind of graphics, and that’s enough time to lose a student who isn’t naturally attracted to coding. Still, a well-designed class could have students doing interesting things with data pretty soon.

I don’t know much about R. I know that professional statisticians really like it and professional programmers mostly don’t. I don’t think it has a shallow learning curve. I imagine if you wanted to take the time to teach it you could have students doing some interesting things but from what I know it’s a better tool for teaching statistics than for teaching programming.

So what’s the right tool? I think the ideal tool doesn’t exist…yet. But I have a vision in my head of what it would look like. It would be as accessible as something like Greenfoot or NetBeans, with a low scaffold for beginners. It would allow you to manipulate data as easily as a Python module, though maybe not as easily as WL. It would have tools to create graphics of different sorts. And it would have access to a wide range of open source data, with menus to choose it, though with more precision about types than WL. Ideally you’d be able to import data into SQL-type tables (or maybe JSON or XML objects), and manipulate them with either the SQL language or a functional tool like C#’s LINQ.
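
As a rough sketch of that last idea in Python, the snippet below loads a few JSON records into an in-memory SQLite table, then answers the same question twice: once with SQL and once with a comprehension, which is roughly the style LINQ encourages. The sample records, the `readings` table, and the function names are all made up for illustration.

```python
import json
import sqlite3

# Hypothetical sample data; a real project would load a downloaded file instead.
RAW_JSON = """
[
  {"city": "Boston",  "year": 2015, "avg_temp_f": 51.8},
  {"city": "Chicago", "year": 2015, "avg_temp_f": 50.4},
  {"city": "Miami",   "year": 2015, "avg_temp_f": 78.9}
]
"""

def load_records(raw):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(raw)

def build_table(records):
    """Copy the records into an in-memory SQLite table named 'readings'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (city TEXT, year INTEGER, avg_temp_f REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (:city, :year, :avg_temp_f)", records
    )
    return conn

def warm_cities_sql(conn, threshold):
    """SQL version: cities whose average temperature is above the threshold."""
    rows = conn.execute(
        "SELECT city FROM readings WHERE avg_temp_f > ? ORDER BY city",
        (threshold,),
    )
    return [row[0] for row in rows]

def warm_cities_query(records, threshold):
    """Comprehension version of the same query, closer in spirit to LINQ."""
    return sorted(r["city"] for r in records if r["avg_temp_f"] > threshold)

records = load_records(RAW_JSON)
conn = build_table(records)
print(warm_cities_sql(conn, 51.0))
print(warm_cities_query(records, 51.0))
```

Both calls return the same list of cities, which is the point: a student could start with whichever style feels more natural and learn the other later.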

The tool I’m imagining is the holy grail of this blog right now. If it existed, I’d be using it now. If it continues to not exist…I might have to try to make it.

How to Get Kids’ Hands Dirty With Data? (Part 1)

In my last post, the question was what kind of programming projects students should be working on, if not video games. An answer I provided, without much detail, is helping kids work with real data. This post is some thoughts about how to make that possible.

Real data is information that’s been collected in a scientific process that reflects things that are actually going on. Ideally I’m thinking of information that is already in some kind of tabular form (whether an Excel spreadsheet, a database table of some sort, or even a comma- or tab-separated value text file). But the data might instead be in a more programmatic form, such as JSON or XML files.

The good news is there are tons of open-source data out there, and a lot of it is extremely useful and relevant. One of the best places to start is data.gov, a government portal to tons of government-generated data sets. This includes everything from atmospheric data to health data on US citizens, locations of American military bases, crime statistics, and countless other useful, relevant things for kids to work with to address real-world problems.

Data.gov has a developers section with an API, but this is less useful than it sounds because the API just contains metadata and links to the data, and can’t actually directly access the data. Still it’s a good place to start.

Individual agencies like NASA, the FDA, NOAA and countless others each have their own independent APIs as well, that might allow a person to do more with the data.

In the short term, what a teacher would do is decide what kind of data students should work with, download it, and get it into a format the kids can use. Ideally this should be a database of some sort. Working with databases sounds big and scary. But the basic statements of SQL are not that hard, and every language has tons of packages to allow the language to work with an SQL database. This may be harder than I think, because I haven’t actually attempted to teach it yet. But I believe it can be done.
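
To give a sense of how little SQL is needed to get started, here is a minimal sketch using Python’s built-in `csv` and `sqlite3` modules. The rows, the `incidents` table, and the helper names are invented for the example; a class would substitute whatever data set the teacher downloaded.

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded CSV file (made-up numbers, for illustration only).
CSV_TEXT = """state,year,incidents
Ohio,2014,120
Ohio,2015,135
Texas,2014,300
Texas,2015,280
"""

def load_csv_into_db(csv_text):
    """Create an in-memory database and fill an 'incidents' table from CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE incidents (state TEXT, year INTEGER, incidents INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        conn.execute(
            "INSERT INTO incidents VALUES (?, ?, ?)",
            (row["state"], int(row["year"]), int(row["incidents"])),
        )
    conn.commit()
    return conn

def rows_for_state(conn, state):
    """SELECT with a WHERE clause: all rows for one state, oldest first."""
    cursor = conn.execute(
        "SELECT year, incidents FROM incidents WHERE state = ? ORDER BY year",
        (state,),
    )
    return cursor.fetchall()

def totals_by_state(conn):
    """SELECT with GROUP BY: total incidents per state."""
    cursor = conn.execute(
        "SELECT state, SUM(incidents) FROM incidents GROUP BY state ORDER BY state"
    )
    return cursor.fetchall()

db = load_csv_into_db(CSV_TEXT)
print(rows_for_state(db, "Ohio"))
print(totals_by_state(db))
```

Three statements (CREATE TABLE, INSERT and SELECT) cover the whole example, which is about the amount of SQL I’d expect a beginner to need at first.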

Why do we want to teach kids to work with databases? Because this is the skill most organizations, be they corporations, the government or nonprofits, are going to need their programmers to have. Of the students that end up doing some kind of coding, whether as full-time programmers or para-coders doing some code on the side of another job, very few will be making video games, but most will be working with some kind of data.

So what could the students do? Here are some of the kinds of projects they could attempt, with my estimation of their level of difficulty:

  • A search program for database keywords (easier)
  • Joining different tables to make another table (easy-medium)
  • Programmatically create graphs such as bar graphs (medium)
  • “Dynamic” graphs that change over another dimension like time (medium-hard)
  • Heatmaps or colored maps of states, districts or countries (harder)
  • Interactive data-tools that respond to user input (hard)
  • Animations showing motion of objects like planes or ships (really hard)
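
The two easiest projects on this list might look something like the following sketch, which uses Python’s built-in `sqlite3` module. The `stations` and `readings` tables and everything in them are invented for the example.

```python
import sqlite3

def make_sample_db():
    """Build a tiny in-memory database with two related tables (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stations (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE readings (station_id INTEGER, month TEXT, rainfall_in REAL);
        INSERT INTO stations VALUES (1, 'Logan Airport', 'MA');
        INSERT INTO stations VALUES (2, 'Midway Airport', 'IL');
        INSERT INTO stations VALUES (3, 'Blue Hill Observatory', 'MA');
        INSERT INTO readings VALUES (1, 'Jan', 3.4);
        INSERT INTO readings VALUES (2, 'Jan', 2.1);
        INSERT INTO readings VALUES (3, 'Jan', 4.0);
    """)
    return conn

def search_stations(conn, keyword):
    """Keyword search: station names containing the keyword.

    SQLite's LIKE ignores case for ASCII letters by default.
    """
    pattern = "%" + keyword + "%"
    rows = conn.execute(
        "SELECT name FROM stations WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]

def rainfall_by_station(conn, state):
    """Join the two tables to pair each station name with its rainfall reading."""
    rows = conn.execute(
        """
        SELECT s.name, r.month, r.rainfall_in
        FROM stations AS s
        JOIN readings AS r ON r.station_id = s.id
        WHERE s.state = ?
        ORDER BY s.name
        """,
        (state,),
    )
    return rows.fetchall()

conn = make_sample_db()
print(search_stations(conn, "airport"))
print(rainfall_by_station(conn, "MA"))
```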

Ideally you’d like to be doing something that Excel can’t do easily. This might mean generating graphs based on different queries a user could enter.

Obviously this is going to take a lot of guidance at first. The emphasis here should be on the students solving a real-life problem. As much as possible students should make decisions about what kind of data they’d like to work with and what kind of problems they’d like to solve.

I’m going to be working with a small group of students to experiment with exploring data this year and I’ll post about how it’s working.

But I’m thinking bigger. I’m thinking of making a tool or IDE that helps students to do more of this independently. But that deserves a post of its own, which will be part 2 of this topic.

Coding Instruction: Beyond Video Games

I love teaching kids to make video games. I’ve done this for quite a while now, using a lot of different languages, including Python, Scratch, the JavaScript/Processing port at Khan Academy, the Greenfoot Java IDE, full Java with Eclipse and the once-promising XNA platform in C#, killed in its youth by short-sighted Microsoft executives (though resurrected as open source code as MonoGame). There is a lot to recommend teaching coding this way. Kids love to play video games, so it makes sense they’d love making them. And starting by creating a sprite and moving it across a screen is way more interesting than another “Hello World” program. But the more I do it, the more aware I become of the limitations.

One important limitation is that making a video game, much like making a website, is not entirely or even mostly a coding task. A big part of making a video game is actually graphic design. Much as they say there are really only a few basic plots in fiction, there are really only a limited number of different types of “actions” in a video game, especially a simple 2D one of the sort a student would hope to make. Once you get that working, it’s mostly about how your sprites and backgrounds look. Designing sprites and backgrounds is an interesting challenge in itself, but it’s a different skill from coding and a distraction from a coding class. Often I have to make hard decisions about how much time I want to let kids spend looking for or designing game sprites, because on the one hand the small number of sprites built into most gaming platforms is limited and quickly gets boring, but on the other hand I am teaching coding, not graphic design.

Furthermore, though these different platforms mostly make beginning a game very easy (with the exception of Java, which has a lot of hills to climb to even make a game window), often students quickly hit a ceiling when they start wanting to do things that the pre-designed built-in “move” and “turn” methods don’t let them do.

When they hit this limit, one of several things happens: some kids get bored and want to quit, some kids doodle around and do the same thing over and over, and a smaller number of kids push against the limits of the platform. This can lead to some incredibly ugly code on a limited platform – I’ve seen people build a scrolling platformer with Scratch, but it’s like painting your house with a nail polish brush, and the code is about as attractive. Other platforms have less of a ceiling; Greenfoot contains within it the full capabilities of Java, so theoretically you can do anything, but it’s a huge stretch to get from making a single-screen game with limited sprites and objectives to the sort of games kids imagine, with things like scrolling, jumping with gravity-like motion or different kinds of 3D.

So what’s the alternative? There’s always the “traditional” coding class, where students use Python or some other interpreted language to create text-based programs that say things like “Good job Jim you guessed the correct answer in 2 tries!” But as I argued in an earlier post, that’s exactly the kind of coding class that teaches everyone except natural programmers that coding is boring and not for people like them.

Physical computing, such as working with Arduinos, is definitely an area of potential here. I will be working with students making Arduino projects this year, and I’ll report on how it goes. As with games, physical computing involves working with things that are interesting learning tools in and of themselves but are not actually programming. More significantly, most Arduino programs, at least of the sort students do, are very simple, and may not touch on many advanced concepts. The difficult part of physical programming is usually the physical part. And anyway, you aren’t always in an electronics lab; sometimes you just have the computers to program with. So what else can you do?

I’ve been thinking about this a lot, and it came to me the other day: what students really need to do is work with data. Real data, not the Northwind database or fake lists of names from the phone book. Data about the real world, like climate records, health surveys, demographic information, data that allows them to address real problems in the world. But where do they get the data? And how do we get it in a format that they can work with?

I have thoughts about this, but it’s too much to add to this post. I’ll be addressing my ideas about how to get real data in the hands of students in future posts.

Reading, Writing, ‘Rithmatic and Return 0

A high school teacher who made the school website. A broker who makes complex linked Excel sheets that update with Visual Basic macros. A graphic designer who programs simple interactive animations using HTML5. An assistant at a marketing firm who writes sophisticated database queries that find exactly the demographic slice the marketers are looking for. A lawyer who writes regular expressions that search and alter terms in large contracts. A scientist who programs an Arduino to run a piece of lab equipment.

Are these people programmers?

From the point of view of a “real” programmer, the answer is clearly no. At best they might generously allow that such a person is a “scripter” or some other diminutive. If a trained programmer with a computer science degree saw their code, the programmer’s reaction would probably be like this xkcd cartoon. They might say it’s okay for amateur code, but not programming.

And yet calling these part-time coders “amateurs” is literally wrong by definition. These people are actually getting paid for their code. And what they make is often as essential to their organizations as the work that “real” programmers are doing. What they lack in depth of computer science knowledge they make up for in knowing exactly what’s needed and in being right there, able to do it for whatever they were already getting paid.

What do these para-coders have in common? They were mostly hired to do something else, but used their coding skills to make themselves more valuable to the organization they work for. They are adding to their resume and making themselves more desirable as employees. Also they are doing something extremely valuable that “real” programmers wouldn’t take the time to do. 

Right now such employees are still exceptions. But this kind of situation is becoming more and more common. The things people are required to do on computers are becoming more complex. More websites, cloud data sources and even commercial products have APIs to interact with and control them. People who can manipulate these things are going to become more and more in demand. And companies are going to want people educated in these skills.

That’s why I predict that within 20 or 30 years, programming will move from being an elective to being a core subject that everyone will have to learn at least a little. I’m not saying everyone will need to code in their daily lives; most still won’t. Most people today can go through their lives without ever needing to factor a polynomial, but we still expect every high school student to learn it, at least long enough to pass the state test.

And this example from math is illustrative in another way. Factoring polynomials is essential if you’re going on to learn calculus, which most people don’t. Otherwise, it’s not something that comes up much. Many people argue that we’d be better off teaching students less algebra and more statistics, for example. I won’t weigh in on that, but since the curriculum for coding isn’t set in stone the way it is for math, now is a good time to think of what students should be learning in a coding class.

Certainly the sorts of things you see in most programming languages: things like if and else statements, for and while loops, variables and methods or functions. If everyone had a basic understanding of these concepts, that would be useful enough. But say we wanted to go a little further, what else would we teach them?

Some people would want everyone to have a basic understanding of HTML. Others might want students to learn more about how computer memory works. Others might want students to learn to work with different kinds of APIs.

One thing I’m confident of is we have to start figuring out how to prepare students for this future. That means we need to start figuring out what a programming class should be for students who aren’t natural coders. What that looks like is going to be a central topic for this blog, and I’ll be sharing some ideas I have soon.

Goodbye, “Hello World”

If you have taken any programming class in your life, you’ve probably written a “Hello World” program. When I first learned BASIC, it looked like:

10 PRINT "HELLO WORLD!"

In modern programming languages like Python, lessons often start out pretty much the same. In the code.org “Eliza” lesson, the first program is a “Hello World” program. Most intro programming books still start out with a “Hello World” program.

The point of a “Hello World” program is to show the student that they can give the computer a command and make it do something. Then you might write a simple program that asks for your name and says “Hello ” + name + “, how are you today?” Then you might write a “for” loop that counts to 10.
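
For readers who haven’t seen them, those first three programs look roughly like this in Python. The `greet` and `count_to` helpers are just wrappers I’ve added so the example runs without waiting for keyboard input; a classroom version would typically call `input()` and print inside the loop.

```python
# 1. The classic first program.
print("Hello World!")

# 2. A greeting built from a name.
def greet(name):
    """Return the greeting a beginner program would print after asking for a name."""
    return "Hello " + name + ", how are you today?"

print(greet("Ada"))

# 3. A "for" loop that counts to 10.
def count_to(limit):
    """Return the numbers 1 through limit, in the order the loop visits them."""
    numbers = []
    for n in range(1, limit + 1):
        numbers.append(n)
    return numbers

print(count_to(10))
```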

And it could be exciting at first. Sure, you could just tell the computer to say your name now, but pretty soon you’d be writing Ultima IV! It’s the same kind of optimism that makes a kid slogging through Heian Shodan in Karate for the first time imagine that if they just keep it up they’ll be like Jackie Chan.

But then again, when we were kids it was a lot easier to believe. When I was a kid we played computer games on the Apple II+, almost all of them illegally pirated off pre-Internet bulletin boards we dialed into with modems like Matthew Broderick used in WarGames. Though we couldn’t write games like that ourselves, they used simple blocky graphics not too different from the blocks and circles we could draw after a few weeks of learning BASIC.

Some of the games were even written in BASIC, with code that we could see and modify. Most of them were compiled into machine language, of course, but even those could be tinkered with by getting a hex editor program and screwing around with the code. We were mostly trying to break the copy protection, which was usually found in the place where there were lots of FF’s. Of course back then we didn’t know that FF represented the maximum value of a byte, or what language these compiled programs were originally written in. But there seemed an accessible bridge between the simple code we were writing and the code we were using day to day.
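
For anyone curious about that detail now: a byte is eight bits, so the largest value it can hold is 255, which is written FF in hexadecimal. A quick Python check (the helper name is just for illustration):

```python
def max_byte_value():
    """Largest unsigned value that fits in 8 bits."""
    return 2 ** 8 - 1

print(max_byte_value())       # 255
print(hex(max_byte_value()))  # 0xff
print(int("FF", 16))          # 255
```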

But imagine a student today trying to see the bridge that goes from “Hello World” to games like Halo or Skyrim. When I was young, many video games were written by a single person, or at most a small team of three or four people. Kids today may not realize that a few people with a laptop couldn’t program Call of Duty any more than a few people with a video camera could make Guardians of the Galaxy; both call for multi-million dollar budgets, enormous teams and expensive equipment.

But that’s not the biggest difference. Because today, even programmers working alone rarely write a whole program from scratch, any more than a mechanic builds a car from scratch. Nearly every programmer out there is hacking away at enormous existing legacy programs, most of which they don’t understand, usually by cannibalizing interfaces and classes from other working programs that themselves were cobbled together out of older, working code. There is just too much useful code out there for programmers to waste their time rewriting Wheel.class when there are countless different versions of it that can be downloaded and plugged into your program.

But starting from “Hello World,” students don’t realize this. They look at these sprawling pieces of code, like the programs that make Netflix or YouTube work, and despair of ever writing anything like that. No wonder they want to quit. A Little League baseball player can look at what Derek Jeter does and see that he is doing pretty much the same thing they are, just way, way better. A beginning programming student looking at modern code is more like a kid making a Lego car looking at a Formula 1 machine.

Sure, you can try to tell them that the programs they use every day were made by thousands of people standing on the shoulders of thousands more, all putting together little blocks of code made of other people’s code and putting it on top of even more existing code. But if all you’re doing is teaching them syntax, such as the command for taking input or drawing circles or counting from 0 to 10, they are missing a big part of what programmers do today.

Most programming languages that are used commercially today, such as C++, Java and C#, are “object oriented.” This means that programs are written as modular chunks, known as classes or interfaces, that do things and store data and can be reused in many different contexts without having to change the code inside. If there is a class that already does what you need, you figure out how it works and plug it in. And there are thousands of useful classes in most modern languages that do everything from transferring data over a network to drawing sophisticated 3D images to encrypting your information. This kind of modular programming is so useful that even scripting languages have embraced it: Perl, which wasn’t originally object oriented, now has OO features, and Python has supported classes since its earliest releases.
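
A small Python sketch of the idea: the `Wheel` class below is written once and then reused by two unrelated classes without anyone touching its insides. All the class names and measurements here are hypothetical, chosen only to illustrate the point.

```python
import math

class Wheel:
    """A reusable chunk: stores its data and knows how to do one job."""

    def __init__(self, diameter_in):
        self.diameter_in = diameter_in

    def distance_per_turn(self):
        """Distance covered in one full rotation (the circumference)."""
        return math.pi * self.diameter_in

class Bicycle:
    """Reuses Wheel without changing any code inside it."""

    def __init__(self):
        self.wheels = [Wheel(26), Wheel(26)]

    def distance_after(self, turns):
        """Inches traveled after the given number of wheel turns."""
        return turns * self.wheels[0].distance_per_turn()

class Wheelbarrow:
    """A completely different context, plugging in the same class."""

    def __init__(self):
        self.wheel = Wheel(16)

    def distance_after(self, turns):
        """Inches traveled after the given number of wheel turns."""
        return turns * self.wheel.distance_per_turn()

print(round(Bicycle().distance_after(10), 1))
print(round(Wheelbarrow().distance_after(10), 1))
```

Neither `Bicycle` nor `Wheelbarrow` needs to know how `Wheel` computes its circumference; they only need to know what to ask it for.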

On top of that, programmers today use sophisticated development environments like Eclipse or Visual Studio that automatically highlight compile errors as you write your code, organize your different classes and files, autocomplete your commands, and take care of a lot of the boring details like importing packages.

None of this is to say that programming is easy. Professional programmers will routinely spend 12 or more hours slaving over an algorithm trying to get it to work the way they want. Any longtime coder can tell you stories about looking up at 11pm and realizing they haven’t eaten since breakfast. But it’s hard in a different way than beginning programming students think.

So how can we teach students to program in a way that prepares them for today’s world? I am still figuring this out myself, but here is my manifesto:

1) Instead of starting from scratch with simple programs that do almost nothing, have students start working on existing programs that do interesting things, and show them how to make simple changes to them.

2) Instead of starting with simple commands like “print()” and “drawrect()” show them how to put together larger classes that do complex things into a more sophisticated program.

3) Object oriented programming should no longer be seen as an advanced topic to be gotten to after learning all of the simpler syntax, but rather the basic element of all programming. Creating objects, using their methods and passing them arguments should happen almost immediately.

4) Focus less on the syntax of a particular language and more on the big ideas of how programs and computers work. The first languages I learned were BASIC and Pascal, neither of which anyone I know uses anymore. (I don’t even remember a single Pascal command.) Since then, at various times I’ve written programs or scripts in many languages, including Perl, Python, Flash ActionScript, JavaScript, PHP, Visual Basic and Java. I can learn these languages because most of the same kinds of structures occur in every language, such as conditionals, collections and loops. When you know how to write a for loop in Python, writing one in C++ is as simple as Googling the syntax. Of course each language has its own unique features that other languages don’t have, but when you have a foundation they’re not so hard to learn. When you know how a computer accesses memory, it’s a lot easier to understand how a pointer in C works.

5) Most of all, don’t teach beginning students to write programs that do abstract, boring things. There was a time when getting a computer to do anything was an accomplishment, but those days are long gone. Sorting an array is a valuable skill for a Computer Science student, but pretty boring for a young kid learning to program for the first time. Teach them to do things like make video games, put their messages into secret code that only their friends can read, or make interactive Web pages. Have them put together existing methods and methods you’ve created to make something exciting. Sure, their programs still won’t be as good as professional-quality programs, but they will see the connection and believe that they can get there if they stick with it.
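
As one illustration of point 5, here is a minimal sketch of the secret-code project in Python. It uses a simple Caesar shift, which is my choice for the example rather than the only way to do it, and it leans on Python’s built-in `str.maketrans` and `str.translate` in the spirit of putting existing pieces together. The function names are made up.

```python
import string

def make_cipher(shift):
    """Build a translation table that shifts each letter by `shift` places."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    shift %= 26
    return str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )

def encode(message, shift):
    """Scramble a message; punctuation and spaces are left alone."""
    return message.translate(make_cipher(shift))

def decode(secret, shift):
    """Undo encode() by shifting the same distance in the other direction."""
    return secret.translate(make_cipher(-shift))

secret = encode("Meet me after class!", 3)
print(secret)
print(decode(secret, 3))
```

A student can get this working in a few lines and then trade messages with a friend who knows the shift, which is a lot more fun than sorting an array.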

I think we’re just beginning to understand how to really teach programming. It’s not like teaching math, where you need all the basics of arithmetic before you can do anything interesting, and it’s not like teaching language, where you’re building off skills that are instinctive. It’s a completely different kind of thinking, and its application goes way beyond computers. I fantasize about a day when school is for reading, writing, ‘rithmatic, and…some synonym for coding that starts with ‘r.’

But it’s never going to happen until we learn to teach it right.