Cosmic Queries – Algorithms and Data, with Hannah Fry - StarTalk Radio : Cosmic Queries

About This Episode

On this episode of StarTalk Radio, Neil deGrasse Tyson and comic co-host Chuck Nice are investigating the scientific – and somewhat mysterious – world of algorithms and big data. Joining them is Hannah Fry, PhD, mathematician and author of Hello World: How to be Human in the Age of the Machine, to lend her expertise as we answer fan-submitted Cosmic Queries from across the internet.

What is an algorithm? What does it mean to use an algorithm? Hannah gives us a lesson in algorithms 101. You’ll learn why an algorithm is not just a function of a computer. Discover why your favorite recipe is an algorithm. Explore the differences between an algorithm and artificial intelligence. Ponder whether applying algorithms to human behavior might give strong evolutionary guidance. We also explore if you can make a predictive algorithm for lightning strikes or ocean currents.

Find out the state of big data in the modern world. You’ll hear why the biggest challenge about modern data comes down to quality, not volume. We discuss the processes of solving problems. Is grunting through to a solution the best way to solve a problem or does every problem have a quick answer? You’ll learn more about the “P vs NP Problem” and the four-color theorem.

What’s the future of algorithms and data? Is biological data the next frontier? We examine the idea of predicting crime, like in Minority Report, and Hannah tells us about an example of this in the real world that completely failed. All that, plus, we also discuss why sometimes the best way to study data is by using methods that aren’t new.

Thanks to our Patrons Dan McGowan, Sullivan S Paulson, Zerman Whitley, Solomon Nadaf, Eric Justin Morales, Matthew Iskander, and Cody Stanley for supporting us this week.

NOTE: StarTalk+ Patrons and All-Access subscribers can watch or listen to this entire episode commercial-free.

About the prints that flank Neil in this video:

“Black Swan” & “White Swan” limited edition serigraph prints by Coast Salish artist Jane Kwatleematt Marston. For more information about this artist and her work, visit Inuit Gallery of Vancouver, https://inuit.com/.

Transcript


Welcome to StarTalk, your place in the universe where science and pop culture collide.

StarTalk begins right now.

This is StarTalk.

I’m your host, Neil deGrasse Tyson, your personal astrophysicist, and this is a Cosmic Queries edition.

That’s what, becoming a fan favorite.

And when I do Cosmic Queries, you know Chuck Nice can’t be far behind.

Chuck.

Hey, what’s happening, Neil?

Dude, welcome back in the house.

Yeah, yeah.

Well, yeah, virtually in the house.

Virtually, in the Coronaverse house.

That’s right.

We’re all in the same Coronaverse house.

Today, we’re doing Cosmic Queries on algorithms and data.

Algorithms and data?

You gotta love me some algorithms and data because nothing happened.

I do not.

I’m not a big fan.

Not a fan of either.

I mean, I’m a fan of data.

Both the actual information kind and the android from Star Trek.

I’m a fan of data.

Algorithms?

Oh, yeah, I forgot.

We had a whole entity named Data, an android, basically.

So what we have here is, we’ve invited into the studio today an expert on algorithms and data.

A mathematician, Hannah Fry, who’s dialing in from the UK.

Hannah, welcome.

Thank you very much.

You know Chuck, you’re not the only person who hates the word algorithm.

I was once at a tech conference, chatting to this guy, and I said I think it’s a word that makes about 85% of people wanna gouge out their own eyes.

He agreed with me, he said, yeah, but it does make the remaining 15% of people mildly aroused.

So I know what I’m in for.

Well, you know, yeah, mildly aroused, that’s because Al is only mildly sexy.

Mr. Go Rhythm.

Oh, Al Go Rhythm, yeah, there you go.

Exactly.

So let me get your full bio here.

You’re an associate professor of mathematics at University College London, and you co-host a BBC radio show on Radio 4, because BBC has very stove-piped channels, and it’s The Curious Cases of Rutherford & Fry.

Yeah.

Now you have to do some ’splainin’ on that one.

Author of the book recently released, just last year, Hello World, Being Human in the Age of Algorithms.

So you are the person for this Cosmic Queries.

I mean, I know a thing or two.

Hannah, if I may, can I just make you feel a little more at home digitally as we cross the great pond?

Here we go.

BBC News time.

Five o’clock GMT.

Did you like that?

He’s good.

He can get a job, I’m sure.

The Brits show off that they have the Prime Meridian.

Yes, exactly.

Universal Time, Greenwich Mean Time.

In fact, I live in Greenwich.

The Prime Meridian is about 100 meters from my house.

Do you feel it though?

Do you feel it?

What’s the point?

You go for a little walk around.

There’s like a peninsula.

And along the Prime Meridian, as it goes across the water, there’s just an arrow that says here, oh no, I’ve forgotten the circumference of the earth now.

16,000 kilometers, something like that?

No, no, way bigger than that.

People once thought it was 16,000 in ancient Greece.

So we’ve got to update you on it.

I made a guess and I made a guess with the wrong person.

She’s a fan of history, she’s a fan of history, that’s all.

There’s an arrow that says back to here, which I quite like.

We’re about 50,000 kilometers, 40 to 50,000 kilometers.

Wow, I was within a factor of 10.

I know it in miles, and you’re the guys who gave us miles, so it’s 25,000 miles.

But I’m just delighted you live near the Prime Meridian and can get a little vibe from it when you take a stroll.

So tell me about data.

What’s the state of data today relative to like when any of us were kids or even before there were computers?

The word data, of course, predates computers.

So what’s going on today?

Well, in some ways, I mean, in a lot of ways, not that much has changed.

I mean, it’s still a case of people collecting statistics about humans, about how we behave, about what we do, using kind of mathematical techniques to analyze it and trying to infer scientifically everything that you can about our behavior from it.

So this is like a subject that has a long history going back to the 50s and the 60s.

I think that what’s really changed is just the volume of data.

I mean, you don’t need me to tell you just the incredible amounts of data that is collected on us.

But I think for me, it’s not just the amount of data that’s directly collated about us.

It’s the things that you can infer from that data.

The sort of, the guesses that you can make about people that you wouldn’t necessarily realize you could do.

So a really nice example of what I’m talking about here is I was talking to a chief data officer for supermarkets, a chain of supermarkets in the UK.

And this supermarket, they have access to everything that everyone buys.

So they know what’s in your basket.

They know your weekly shop.

But they also sell home insurance, right?

So they can tell who makes the higher claims on their home insurance and compare it to what people are buying.

And they realized in their data that you can tell that people who claim less on their home insurance are people who tend to cook food at home, which is kind of like, oh, okay.

I mean, once you hear it, it’s like, well, that sort of makes sense, I guess, right?

If you’re a very house proud person and you’re spending ages creating a meal from scratch, then you’re not gonna let your kids play football in the house, right?

It’s like kind of the groups connect.

But the question is, how do you decide who the home cooks are?

Anyway, it turns out that there is one item that’s like the strongest indicator of all.

There’s this one item that’s in your basket that’s the biggest giveaway that you are a home cook more than any other.

And it is-

I’m going with frozen pizza.

I mean, it’s kind of us cooking, I guess.

Wanna guess, do you wanna guess, Neil?

Let me see, I like to cook at home and let’s see, I cook a certain type of food, but I can’t cook without olive oil.

How about anchovy paste?

Ooh, yeah, I think it is, isn’t it something?

And the shrimp is a bit more…

Or tomato paste, something that’s so base level kitchen preparation.

So it’s actually a little bit more, a little bit more, I guess, of, it’s fresh fennel.

It’s fresh fennel is the answer.

And I mean, I think that’s right.

Like you just wouldn’t, if you’re buying fresh fennel, you must be a home cook.

No, you’re only cooking at home.

That is the only purpose for fresh fennel.

I mean, seriously, nobody’s saying, like, oh, I have to pick up some fresh fennel, I need to shine my shoes.

So I started buying fresh fennel to see if my home insurance prices would start going down, but as yet, nothing.

Whoa.
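Hannah’s supermarket example boils down to asking which single item in a basket most raises the odds that a shopper is a home cook. A minimal Python sketch of that calculation, assuming a handful of made-up baskets that already carry a home-cook label (a real retailer would have to infer that label too): for each item, compare the home-cook rate among baskets containing it with the overall rate, a ratio usually called lift.

```python
# Toy, made-up baskets: (items bought, is_home_cook). Illustrative only, not real data.
BASKETS = [
    ({"fennel", "onions", "olive oil"}, True),
    ({"fennel", "garlic"}, True),
    ({"onions", "garlic", "olive oil"}, True),
    ({"frozen pizza", "onions"}, False),
    ({"frozen pizza", "olive oil"}, False),
    ({"garlic", "frozen pizza"}, False),
    ({"olive oil", "onions"}, True),
    ({"frozen pizza"}, False),
]

def home_cook_rate(rows):
    """Fraction of rows labelled as home cooks (0.0 for an empty list)."""
    return sum(label for _, label in rows) / len(rows) if rows else 0.0

def lift_by_item(rows):
    """For each item: home-cook rate among baskets with that item, divided by the overall rate."""
    base = home_cook_rate(rows)
    if base == 0:
        raise ValueError("no home cooks in the data, lift is undefined")
    items = set().union(*(basket for basket, _ in rows))
    return {
        item: home_cook_rate([r for r in rows if item in r[0]]) / base
        for item in items
    }

lifts = lift_by_item(BASKETS)
best_item = max(lifts, key=lifts.get)
print(best_item, lifts[best_item])
```

With eight invented rows this proves nothing about real shoppers; the point is the shape of the calculation, where a rarely bought but telling item can outrank staples like olive oil.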

So that’s one example of the access people have to who and what you are when the data is consumer data.

So I guess we get that.

And a lot of people fear that or are angered by it.

But before we land there more securely, let me just comment that as the volume of data has grown, because computers are obtaining data constantly, hasn’t the computing ability to analyze data risen with it so that we’re not really feeling the stress of being smothered in data that we might have feared a few decades ago?

Yeah.

But as analysts, you may not be feeling the fear of being smothered.

Okay, so I first heard this from John Allen Paulos.

Now, this is now mid 90s, late 90s.

What he said was, the internet is the world’s biggest library.

The problem is all the books are scattered on the floor.

And I said, wow, that’s a brilliant analogy.

But at the time, there wasn’t a Google search engine or any kind of way to organize that information.

Within a few years, there were search engines.

So the books were no longer on the floor.

Who knows where the hell they are now, but they’re not on the floor.

It was clever, but it doesn’t apply today because we can get through the data.

And my people, we have very big telescopes getting a lot of data on the universe.

Half of the effort in prepping for that telescope is the data pipeline to handle the data.

So are you, can you say that you’re awash in data?

Are you on top of the situation?

Well, okay, are we on top of the situation?

Okay, so definitely the amount of computing grunt that you have now is something you just didn’t have access to before.

You can handle vast datasets with just incredibly intricate detail about, you know, everything from the cosmos to human behavior that you just weren’t able to before.

But the thing is, is that I think the biggest challenge when it comes to data isn’t so much about volume, but more about quality.

And the thing is, anyone who has worked with data knows that cleaning data is so much of the battle.

You get like this massive great big wallop of data, you’re like, brilliant, it’s going to be so good.

You can’t wait to like dive in.

And then you realize just how long it takes to make it, to get it into a shape where it does anything that you want it to do.

Let me think of an example of something that I can give you that will be…

I can tell you, in my field, we just call it reducing the data.

If I showed you raw data from the universe, you’d say, what the hell is that?

Where’s my pretty Hubble picture?

Well, if you knew what the hell happened between the photon hitting the telescope and the photo that ended up in the press, you’d be, you might be shocked, but no, you wouldn’t be Hannah, but others might.

So you have an example.

I’m trying to think, sorry, forgive me.

There must be one that’s really funny and quite stupid.

Well, that might come out in the Q&A part.

That’s the one we want, definitely.

Yeah, I know you definitely want one where something’s just really stupid.

There’s definitely got to be one.

Another quick thing about your bio.

Tell me about your BBC Radio 4 show.

Oh, okay.

So this is a show that I host with the geneticist Adam Rutherford.

And we’ve been going for about, I think we’re recording our 16th series now.

So the idea is that people send us in questions and we go out and investigate them.

And initially we, because Radio 4 is like, it’s kind of the posh channel.

It’s very highbrow.

It’s very, there’s no music.

It’s all, it’s where the politicians go and it’s where they have these deep intellectual debates.

I mean, they have like, you know, programs on philosophy on it, right?

It’s like the very kind of like, the very highbrow stuff.

So initially we wanted our program to be very highbrow too and to be very serious, like we’re very serious scientists.

But we discovered quite quickly that actually what works better is if you just basically muck around.

And as a result of that, the questions that have been coming in have been from families with younger kids.

And they end up being like the best questions.

So we had a question that I really liked, which was, what’s the tiniest dinosaur?

Which is a question that was asked of us by an eight-year-old.

Seems like a really trivial, silly question that you can just dismiss with a quick Google search.

But actually unfolds this whole thing of how do you define size, how do you define dinosaur, this whole kind of world underneath it.

So yeah, that’s really what it’s morphed into, is just this wonderful playground where all the annoying questions that kids want to ask their parents, they send them into us instead.

So to the benefit, I think, of the listeners.

So it kind of undoes some of their poshitude.

It does.

It’s definitely not posh.

Let’s see if we can lead off with a question here.

Chuck, we got a first question on algorithms and data for one of the world’s experts on those subjects.

Sure thing.

Let’s start off.

We always start off with a Patreon patron because they give us money.

So, let’s go to TJ Monroe from Patreon, who says, Dr. Tyson and Dr. Fry, the two best radio voices in science.

Oh.

Wait, wait.

So, how is Rutherford’s voice in your show?

It’s rubbish.

Yours is much better.

So, we got to do our own show then.

He says, can you walk through the process of creating a predictive algorithm for something like the path of a lightning bolt or ocean currents?

Now, one is a lot easier than the other.

Yeah, sure.

Because one repeats itself.

I’m sorry.

Go ahead.

You say that, Chuck, because you’re an expert on this.

Well, yes.

So, Neil, in my spare time.

But what I like, that’s a great question.

What I like about it is, there are two things that are highly sort of, you know, oceans can be turbulent.

You have storms and things, but there is a prevailing thing.

You don’t know where lightning is going to strike, but you know it’s going to strike somewhere over there.

So, I love that question.

So, Hannah, has that reduced itself to algorithms at this point or not?

So, I haven’t seen an algorithm for predicting lightning strikes, but I’m just thinking through how you could do it.

So, certainly, there are going to be certain things that go into it, as you say, right?

Like, there are certain days when you can look out in the sky and be pretty confident that no lightning strikes are going to happen and other days where you can be fairly confident that they will.

So, there are certain things that you can measure in the atmosphere, the atmospheric pressure, the humidity, all of those kind of things that you could plug into a system that could help you predict the likelihood of a lightning strike.

But the exact path of it, I mean, Neil, you probably know more about this than me, but I would say that the exact path of it is going to be very difficult to deduce, precisely where it will end up.

It’s easy, yeah.

Would you call it an algorithm if you are checking the atmospheric pressure and the humidity and the size of the clouds and the moisture?

If those are just inputs to something that calculates, does that come under your category of algorithm as well?

Yeah, I think so.

I mean, obviously there are different kinds of algorithm, and the artificial intelligence is one that gets a lot of attention, and the algorithms that deal with sort of data on the internet is another.

But I think anything that is taking something from the real world, like a recipe, right?

Like taking ingredients from the real world, doing something with it, and then spitting out some kind of answer, I think for me that counts as an algorithm.

I mean, you know, technically if you stop and ask someone for directions, if you’re in your car and you stop and ask someone for directions, and they say go down there, go that way, that way, that way, I mean, technically they’re giving you an algorithm.
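Hannah’s point that driving directions are technically an algorithm can be made concrete: a fixed sequence of steps applied to an input (where you start) that spits out an answer (where you end up). A minimal Python sketch, assuming a simple grid of city blocks and hypothetical ("left" | "right" | "straight", blocks) instructions:

```python
# Unit moves on a grid for each compass heading, listed in clockwise order.
MOVES = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}
HEADINGS = list(MOVES)  # ["north", "east", "south", "west"]

def follow_directions(steps, start=(0, 0), heading="north"):
    """Apply each (turn, blocks) instruction in order and return (position, heading)."""
    x, y = start
    h = HEADINGS.index(heading)
    for turn, blocks in steps:
        if turn == "left":
            h = (h - 1) % 4  # counter-clockwise
        elif turn == "right":
            h = (h + 1) % 4  # clockwise
        elif turn != "straight":
            raise ValueError(f"unknown turn: {turn!r}")
        dx, dy = MOVES[HEADINGS[h]]
        x, y = x + dx * blocks, y + dy * blocks
    return (x, y), HEADINGS[h]

# "Go down there, then left, then right": hypothetical directions.
print(follow_directions([("straight", 2), ("left", 1), ("right", 3)]))  # ((-1, 5), 'north')
```

The same inputs always give the same answer, which is what separates a set of directions (or a recipe) from vague advice.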

Right, right.

So, okay, so algorithm is a very wide catch basin then for accounting for things that you want to predict or understand.

So with lightning, you’ll only discharge a cloud if the buildup in charge is very different, either from one cloud to another cloud or between the cloud and the ground.

And so, like you said, Hannah, if you measure the humidity, you can check to see what is the propensity of electricity to cross humid air versus dry air and look at a threshold for that.

And you say, when it hits this threshold, it’s going.

And how much of algorithms is also a thresholding phenomenon?

Oh, lots and lots.

So I think, yeah, I think that…

So that’s the difference between…

Sometimes you are trying to predict exactly what’s going to happen.

You’re trying to predict, let’s say, if a ball is rolling down a hill or whatever, exactly where it’s going to end up, that kind of thing.

And sometimes you’re just saying, as you said, what is the probability that this might happen?

And at what point do you set this threshold to say, okay?

For instance, the example that I gave you earlier about the fennel thing, right?

It’s not definitely…

It’s not like you buy fennel, therefore you’re not going to claim on your home insurance.

It’s like you buy fennel, therefore you’re likely to be a home cook, therefore you’re likely to do this.

And then you sort of all the way through those, you’re going to be setting thresholds where you say, if it tips over this, then we assume you’re a home cook.

If it tips over this, we assume you’re…

Whatever.

Interesting.

Okay, so that’s an important fact here, because it’s not the one piece of data that gives you the…

It’s not the one piece of data that tells you what everything is.

It’s the one piece of data that might put you over the edge of that conclusion.

I think when you’re dealing with uncertainty, there are very few things, especially when you’re handling data to do with human behavior, very few things are cold hard facts.

Very few things are…

You’re rarely dealing in absolutes.

So when you’re handling uncertainty, the only way that you can possibly convert uncertainty into a yes-no answer is by saying, here’s the line, if we cross it, we’ll assume it’s a yes.
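The step Hannah describes, converting an uncertain estimate into a yes/no answer by drawing a line, is a one-line rule; the interesting part is that moving the line changes who gets labelled "yes". A minimal Python sketch, assuming made-up probability estimates for three hypothetical shoppers:

```python
def decide(probability, threshold=0.5):
    """Turn an uncertain estimate into a yes/no answer by drawing a line at `threshold`."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability >= threshold

# Hypothetical estimates that each shopper is a home cook (made up for illustration).
ESTIMATES = {"shopper_a": 0.92, "shopper_b": 0.55, "shopper_c": 0.31}

for threshold in (0.5, 0.8):
    labels = {name: decide(p, threshold) for name, p in ESTIMATES.items()}
    print(f"threshold {threshold}: {labels}")
```

A stricter line labels fewer shoppers "yes"; where to put it depends on which mistake, a false yes or a false no, costs more.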

And the ocean currents, like you said, Chuck, those have prevailing…

They’re not catastrophic the way a lightning bolt is, so presumably that doesn’t have this kind of thresholding in the prediction.

I mean, for ocean currents, there are very sophisticated equations that can describe fluid flow, right?

So they’re still not absolute.

Especially when you’re dealing with turbulence, there’s still a lot of probability and randomness and chaos, right, that’s involved in all of that.

But you can say with more…

It’s not a thresholding problem, as you say, right?

You can say with more certainty where things are going to be and how they’re going to be moving.

I was going to say there’s also connection.

When you’re dealing with ocean currents as opposed to lightning bolts, ocean currents are all connected because it’s not one ocean.

It’s an entire oceanic system that happens on the globe whereas lightning bolts are isolated incidents.

So there’s that too.

That’s very true.

Although if you ask the mathematicians who study the ocean, I mean, they assume it’s two-dimensional.

So the existence of depth is a bit lost on them.

Yeah, Chuck, you start subtracting dimensions to make the problem easier to solve.

Whether or not your answer is correct at the end.

Elegant there, right?

Elegant.

We’ll get back to that.

Let’s take our first break, and when we come back, more Cosmic Queries with data mathematician Hannah Fry.

We’re back, StarTalk, Cosmic Queries, Algorithms and Data.

Yeah, I said it.

Algorithms and Data.

Chuck Nice, co-host.

Tweeting @ChuckNiceComic.

Thank you, sir.

And we have, as our special guest, Professor of Mathematics at University College London, Hannah Fry.

I’m an associate professor.

You just gave me a promotion there, thank you.

I will do that.

And you take that to the bank and to your department chair.

Hannah, how would you like to be a chancellor?

I wanna be sir, that’s what I want.

Oh, there you go.

But so Chuck, you got another question about it, didn’t you?

Sure, sure, sure.

Let’s go back to Patreon and let’s go to Shawshank Submaranian, who says, Neil, hello, and Hannah, hello.

I would like to know how impactful solving the P versus NP problem would be for our capability to understand the universe.

Excuse me, the P versus non-P problem, are we all fluent?

Yes, exactly.

If you have consumed copious amounts of liquid, the P versus non-P problem becomes quite the conundrum.

What is the proximity to your water closet?

So Hannah, what is P versus non-P?

And is that a real outstanding problem?

Yeah, yeah, totally.

So this is one of the Millennium Prize Problems.

So if you solve this, Shoshank, what was his name?

Shoshank?

Shoshank Submaranian.

Okay, Shoshank.

He’s showing off by the way with this question.

He’s showing off.

If you solve this problem, you win a million dollars.

So, I mean, it’s kind of maybe the change to the universe will be bigger, but definitely a big change to your life.

Okay, so let’s say that I gave you a gigantic Sudoku puzzle and asked you to solve it.

I mean, like a really massive one, right, not just like nine by nine, a massive, massive one, and asked you to solve it.

It would take you forever to solve it.

But if I said, here’s a solved one, I want you to check if it’s right.

Actually, that’s a much easier problem, right?

Even though they could be the same Sudoku puzzle, filling it in in the first place is much, much, much harder than just checking that it’s right.
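Hannah’s Sudoku analogy rests on checking being cheap. A minimal Python sketch of the checking side, assuming a filled n×n grid where n is a perfect square (4×4, 9×9, 16×16, and so on): every row, column and box is compared against the set {1, ..., n}, so the work grows only with the number of cells. Filling in a blank generalized Sudoku, by contrast, is known to be NP-complete.

```python
import math

def is_valid_filled_grid(grid):
    """Check a filled n x n Sudoku: each row, column and box holds 1..n exactly once."""
    n = len(grid)
    box = math.isqrt(n)
    if box * box != n or any(len(row) != n for row in grid):
        return False
    expected = set(range(1, n + 1))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(n)} for c in range(n)]
    boxes = [
        {grid[r][c] for r in range(br, br + box) for c in range(bc, bc + box)}
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    # A duplicate or out-of-range value makes some group differ from {1..n}.
    return all(group == expected for group in rows + cols + boxes)

SOLVED = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
BROKEN = [[2, 1, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
print(is_valid_filled_grid(SOLVED))  # True
print(is_valid_filled_grid(BROKEN))  # False
```

Swapping two entries in the first row keeps that row legal but repeats a 2 in the first column, and the check catches it in one pass.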

So there are some…

Wait, wait, wait, wait, wait, you say a solved puzzle.

Yeah.

That would imply it’s right.

So you mean a filled in puzzle.

Okay, you’re right.

My language is sloppy, I take it back.

A filled in puzzle, I take it back.

Okay, so sometimes where you’ve got like a blank Sudoku grid, effectively, or the analogy in computers, if it’s very easy to check that the answer is right, sometimes you can use that as a loophole to get you to the answer very quickly, right?

Because you can just generate answers and check if they’re right, rather than kind of go through and grunt through the entire process.
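The "loophole" Hannah mentions, generating candidate answers and keeping the first one the fast checker accepts, works here only because the example is tiny: with k blank cells on a 4×4 grid there are 4^k candidates, and that exponential growth is what the P versus NP question is about. A minimal Python sketch, assuming a 4×4 puzzle with 0 marking the blanks:

```python
from itertools import product

def is_valid_4x4(grid):
    """Fast check: every row, column and 2x2 box holds 1..4 exactly once."""
    expected = {1, 2, 3, 4}
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(4)} for c in range(4)]
    boxes = [{grid[r][c] for r in range(br, br + 2) for c in range(bc, bc + 2)}
             for br in (0, 2) for bc in (0, 2)]
    return all(group == expected for group in rows + cols + boxes)

def solve_by_generate_and_check(puzzle):
    """Try every way of filling the blanks (0s) and return the first grid that passes."""
    blanks = [(r, c) for r in range(4) for c in range(4) if puzzle[r][c] == 0]
    for guess in product(range(1, 5), repeat=len(blanks)):  # 4 ** len(blanks) candidates
        candidate = [row[:] for row in puzzle]  # copy so the puzzle itself is untouched
        for (r, c), value in zip(blanks, guess):
            candidate[r][c] = value
        if is_valid_4x4(candidate):
            return candidate
    return None  # no filling passes the check

PUZZLE = [[0, 0, 3, 4], [3, 4, 0, 2], [0, 1, 4, 0], [4, 0, 2, 1]]
solution = solve_by_generate_and_check(PUZZLE)
print(solution)
```

Six blanks mean 4,096 candidates; a full 9×9 puzzle with 50 blanks would mean 9^50, which is why real solvers prune instead of enumerating.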

So the question is, is that always the case?

If you can check that an answer is right quickly, much quicker than you can to solve the problem in the first place, is that always the case?

Can you solve something, do really hard problems that you’ve got to grunt through, always have quick solutions or not?

And the reason why this has repercussions and the reason why this has potential impact on our understanding of the universe, is that an awful lot of the algorithms that we use to try and understand gigantic systems, I’m sure that this is certainly true in a lot of cosmology, a lot of them have to use very clever workarounds to account for the fact that some problems are just really hard, some problems you kind of have to grunt through to find the answer.

So there are a whole host of different algorithms that exist to try and make that grunting process easier.

But if it were the case that actually all these difficult problems do have an easy, quick solution, I mean, that would be, if you could suddenly reduce the amount of computational time that you spend on a problem, I mean, that would have a dramatic, dramatic effect on the number of things that you could compute.

So I wanna, Chuck, I wanna show off in front of Hannah, so give me a moment here.

How good do you think?

So the four-color map problem: I was in college when that was solved, and it was considered inelegant because someone wrote an algorithm and just grunted through it, but they solved it, and no one else had solved it before.

So in principle then that implies that it was easier to solve it that way than by any analytic way.

Is that a fair analogy here or not?

People got very upset about that, didn’t they?

Yes, I remember that.

Yeah, because normally when you do a proof, you write it down and it’s these elegant statements of logic.

It fits on the back of an envelope.

Yeah, right, exactly.

So at the risk of not being a part of the four-color map pre…

Good point.

The four-color map love fest, what the hell is the four-color map problem?

Okay, you know when you get like a map of the United States, and you’ve got all of the different states, and you want to color it in? The question is, can you color it in with four colors so that no two states next to each other share the same color, right?

It turns out that you can.

The question is, is there…

So the four-color problem is…

Let me get this right, actually.

I know you might remember more than me, but does any map exist that you cannot color in with four colors?

Yeah, that’s the way…

What is the minimum number of colors that you need to color any map?

I think we knew that the four colors was the right answer.

We just didn’t have a proof of it.
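For a small map, the question can be checked directly. A minimal Python sketch using backtracking search (a standard technique, not the method of the 1976 computer-assisted proof, which checked a large catalogue of configurations): the map is a dictionary of which regions border which, and the search gives each region a color no neighbor already has, undoing choices that lead to dead ends. Nevada and the five states around it (California, Oregon, Idaho, Utah, Arizona) make a good test, because the ring of neighbors has an odd number of states, so three colors are not enough but four are.

```python
from collections import defaultdict

# Shared borders between Nevada and the five states that surround it.
BORDERS = [
    ("NV", "CA"), ("NV", "OR"), ("NV", "ID"), ("NV", "UT"), ("NV", "AZ"),
    ("CA", "OR"), ("OR", "ID"), ("ID", "UT"), ("UT", "AZ"), ("AZ", "CA"),
]

def build_neighbors(borders):
    """Turn a list of shared borders into a symmetric adjacency map."""
    neighbors = defaultdict(set)
    for a, b in borders:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)

def color_map(neighbors, colors):
    """Backtracking search; returns {region: color} or None if no valid coloring exists."""
    regions = sorted(neighbors, key=lambda r: -len(neighbors[r]))  # busiest regions first
    assignment = {}

    def place(i):
        if i == len(regions):
            return True
        region = regions[i]
        for color in colors:
            if all(assignment.get(n) != color for n in neighbors[region]):
                assignment[region] = color
                if place(i + 1):
                    return True
                del assignment[region]  # dead end: undo and try the next color
        return False

    return dict(assignment) if place(0) else None

NEIGHBORS = build_neighbors(BORDERS)
print(color_map(NEIGHBORS, ["red", "green", "blue"]))  # None
print(color_map(NEIGHBORS, ["red", "green", "blue", "yellow"]))
```

Search like this works on one map at a time; the theorem is the claim that no map, however large, ever needs a fifth color.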

And so it was intractable until somebody…

Again, I went to college so long ago, computers were new to the world of math.

Punch cards.

Oh, they’re selling.

I mean, it was the 70s, right?

It was the 70s.

Yes, it was back when I was in college.

And like a steam-powered handle.

Thank you.

So, and it was proven, but only through this, by checking the answer, not by proving the answer.

That feels like what you just described.

And so, I’ll give you an example from my field.

I think, Hannah, I think this is an example.

We have people looking for galaxies that are very, very low in their surface brightness.

Like you would scan by it, and you wouldn’t even know it was there.

Well, how do you look for them if they don’t reveal themselves?

Well, the ones we have found, we know what their light profile looks like.

And so, what we can do is set up a filter that goes out and tries to match the light in the sky to that filter.

And when you get a slight increase in a match, there’s a galaxy.

Now you can put all your resources there and say, yep, there’s a galaxy there.

So, you’re looking for the answer to ask the question.

Yes, yes, yes, yes.

Yeah.
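What Neil describes, sliding a known light profile across the sky and looking for where the match is strongest, is essentially a matched filter. A minimal one-dimensional Python sketch, assuming a Gaussian bump stands in for a real galaxy's light profile and the "sky" is simulated noise with one faint source added; real survey pipelines work in two dimensions and also model backgrounds, the telescope's point-spread function and much more.

```python
import math
import random

def gaussian_profile(width, half_size):
    """Assumed light profile: a Gaussian bump sampled at integer offsets."""
    return [math.exp(-0.5 * (x / width) ** 2) for x in range(-half_size, half_size + 1)]

def matched_filter_peak(signal, template):
    """Slide the template along the signal; return the center with the largest dot product."""
    half = len(template) // 2
    best_pos, best_score = None, -math.inf
    for center in range(half, len(signal) - half):
        window = signal[center - half:center + half + 1]
        score = sum(s * t for s, t in zip(window, template))
        if score > best_score:
            best_pos, best_score = center, score
    return best_pos, best_score

rng = random.Random(42)  # fixed seed so the simulation is repeatable
TEMPLATE = gaussian_profile(width=8.0, half_size=24)
TRUE_CENTER = 200

# Simulated sky: unit-variance noise plus a source whose peak is only 2.5x the noise level.
sky = [rng.gauss(0.0, 1.0) for _ in range(400)]
half = len(TEMPLATE) // 2
for offset, value in enumerate(TEMPLATE):
    sky[TRUE_CENTER - half + offset] += 2.5 * value

position, score = matched_filter_peak(sky, TEMPLATE)
print(f"strongest match at pixel {position} (source injected at {TRUE_CENTER})")
```

No single pixel stands out much, but summing many pixels weighted by the expected shape lets the faint source win; that is the payoff of knowing what you are looking for.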

So Hannah, is this legit?

I mean, are we allowed to do this?

Yeah, I mean, these are great examples of grunting through the solution.

And the question is, is that always the easiest way?

Or is there actually a trick?

Is there some clever trick that you could have used?

You know, I don’t know, like folding your data in half and looking for this superimposed light.

Maybe there’s some clever trick somewhere that no one has spotted yet.

You know, there are so often cases where people come up with these clever tricks.

And maybe there was a clever trick that the whole thing could have been solved much quicker without having to grunt it through.

Okay, so if this problem gets solved, then it’ll give us confidence for all future problems to say, don’t even worry about figuring out the answer analytically, let’s just compute the answer and then check it.

I mean, if that’s easier, then for me, that’s less romantic, that’s less elegant.

It is, but then at the same time, you have to think of the potential repercussions of this.

Like, you know, there are some problems.

Like if you take protein folding as an example, right?

So proteins are just the source of so much, you know, life, I mean, essentially, they’re like the fundamental building blocks of life.

Building blocks.

But everything that could happen to the human body from, you know, Alzheimer’s to the drugs that you, like the effect of drugs that you take, I mean, everything, it all comes down to proteins.

And proteins, they’re like these long ribbons of amino acids.

And the way that they fold up determines their function.

But these things are incredibly, incredibly, incredibly complicated.

And going from the folded-up bundle of ribbon to the long string is possible, but going from the long string to working out what folded-up, knotted shape it makes is really, really, really super hard.

We understand all of the physics of it.

We have equations that could work it out, but you just cannot grunt through it all.

And if you, let’s say that you could solve this problem.

Let’s say that you could have a computer that could grunt through all the possibilities.

What that would mean is you could say, I want a protein that serves this function.

I want a protein that can combat this disease.

I want a protein that acts on the body in this way.

What shape is it?

You know, it’s like this shape.

Okay, now what is the string of amino acids I need to print to create that protein?

I mean, I’m talking like long, long, long, long, long term here.

These are not like things that are around the corner, but I mean, however, I think there are some applications of this stuff that means that actually romance is dead.

Like who cares about romance when you’ve got protein folding?

So Chuck is listening attentively here because he wants to know the formula for a funny joke.

I really don’t.

Why would I change anything now?

How do you fold your words together to guarantee there’s a funny joke at the other side?

I’m sure there’s an algorithm for that, you know, but and believe me, I would love that.

It’s funny because it sounds like what you’re talking about is quantum computing.

In part, the extreme level of it, yeah, if you could get the computing power and then all problems will just be solved depending on whoever spends time looking at it.

That would, I have to reiterate Hannah, that would take away the romance of the quest for me a little bit, I think.

A little bit.

Well, you have to sigh.

Did you hear that sigh?

Wait, so do you think that the four-color problem was unromantic?

Yes, yes, yeah, yeah.

I mean, because, you know, we have E equals MC squared, you know, children write down that equation as one of the most profound equations in the universe.

How many forces are there in the universe?

There’s not thousands, there’s four, all right?

In the early universe, there were fewer.

You want, I have a bias, a philosophical bias, that when I part the curtains, I want to find simplicity rather than complexity.

Do you think that simplicity always exists though?

Or do you think actually sometimes if you’re a physicist, you can get like drawn into this sort of, I don’t know, it’s like a potion of simplicity that maybe only exists then.

So here’s my lesson that I have to tell myself to get out of this sort of state of romance.

Johannes Kepler, when he was first working out the planets going around the sun, was trying to figure out what kind of orbits they could have and what their distances were.

And he had a system. He was a mathematician, and, you know, there are the five Platonic solids, right?

Do you know about this, Chuck?

No, the five solids?

No.

It’s a singing group from the sixties.

That’s what I was looking for.

That’s exactly what I was looking for.

So Hannah, do you want to tell them about the five solids?

Okay, so Plato was like super into this idea of everything being perfect.

So the Platonic solids are the five shapes that can be made where every face is the same.

So a cube is a Platonic solid, since every face is a square; then there’s the tetrahedron, octahedron, dodecahedron.

And what’s the last one?

Icosahedron, I guess?

Well, yeah, icosahedron.

And I think you left out the pyramid, which is the four-sided pyramid, the tetrahedron.

No, she said it.

Oh, you said it.

Okay.

She said tetrahedron.

Did we get five?

So tetrahedron, octahedron, cube, icosahedron, and dodecahedron, right.

So each of those has the same polygon on every face, and there are only five of them.

So Kepler knew this, and he also knew that there were six planets.

And he said, well, everything is perfect and divine, and math is perfect.

Maybe the separations between the planets’ orbits are the separations between nested Platonic solids.

So he took them, nested them, put the planet orbits between them, and he actually got pretty close.

But this was his ideal.

This was his sense of perfection that he was imposing on nature and it was all bullshit.
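Neil’s point that Kepler’s nested-solid scheme “got pretty close” can be checked roughly. The sketch below is a simplified illustration, not Kepler’s own calculation: it assumes each planet’s orbit is a sphere touching the solids on either side, uses the standard ratio of circumscribed to inscribed radius for each Platonic solid, and compares that with modern mean orbital distances. Kepler’s actual model also allowed each planet’s sphere some thickness, which this ignores.

```python
import math

# Circumradius / inradius of a Platonic solid with Schläfli symbol {p, q}:
# R / r = tan(pi / p) * tan(pi / q).
SOLIDS = {
    "cube": (4, 3),
    "tetrahedron": (3, 3),
    "dodecahedron": (5, 3),
    "icosahedron": (3, 5),
    "octahedron": (3, 4),
}

# Modern mean orbital distances in astronomical units (rounded).
ORBITS = {"Saturn": 9.537, "Jupiter": 5.203, "Mars": 1.524,
          "Earth": 1.000, "Venus": 0.723, "Mercury": 0.387}

# Kepler's nesting order, outermost to innermost: planet, solid, planet, ...
NESTING = ["Saturn", "cube", "Jupiter", "tetrahedron", "Mars",
           "dodecahedron", "Earth", "icosahedron", "Venus",
           "octahedron", "Mercury"]


def radius_ratio(solid: str) -> float:
    """Ratio of the sphere around a solid to the sphere inside it."""
    p, q = SOLIDS[solid]
    return math.tan(math.pi / p) * math.tan(math.pi / q)


# Walk the solids (odd positions) and compare with the planets on either side.
for i in range(1, len(NESTING), 2):
    outer, solid, inner = NESTING[i - 1], NESTING[i], NESTING[i + 1]
    predicted = radius_ratio(solid)
    actual = ORBITS[outer] / ORBITS[inner]
    print(f"{outer}/{inner}: {solid} predicts {predicted:.2f}, orbits give {actual:.2f}")
```

Running it prints one line per solid; the tetrahedron between Jupiter and Mars, for example, predicts a ratio of 3 against an actual ratio of about 3.4.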

I think that happens quite often though.

I think it does happen where people fall in love with the simplicity of their theory and forget that actually often the world is really ugly.

Yes.

So I use the Kepler example.

To his credit, it took him 10 or 15 years, but he discarded the entire system, and out came elliptical orbits.

Which are beautiful.

In their own way.

Not as beautiful as the perfect circles Copernicus had presumed.

So anyway, let’s go to the next question, Chuck.

All right.

Let’s go to John Baker from Patreon.

He says, Hi guys.

I’m back to prove my ignorance yet again.

What kind of empirical data is used?

Well, first, let me ask a core question.

What does it mean to use an algorithm?

Not that I couldn’t look it up on Wiki, or I could have just paid attention in school.

I love John Baker.

This guy’s amazing.

Anyway, you know what?

I should have led off with this question, because the truth is, it’s something we haven’t touched on in the show yet.

What is an algorithm?

Now, what is it?

Hannah, algorithm 101.

Give it to me.

Algorithm is this gigantic umbrella term that doesn’t really mean very much, which I think is the reason why people hate the word so much.

But essentially, all it is is a series of logical steps that take you from some input to some kind of output.

So a recipe, a cake recipe, that counts.

That’s an algorithm.

Your inputs are your ingredients.

The logical steps are the recipe itself, and the output is the cake that you get at the end.

The difference, though, is that when people…

Wait, wait, wait.

A cake recipe is a flowchart, I would think.

I don’t think cake is an algorithm.

It could be.

I mean, it’s just like a giant…

Like, the word “algorithm” is this giant, all-encompassing term.

But I think when people use it, they tend to mean something within a computer.

So something where you’re inputting some data and then the machine has some kind of autonomy in terms of the decisions that it makes along the way and spits out an answer at the end.

Of course. So with computer programs, then, everything you do in a computer program is an algorithm?

Is that fair to think of it that way?

Although, I mean, I think that when people use the word…

You know, if you go like…

Did you…

So I had a ZX Spectrum, which I think is quite a British thing.

It’s like a Commodore 64.

That was the thing that I learned to code on.

Well, that’s the American one, right?

Yes, we did.

The Commodore 64 was the American version.

And that’s because Hannah is 80 years old.

I was going to say, I want to unpin my video now, so I can get a closer look at Hannah.

Because right now, she’s in a little window on my screen.

And she looks to be in her early 30s, late 20s.

And then she’s talking about Commodore 64.

And I’m like, is it her or is it my eyes?

Okay, grandma.

I talked to his grandma.

I learned coding when I was two.

She’s such a child genius.

Exactly.

So on my ZX Spectrum, which is kind of the British equivalent, you would do something like PRINT “HELLO”, GOTO 10.

And then it would just go round and round and round and fill the whole screen with “hello”.

That was the kind of programming that everyone did when they were kids.

And technically, that’s an algorithm.

It’s just a really rubbish one.

It’s like, you know, not particularly, not really doing anything.
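Hannah’s two-line BASIC program is a complete, if useless, algorithm: a fixed sequence of steps with a jump back to the start. A minimal Python sketch of the same control flow follows; the `screen_rows` cutoff is an assumption added so the loop terminates, since the original simply ran until interrupted.

```python
def hello_loop(screen_rows: int) -> list[str]:
    """Mimic the classic two-line BASIC program:

        10 PRINT "HELLO"
        20 GOTO 10

    The original never stops by itself; here we stop once a
    hypothetical screen of `screen_rows` lines is full."""
    screen: list[str] = []
    line = 10  # the "program counter": which BASIC line runs next
    while len(screen) < screen_rows:
        if line == 10:
            screen.append("HELLO")  # 10 PRINT "HELLO"
            line = 20
        elif line == 20:
            line = 10  # 20 GOTO 10: jump back to the top
    return screen


for row in hello_loop(5):
    print(row)
```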

So I think when people use the word, they tend to mean that it’s some kind of automated decision making.

That’s sort of really what they mean.

But I think, you know, if we’re being absolutely fair, the word algorithm encompasses all of this.

Now Chuck, the person had more to that question, but we just ran out of time in this segment.

So when we come back, we’ll pick up more on the angst shared by this questioner who wondered whether he should have learned all this in school in the first place.

This is StarTalk Cosmic Queries, we’ll be right back.

Hey, we’d like to give a Patreon shout out to the following Patreon patrons, Dan McGowan and Sullivan S. Paulson.

Thank you so much for your support.

You know, without you, we just couldn’t do this show.

And if any of you out there listening would like your very own Patreon shout out, please go to patreon.com/startalkradio and support us.

Thank you.

We’re back. Cosmic Queries, Algorithms and Data Edition.

You never thought we’d go there, but we did.

So there, okay.

I got Chuck, of course, and Professor Hannah Fry, Associate Professor of Mathematics at University College London.

You shared with us earlier in the session that you live in Greenwich, and we’ve all heard of Greenwich, even if we’ve never been there. Greenwich time, that’s like the base time of the world, right?

You get kind of cocky about that?

You know, we swagger around here.

We swagger around.

It actually took moving to Greenwich (I’ve only lived here for three years or so) for me to realize that the word “mean” in Greenwich Mean Time actually means the average, the mean across an entire year.

I didn’t know that.

Oh yeah, entirely.

Of course it’s 24 hours. A day is 24 hours.

No, it’s not, it’s 24 hours on average.

Yeah, it’s exactly right.

Yeah, the time it takes the sun to return to its spot in the sky is, on average, 24 hours.

Sometimes it takes longer, sometimes it takes less.
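The “mean” in Greenwich Mean Time can be illustrated numerically. The sketch below assumes a common textbook approximation for the equation of time (the gap between sundial time and clock time, in minutes) and uses it to estimate the length of each apparent solar day over a 365-day year. Individual days come out up to roughly half a minute longer or shorter than 24 hours, but the average over the year is 24 hours, because the equation of time returns to its starting value after a full cycle.

```python
import math


def equation_of_time_minutes(day: float) -> float:
    """Textbook approximation of the equation of time (apparent solar
    time minus mean solar time), in minutes, for day-of-year `day`."""
    b = math.radians(360.0 / 365.0 * (day - 81))
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def apparent_solar_day_seconds(day: int) -> float:
    """Length of the apparent solar day starting on `day`, in seconds.

    Solar noon falls at mean time 12:00 minus the equation of time,
    so when the equation of time grows, the next noon comes earlier
    and that solar day is shorter than 86,400 seconds."""
    change = equation_of_time_minutes(day + 1) - equation_of_time_minutes(day)
    return 86400.0 - 60.0 * change


lengths = [apparent_solar_day_seconds(d) for d in range(365)]
mean_length = sum(lengths) / len(lengths)
print(f"shortest solar day: {min(lengths):.1f} s")
print(f"longest solar day:  {max(lengths):.1f} s")
print(f"mean solar day:     {mean_length:.3f} s")
```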

People don’t know that.

Yeah, so I was happily drinking my Greenwich Mean Time lager and wandering around Greenwich Mean Time village, and I didn’t realize that “mean” meant the average.

Yeah, it has nothing to do with the emotional state of your tongue.

Okay.

So Chuck, we left off with someone upset that he didn’t learn the meaning of “algorithm” in school, but I think there was a question buried in there.

What does it mean to use an algorithm?

So, and then Hannah said like, well, a recipe is an algorithm.

But I like the distinction Hannah is making: as we go forward in the 21st century, we think of algorithms as automated procedures.

Yeah, something that makes a decision.

And then I think that there’s a further distinction there as well between an algorithm and artificial intelligence.

The way I like to think of this is: let’s say you’ve got a smart light bulb that’s connected to the internet, and you decide to program it so that it turns on at six o’clock and goes off at 11 o’clock.

So that’s an algorithm, right?

You programmed it. You said, if it’s six o’clock, turn on, right?

That’s just a straightforward algorithm.

If it was artificial intelligence, generally speaking, most people agree that artificial intelligence needs to include some aspect of learning.

So instead, the light bulb would recognize that you came home at six o’clock and turned the light on.

It would recognize that you like to dim the lights at 9 p.m. when you do some reading, and then you go to bed at 11 o’clock.

So if it’s starting to learn from its environment, and then impose those rules itself, that counts as artificial intelligence.

But that’s simply an updatable algorithm.

Yeah, yeah, yeah, exactly.

It’s something that’s continually revising itself.

By the way, you…

Go on, Chuck.

I was going to say, in addition to that, though, it is also, more importantly, pattern recognition.

So the update is based on the recognition of patterns.
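Hannah’s distinction between a fixed algorithm and a learning system can be sketched in a few lines. Both classes below are hypothetical illustrations: `ScheduledBulb` follows rules its owner wrote (on at 18:00, off at 23:00), while `LearningBulb` infers its schedule from the times it has seen the switch flipped, using the median as a deliberately simple stand-in for pattern recognition. The 9 p.m. dimming is left out for brevity.

```python
from statistics import median


class ScheduledBulb:
    """Plain algorithm: follows fixed rules written by the owner."""

    def __init__(self, on_hour: int = 18, off_hour: int = 23):
        self.on_hour = on_hour
        self.off_hour = off_hour

    def should_be_on(self, hour: int) -> bool:
        return self.on_hour <= hour < self.off_hour


class LearningBulb:
    """Toy 'learning' bulb: records when the owner actually flips the
    switch and derives its own schedule from those observations."""

    def __init__(self):
        self.on_events: list[int] = []
        self.off_events: list[int] = []

    def observe(self, hour: int, switched_on: bool) -> None:
        (self.on_events if switched_on else self.off_events).append(hour)

    def should_be_on(self, hour: int) -> bool:
        if not self.on_events or not self.off_events:
            return False  # nothing learned yet, so stay off
        # The typical on/off times become the bulb's own rule.
        return median(self.on_events) <= hour < median(self.off_events)


# A few evenings of observed behavior for the learning bulb.
bulb = LearningBulb()
for hour in [18, 18, 19, 18]:
    bulb.observe(hour, switched_on=True)
for hour in [23, 22, 23, 23]:
    bulb.observe(hour, switched_on=False)
print(bulb.should_be_on(20), bulb.should_be_on(12))
```

The rule the learning bulb ends up applying has the same shape as the hand-written one; the difference is only where the numbers came from, which is Neil’s “updatable algorithm” point.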

Totally agree.

Yeah, completely.

Which was really hard until very recently.

Now, Hannah, you scared me a little when you began that comment because you said, imagine a smart light bulb.

And I thought, aren’t smart robots enough?

What would a smart light bulb be?

Light bulbs marching down the street.

It just pops above your head every time you have an idea.

I love the smart light bulbs.

It’s like, humans must die.

In order for us to shine, humans must die.

Exactly.

And in fact, let me dig up a joke from a comedian whose Twitter handle is TheScienceComedian.

And I quote him every now and then.

One of my favorite jokes of his or comments was, the light bulb was such a good idea, it became a symbol for a good idea.

So if light bulbs become our overlords, they will remind us that anytime we think something brilliant, it’s one of them that gets popped up in the air.

Don’t think we don’t know what you’re thinking.

Well, we only know what you’re thinking if it’s a good thing.

If it’s a good thought, we know it.

That’s funny.

So the Science Comedian is Brian Malow, if anybody wants to dig him up.

Okay, so you got another question there, Chuck.

Sure thing.

This is Ben Sellers, and Ben wants to know this.

From an evolution standpoint, our relationships and mating behaviors probably follow patterns useful for hunter gatherers.

How do our behaviors on social media and dating websites resemble patterns from more primitive days?

What would make interacting online more connected to our primitive programming?

Now, I don’t know if this is your purview, Hannah, but he’s making a really, you know, pretty, pretty poignant association, which is we now find people online.

That’s how we find love now.

I mean, and the number is only going up every year.

Do the hunter-gatherer, you know, brain sets actually apply to the way that we go after one another digitally?

I love this.

It’s like I’m just foraging, foraging for lovers.

Oh, yeah, hunter-gatherers.

By the way, it is kind of a foraging.

Swipe right, swipe right.

So you’re hunting.

So actually, one of the very first things I did, as soon as I finished my PhD, was I did this really silly talk that was like actually supposed to be this kind of private joke that just got really out of hand, which was called The Maths of Love, right, which was in part looking at data from online dating websites.

And it was kind of this thing where I just wanted to demonstrate that you can take a mathematical view to everything.

Anyway, it got terribly out of hand and ended up being a TED talk.

But in that, there was something that was really interesting that I think is relevant to this, which is…

Wait, wait, just a quick thing, Hannah.

Most people who have thoughts that get out of hand don’t end up giving TED talks.

So, it requires some level of brilliance to…

I just wanted to distinguish you from everybody else that would be encountered.

So, you’re a TED talk.

So, go on.

Well, for a number of years in Britain, people started calling me Dr. Love.

And I was like, it was just a joke, guys.

It’s never been serious.

I’m really not talking to love.

Anyway.

Okay.

So, in it, one of the things that I talked about was about who gets most attention, right?

Whose photos get most attention on dating websites?

And you would think, okay, surely it’s going to be the people who everyone considers as best looking, right?

I mean, surely, right?

The most attractive people get the most attention, surely.

But OkCupid is kind of an interesting dating website, because for a while there, they were totally open about the fact that they were experimenting on their customers, and they released all their data.

And also, on their website, what you were allowed to do was rate how attractive you thought other people were on a scale between one and five, right?

So, five is very beautiful.

One, I think, slightly more facially challenged is the technical term.

And what they found was that it’s not true that just the people who get five get the most attention.

It was the people who divided opinion the most.

So, the people who were getting the most attention were like averaging out as kind of a four, but they weren’t people where everyone was giving them a four.

They were people where some people would give them a five and lots of people would give them a one.

Some people thought they were absolutely horrific and some people thought they were really beautiful.
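The OkCupid finding hinges on the difference between an average rating and how much raters disagree. A minimal sketch with made-up ratings (not OkCupid’s data) shows two profiles with the same mean but very different spread; the population standard deviation is used here as one simple measure of divided opinion.

```python
from statistics import mean, pstdev

# Hypothetical 1-to-5 attractiveness ratings for two profiles.
ratings = {
    "consensus": [3, 3, 3, 3, 3, 3, 3, 3],
    "divisive": [5, 5, 5, 5, 1, 1, 1, 1],
}


def summarize(scores: list[int]) -> tuple[float, float, float]:
    """Return (average rating, spread of opinion, share of top ratings)."""
    return mean(scores), pstdev(scores), scores.count(5) / len(scores)


for name, scores in ratings.items():
    avg, spread, top_share = summarize(scores)
    print(f"{name:9s} mean={avg:.1f} spread={spread:.1f} rated-5 share={top_share:.0%}")
```

Both profiles average 3, so a ranking by mean alone cannot tell them apart; the spread and the share of top ratings are what separate them.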

And the explanation for this, which I quite like, is kind of like an instinctive one.

So, I guess this sort of goes into the hunter-gatherer thing in a way, which is that it’s like a game theory thing, right?

If you come across someone and you think that they’re very beautiful, then you imagine they’re getting lots of attention and you think, well, why would I…

There’s no point in throwing my hat in that ring.

I may as well stand back.

Whereas if you come across someone who you think is very beautiful, but you imagine other people will really dislike, so someone who’s a bit unusual in some way, then you’re like, great, this gorgeous person isn’t going to be getting that much attention and you kind of like throw yourself in.

But because everyone’s doing that, it’s the really beautiful people who are not getting any attention and the kind of quirky ones who are getting lots.

So what you’re saying, apart from the sociocultural lessons, is that algorithms that might apply to human behavior, drawn from the data we’re now collecting on billions of people, might give us some strong evolutionary guidance going forward.

Yeah, I mean, I think in terms of this, it’s always tough to link it back to evolution, isn’t it?

But I do think that you can certainly come up with these game theoretic arguments for the patterns in our behavior.

I mean, they’re not exactly falsifiable, but I think they’re fun to explore.

But it means you have an algorithm that’s established in one environment.

Like Chuck was saying, you’re on the plains of the Serengeti, and there’s a certain algorithm for our behavior that we can’t shake because it’s genetically encoded within us, perhaps.

Yeah, I mean, perhaps.

I mean, I’m sure there are anthropologists who know this much better than me.

But yeah, I don’t doubt it.

I’m sure that there are lots of occasions where we act really instinctively, where our ancient history causes us to act in a certain way.

And I’m sure that we do it still, even when we are interacting with people on a completely different platform to the one that we were designed for.

So Chuck, we’ve got time for maybe one or two more questions.

Wow.

All right, God.

We’re going to have to do another show, man.

I have 11 pages of questions.

Oh, okay, Hannah, you’ve got to come back.

That is how many people are really interested in this subject.

It’s unbelievable.

So I know we’re wrapping it up.

So I’m trying to find one.

Okay, here’s one.

This is Dean from Twitter.

And Dean says.

Wait, Chuck, you’re just choosing people whose names you can pronounce.

This whole episode has been pronounceable names.

Let me go with them instead.

I’ll go with Tielo Giammanas.

That ain’t his name.

I’m sure.

Oh, poor Dean now.

Dean thought he had his moment, and it was robbed from him at the last minute.

Well, sorry, Dean.

We’re going to do the show again.

So we’ll get back to you, buddy.

All right.

So this is Tielo Giammanas.

Okay, from Twitter.

Okay.

He says.

I’m sorry.

He says, assume that the game on behavioral data is already over.

The next level is biological data, and the one after that is thought police.

And what are your expectations?

And what do you think the outcomes will be?

So what he’s really talking about there is predictive analytics.

Will we ever get predictive analytics to a point where you have pre-crime?

It’s like, you didn’t commit a crime, but you know what?

We know you’re going to commit this crime because these algorithms have actually profiled you in such a way that tells us that you are a criminal.

Hannah, you said that about fennel, okay?

Oh my God, you did.

If you know that much about who a person is if they buy fennel, I love what Chuck said there.

Are the algorithms so good that they know your thoughts, and then your next behavior, so that you pre-arrest someone?

Okay, so this is such a tough topic, right?

Like I spent a huge chunk of my book talking about this, you know, predictive policing and predictive algorithms.

Tell us the name of your book again just so we get that.

It’s called Hello World.

Hello World, Life in the Age of…

I can’t remember what they gave us.

I can’t remember what they did to the subtitle in America.

I think it’s How to be Human in the Age of the Algorithm, maybe.

I thought you were going to say, I can’t remember what the name of my book is.

This is the subtitle.

The subtitle is different in America.

They do that sometimes.

They have to translate between two English-speaking countries.

They have to translate.

Okay, so this is a really tough topic, because the thing is that some people have definitely tried to do this.

There have definitely been some situations in which people have tried to predict.

There was one particular example in Chicago, I think it was, where the idea was quite straightforward, which was like, okay, well, when it comes to gun crime, often today’s victims are tomorrow’s perpetrators.

If you analyze the network of who people are friends with, who they hang out with, that kind of stuff, and feed in where events are happening, can you come up with a risk score, if you like?

This is kind of like the threshold thing that you were talking about earlier with lightning.

Can you come up with a risk score that says, we think that this group of people or this group of people are likely to be involved in something in the near future?
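The network idea Hannah describes can be illustrated with a deliberately crude toy. Everything here is hypothetical: the names, the links, the scoring rule (the share of a person’s direct contacts recently involved in an incident), and the 0.5 threshold. The actual Chicago system is not described in enough detail here to reproduce, and this sketch is not it.

```python
# Hypothetical association network: person -> set of direct contacts.
network: dict[str, set[str]] = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

# Hypothetical set of people recently involved in an incident.
involved = {"B", "D"}


def exposure_score(person: str, network: dict[str, set[str]], involved: set[str]) -> float:
    """Fraction of a person's direct contacts who were involved in incidents."""
    contacts = network.get(person, set())
    if not contacts:
        return 0.0
    return len(contacts & involved) / len(contacts)


def flag(network: dict[str, set[str]], involved: set[str], threshold: float = 0.5) -> list[str]:
    """Everyone whose score meets the threshold, in alphabetical order."""
    return sorted(p for p in network if exposure_score(p, network, involved) >= threshold)


print(flag(network, involved))
```

The output is only a list of names; as Hannah goes on to point out, nothing in the score says what anyone should do with it.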

And when this whole system was set up, it was set up in kind of a nice way.

I think it was set up with good intentions, because the idea was that if your name appeared on this list, then police and social workers would come around to your house, and they would…

So it would be like…

An intervention, right?

But it’s like, you know, here are these programs that you can join, here are these alternatives that you can…

We want to help you out of the life that you’re in, right?

That was kind of like the intention.

Of course, of course it didn’t work out that way, because if you give that list to people who have got a completely different set of priorities, as soon as there was, you know, a gun homicide, it turned out that people took this list and started at the top of the list, and then just started arresting all the way down.

The RAND Corporation did an analysis of the whole project, and by the end, the people who were on the list were, I can’t remember the numbers, but basically way more likely to have been arrested by the police, regardless of whether they were involved in the original crime.

So essentially it turned into a list, a harassment list, right?

And I think this is the thing, I think that in like a lab setting, in kind of a cold environment of like an ivory tower, actually, I think there are certain things that you can say about like likelihood.

You know, there are some people who you can pick out and know would never, in a million years, commit a gun homicide.

And then there are other people who, you know, perhaps aren’t quite in the same boat.

And I think there are some things that you can say about humans.

But the problem is that the world isn’t this ivory tower.

You can’t just create a system that gives you that information, because it doesn’t then tell you what you’re supposed to do with it.

It doesn’t tell you how you’re supposed to interact with people.

And I think that I haven’t yet heard of a really positive story where people have tried to do something like that and it’s gone really well.

Because I just think, yeah, the real world is messy.

The real world is messy.

So, Hannah, I think AI will just figure out what to do with the data.

It definitely won’t.

AI will know to become our overlords and subjugate us because we can’t take care of ourselves.

Definitely won’t.

I mean, also, there was a point a couple of years ago where, genuinely, newspaper articles had that attitude of, like, I’ll just feed it into the AI and it will be able to predict everything.

This book, actually, this one here: Matt Salganik did this amazing project where he had, I can’t remember exactly how many, but thousands of kids, and he had data on them from when they were born, when they were five years old, ten years old, all the way up into their teens, and he had everything on them.

He had what their parents did, he had interviews with them, an unimaginable amount, big, big, big data on these kids.

What he did very cleverly is he released the data to the public, anonymized and so on, in stages, and he held back the last stage when they were, I think, 18.

And he asked people all around the world, he said, here’s everything you know about these kids from 0 to 15.

I want you to predict how many of them ended up in trouble, how many of them went on to further education, all of those different kinds of things.

And everyone around the world, with their very clever AI and their very clever this and that, tried to do it.

Would you like to know what came out on top?

Linear regression.

The most basic, basic, basic.

What’s gone before will happen again.

You fit a line to the trend in the data.

And there you have it.
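“Fitting a line to the trend” is ordinary least squares. A minimal one-variable sketch follows; the ages and scores are invented for illustration, and a real benchmark on a dataset like the one described would likely use several input variables rather than one.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


# Hypothetical data: one measurement taken at several ages.
ages = [1, 3, 5, 9]
scores = [2.1, 3.9, 6.2, 9.8]
slope, intercept = fit_line(ages, scores)
print(f"predicted score at 15: {slope * 15 + intercept:.2f}")
```

On Python 3.10 and later, `statistics.linear_regression(ages, scores)` returns the same slope and intercept.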

Chuck, in case you didn’t know, linear regression is the…

Fitting a line through the data, they have to put more syllables to that.

So linear regression…

You draw a line.

I think it was.

I think I got that right, guys.

By the way, I would probably fact check that slightly because it may have been legit, whatever.

I may have got a tiny couple of those facts wrong, but whatever.

Yeah, we don’t care.

The sense of it is fine.

It’s a great story.

I’m sticking with that.

So we got to bring this to a close, but we have to have Hannah back on.

Oh, my gosh.

We’ve only just scratched the surface, especially with 11 pages.

Hannah, clearly you’ve triggered interest in our fan base, and they’re going to want more of you as we go forward.

But I think…

Correct me if I’m wrong, Hannah, but one of the great lessons of this is that maybe everyone should have paid attention in math class, because math will be the foundational force that defines our sociocultural existence in this world.

Did I overstate that, Hannah, or not?

Yeah, no, I think that’s definitely true.

I always think that it’s really hard to realize how important this stuff is because it’s invisible.

And I kind of think, you know, with drones, like drones came along and then all of a sudden there were loads of drones everywhere and everyone got really upset.

Now you’ve got to have every license possible to fly a drone.

I always think that if you could see algorithms in the same way that you can see drones, people would be a lot more, you know, on it, really.

I think that they’d want to educate themselves a lot more about it.

Or they’d just be really annoyed by algorithms, like they are drones.

So, OK, guys, we got to wrap this up.

Hannah, thank you very much for sharing your wisdom, your insights, and some of it pleasing, some of it scary.

Part of what we need going forward.

Chuck, always good to have you.

Always a pleasure.

All right.

This has been a Cosmic Queries edition on data and algorithms.

We got to call it quits there.

I’m Neil deGrasse Tyson, your personal astrophysicist bidding you as always.

June 22, 2020 • Season 11 Episode 25
