So what we have is a hypothetical nine-bit word for floating-point numbers. The first bit is for the sign of the number, the second bit is for the sign of the exponent, the next three bits are for the magnitude of the exponent, and the last four bits are for the magnitude of the mantissa. What we want to do is take the number (11.8)₁₀ and write it in a floating-point format that follows this convention.

To do that, the first thing we have to see is how we can write 11 in base 2 and how we can write 0.8 in base 2. Eleven in base 10 is (1011)₂. You can do this as homework, because the previous video already covers it. And (0.8)₁₀ is (0.11001…)₂, where the digits keep on going. You can also do this as homework, as it was covered in the previous video. So if we want to see (11.8)₁₀ written as a base-2 number, it is 1011, the equivalent of 11, then the radix point, and then the equivalent of 0.8, which is the 11001… that we just showed:

(11.8)₁₀ = (1011.11001…)₂

Once we have that, we take the radix point and move it so that there is only one non-zero digit before it. This gives (1.01111001…)₂ × 2³. The reason it is now 2³ is that the radix point was moved three places to the left.

We are going to do the rest in two stages. First, we need to see how many bits of the mantissa we can keep. Since there are only four bits for the mantissa, we can use only the first four bits after the radix point, 0111, and we have to drop the remaining bits, because they cannot be represented with only four mantissa bits. That leaves (1.0111)₂ × 2³.

Second, the exponent 3 needs to be written in base 2, which is (11)₂. But we make one small change: we write the factor as 2 to the power (011)₂ rather than (11)₂, because three bits are available for the magnitude of the exponent. So, to repeat, (11.8)₁₀ is now written as

(1.0111)₂ × 2^(011)₂

Now we have to assign this to the nine bits of the hypothetical floating-point word. As before, the first bit is for the sign of the number, the second is for the sign of the exponent, the next three are for the magnitude of the exponent, and the last four are for the magnitude of the mantissa. We fill in these places with zeros and ones. The number is positive, so we put 0 in the first bit. The exponent is positive, so we put 0 in the second bit. The magnitude of the exponent is 011, and the four bits for the magnitude of the mantissa are 0111. That gives the nine-bit word

0 0 011 0111

We don't store the 1 before the radix point, because it is already assumed. After normalization there must be a non-zero digit before the radix point, and in binary the only non-zero digit is 1. So that 1 is always there, and we don't need to represent it. This is the representation of (11.8)₁₀ in this hypothetical nine-bit floating-point format.

Now, if somebody gives you this representation and asks how you would write the number in base 2, you first write a plus sign, because the sign bit says the number is positive. Then you write 1, then the radix point, then the four digits of the mantissa, 0111, in base 2. Then you write "times 2 to the power", followed by the sign of the exponent, which is positive, and the three bits of the magnitude of the exponent, (011)₂:

+(1.0111)₂ × 2^+(011)₂

Go and see what this is equivalent to in base 10, and you will see that it is not equal to 11.8. The difference between the two numbers is the round-off error caused by using this hypothetical nine-bit word for our floating-point representation. And that is the end of this segment.
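To check the homework above, here is a minimal Python sketch of the same steps. It assumes exactly the field layout used in this segment (sign of the number, sign of the exponent, 3-bit exponent magnitude, 4-bit mantissa with the leading 1 implied), and it chops the extra mantissa bits as done above rather than rounding them. It uses the standard library's `fractions.Fraction` so that 11.8 is held exactly. The helper names `frac_bits`, `to_binary`, `encode`, and `decode` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Field widths of the hypothetical 9-bit word (assumed from the lecture):
# [sign of number][sign of exponent][3-bit exponent magnitude][4-bit mantissa]
EXP_BITS = 3
MANT_BITS = 4


def frac_bits(frac: Fraction, n: int) -> str:
    """First n binary digits of a fraction 0 <= frac < 1, by repeated doubling."""
    bits = ""
    for _ in range(n):
        frac *= 2
        bit = 1 if frac >= 1 else 0
        bits += str(bit)
        frac -= bit
    return bits


def to_binary(x: Fraction, n: int) -> str:
    """Binary expansion of x >= 0, chopped after n digits past the radix point."""
    whole = x.numerator // x.denominator
    return f"{whole:b}." + frac_bits(x - whole, n)


def encode(x: Fraction) -> str:
    """Encode x as a 9-bit string, chopping (not rounding) the mantissa."""
    if x == 0:
        raise ValueError("zero has no normalized form in this format")
    sign = "1" if x < 0 else "0"
    m, e = abs(x), 0
    # Normalize to 1 <= m < 2: one non-zero digit before the radix point.
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    if abs(e) >= 2 ** EXP_BITS:
        raise ValueError(f"exponent {e} does not fit in {EXP_BITS} bits")
    exp_sign = "1" if e < 0 else "0"
    exp_mag = format(abs(e), f"0{EXP_BITS}b")  # e.g. 3 -> "011"
    # The leading 1 is implied, so only the digits after the radix point are stored.
    return sign + exp_sign + exp_mag + frac_bits(m - 1, MANT_BITS)


def decode(word: str) -> Fraction:
    """Turn a 9-bit string back into its exact value."""
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    e = exp_sign * int(word[2:2 + EXP_BITS], 2)
    # Restore the implied leading 1: m = (1.b1b2b3b4) in base 2.
    m = 1 + Fraction(int(word[2 + EXP_BITS:], 2), 2 ** MANT_BITS)
    return sign * m * Fraction(2) ** e


if __name__ == "__main__":
    x = Fraction("11.8")
    print(to_binary(x, 5))          # 1011.11001
    word = encode(x)
    print(word)                     # 000110111
    print(float(decode(word)))      # 11.5
    print(float(x - decode(word)))  # 0.3
```

For 11.8 the sketch reproduces the word 0 0 011 0111. Decoding it gives (1.0111)₂ × 2³ = (1011.1)₂ = 11.5, so the round-off error caused by chopping the mantissa to four bits is 11.8 − 11.5 = 0.3.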
