So what we have is nine bits for the hypothetical
floating point word. The first bit is for the sign of the number, the second
bit is for the sign of the exponent, next three bits are for the magnitude of the exponent
and the last four bits are for the magnitude of the mantissa. So what we want to do is
take this number, eleven point eight base ten, and write it in this floating point
format, which follows that convention. In order to do that, the first thing
we have to see is how we can write eleven in base two
and how we can write zero point eight in base two. So eleven in base 10 converted to base 2 is 1011 base
2. You can do this as homework because the previous
video covers that already. Zero point eight base ten will be zero radix
point one one zero zero one, and it keeps on going, in base two; you can also do this as homework
as it was covered in the previous video. So if we want to see eleven point eight base
ten written as a base two number, then it is one zero one one, that is the equivalent of eleven,
then the radix point, and then we will have the equivalent of point eight in base
ten, which is one one zero zero one and so on in base two, as we just showed. So once we have that, what we
have to do is take this radix point and move it to just after the first digit, because we only want one non zero
digit before the radix point, so this is one radix point zero one one one one zero zero
one, base two, times two to the power three. The reason why it is two to the power three
is because the radix point was moved to the left by three places. So what we are going to do is do this in two
stages. We first want to see how many bits we
can take for the mantissa, since there are only four bits for the mantissa. We can only use the first four bits after the radix point, zero
one one one base two, and we are going to forget about the rest because they cannot
be represented, since we have only four bits for the mantissa. That gives one point zero one one one base two times two to the power
three. Now in two to the power three, the
three needs to be written in base two, so three will be one one base two. So this three which we have in base ten can
be given as one one base two. But again we want to make a small change: we
want to say that this is multiplied by two to the power zero one one base two. The reason why we are writing zero one one rather
than one one base two is because we have three bits available for the magnitude of the exponent. So let's repeat this. Eleven point eight base ten is now
written as one point zero one one one base two times two to the power zero one one base
two. And now what we have to do is assign
it to the nine bits of this hypothetical floating point word, and we want to see how we are going
to go about doing that. So here are the nine bits. This is for the sign of the number, this is
for the sign of the exponent, these three are for the magnitude of the exponent and
these last four are for the magnitude of the mantissa. So what we are going to do is we are going
to start now filling in these places for the bits with zeros and ones. The sign of the number is positive so we will
put zero here. The sign of the exponent is positive so we
want to put zero there. The magnitude of the exponent is zero one
one and then we have these four bits to be put in the magnitude of the mantissa zero
one one one. We don't store the leading one before the radix point because it
is already assumed: the normalized form needs a non-zero digit before the radix point,
and the only non-zero digit in binary
is one, so we don't need to represent it. It's there, but we don't need to represent
it because it will always be one. So this is the representation of the number
eleven point eight base ten in this hypothetical nine bit floating point representation. Now if somebody says that hey this is the
representation, how would you write this number in base two, you say okay, what I want to do
is first write plus, because the sign of the number is positive,
then I am going to write one, then the radix point, then I am going to write
the four digits of the mantissa, zero one one one base two, times two to the power, and then I
will write the three bits of the magnitude of the exponent, zero one one base two, with
the sign of the exponent, which is positive, so plus. So go and see what this is equivalent to in
base ten, and you will see that it is not equal to eleven point eight in base ten.
The difference between the stored number and eleven point eight tells you the round off
error caused by using this hypothetical nine bit word for our floating point representation,
and that is the end of this segment.
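
As a rough sketch of the steps in this segment (normalize so there is one non-zero digit before the radix point, keep only four mantissa bits, then fill in the nine fields), here is one way it could be written in Python. The function name `encode_9bit` and its parameters are made up for illustration and are not from the video. The sketch assumes the extra mantissa bits are chopped rather than rounded, as was done in the segment.

```python
import math

def encode_9bit(x, exp_bits=3, man_bits=4):
    """Pack x as: sign | exponent sign | exponent magnitude | mantissa bits.

    Hypothetical helper for the nine bit word described in this segment.
    Mantissa bits beyond man_bits are chopped (not rounded).
    """
    if x == 0:
        raise ValueError("zero has no normalized form in this format")
    sign_bit = "0" if x > 0 else "1"
    x = abs(x)

    # math.frexp gives x = m * 2**e with 0.5 <= m < 1;
    # shift by one place so that 1 <= m < 2, i.e. the form 1.xxxx * 2**e.
    m, e = math.frexp(x)
    m, e = m * 2, e - 1

    if abs(e) > 2 ** exp_bits - 1:
        raise OverflowError("exponent does not fit in the exponent bits")

    exp_sign_bit = "0" if e >= 0 else "1"
    exp_mag = format(abs(e), "0{}b".format(exp_bits))

    # Drop the assumed leading 1 and keep the first man_bits fraction bits.
    frac = int((m - 1) * 2 ** man_bits)
    mantissa = format(frac, "0{}b".format(man_bits))

    return sign_bit + exp_sign_bit + exp_mag + mantissa

word = encode_9bit(11.8)
print(word)  # 000110111
```

The printed word reads as 0 for the sign of the number, 0 for the sign of the exponent, 011 for the magnitude of the exponent and 0111 for the mantissa, which matches the bits filled in above.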

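Going the other way, a minimal sketch of reading the nine bit word back into base ten and measuring the round off error could look like this. The name `decode_9bit` is hypothetical, and the bit string is the one worked out in this segment: 0 for the sign of the number, 0 for the sign of the exponent, 011 for the exponent and 0111 for the mantissa.

```python
def decode_9bit(word):
    """Convert a 9-character bit string in the hypothetical format to a number.

    Layout assumed from the segment: 1 sign bit, 1 exponent sign bit,
    3 exponent magnitude bits, 4 mantissa bits with an assumed leading 1.
    """
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    exponent = exp_sign * int(word[2:5], 2)
    # The four stored bits are the digits after the radix point: 1.xxxx
    mantissa = 1 + int(word[5:9], 2) / 2 ** 4
    return sign * mantissa * 2.0 ** exponent

stored = decode_9bit("000110111")
print(stored)         # 11.5
print(11.8 - stored)  # round off error, roughly 0.3
```

So one point zero one one one base two times two to the power three is eleven point five in base ten, and the difference from eleven point eight is about zero point three.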

6 Comments

Burton Korten · June 17, 2017 at 12:32 am

good video. i like it better tho when your in front of the white board tho. the audio is a little shoty also. however your videos saved me when i took numerical analysis in college so for that i thank you

keep up the good work

Aymen · June 19, 2017 at 7:18 pm

please explain full pivoting.

Talha Asif · August 22, 2017 at 9:54 am

LOL

Gardening and Cooking · January 8, 2019 at 7:21 am

Very poor sound volume unfortunately. What could have been the fantastic video is unfortunately frustrating to try and listen and make sense. Could you please make another video

ruanan manana · March 6, 2019 at 1:50 pm

Thank you soooooooo much!!!

Peter Kuzmin · October 8, 2019 at 5:56 pm

ok but in practice what is the usual number of bits for a floating point number?
