Coding large contingency table

Post  xibalba on Thu Aug 15, 2013 8:28 am

Hello,

I'm trying to code a large contingency table in Statistics101, but I am a total programming noob. I know there is probably a lot missing from this, and I'm wondering if someone can steer me in the right direction (or advise whether I should be doing this at all!).

Here is the code that I have, but it is not giving the correct results:

urn 9#10 3#9 12#8 0#7 0#6 1#5 0#4 0#3 0#2 0#1 0#0 tay
urn 11#10 14#9 8#8 0#7 0#6 1#5 0#4 0#3 0#2 0#1 0#0 que
urn 20#10 29#9 105#8 0#7 0#6 0#5 0#4 0#3 0#2 0#1 0#0 nix
urn 2#10 0#9 3#8 0#7 0#6 0#5 0#4 0#3 0#2 0#1 0#0 sjd
urn 24#10 19#9 19#8 0#7 0#6 0#5 0#4 0#3 0#2 0#1 0#0 zac
urn 3#10 3#9 4#8 0#7 0#6 0#5 1#4 0#3 0#2 0#1 0#0 ixl
urn 21#10 3#9 1#8 0#7 0#6 0#5 0#4 0#3 0#2 0#1 0#0 src
urn 13#10 2#9 0#8 0#7 1#6 0#5 0#4 0#3 0#2 0#1 0#0 lam
urn 6#10 2#9 0#8 1#7 0#6 0#5 0#4 0#3 0#2 0#1 0#0 tip
urn 90#10 10#9 6#8 0#7 1#6 1#5 1#4 0#3 0#2 0#1 0#0 may
urn 1#10 0#9 4#8 0#7 0#6 3#5 0#4 0#3 0#2 0#1 0#0 lmn
urn 14#10 99#9 187#8 0#7 0#6 2#5 0#4 1#3 0#2 0#1 0#0 can
urn 28#10 0#9 0#8 0#7 0#6 0#5 0#4 0#3 0#2 0#1 4#0 tic
REPEAT 10000
sample 25 tay tay$
sample 34 que que$
sample 154 nix nix$
sample 5 sjd sjd$
sample 62 zac zac$
sample 11 ixl ixl$
sample 25 src src$
sample 16 lam lam$
sample 9 tip tip$
sample 109 may may$
sample 8 lmn lmn$
sample 303 can can$
sample 32 tic tic$
mean tay$ t
mean que$ q
mean nix$ n
mean sjd$ s
mean zac$ zc
mean ixl$ i
mean src$ sr
mean lam$ l
mean tip$ tp
mean may$ m
mean lmn$ lm
mean can$ c
mean tic$ tc
subtract t q n s zc i sr l tp m lm c tc d
score d z
end
histogram  z
count z <= 0 d5
divide d5 10000 dddd
print dddd
percentile z (2.5 97.5) k

print k

I know this is probably a coding disaster, so any help would be appreciated.  Thanks so much in advance!

xibalba

Posts : 5
Join date : 2013-08-15

Re: Coding large contingency table

Post  John on Thu Aug 15, 2013 8:59 am

Xibalba,

Please give more detail on what you're trying to do. Without a clear and detailed description of the purpose and expected output, it's not possible to evaluate whether a program is correct.

One obvious question is raised by the SUBTRACT command near the end of the program. What is it supposed to be doing?

John

Posts : 11
Join date : 2011-09-06

Re: Coding large contingency table

Post  xibalba on Thu Aug 15, 2013 9:27 am

Hi John,

Thanks for your message! The code is a modification of the program CABBIES (I think this was included with the sample programs at some point). Here is the original 2x2 chi-square table and the code that I attempted to modify:
   A   B
X 50 11
Y 29  9

urn 50#1 23#0 pit
urn 11#1 9#0 chi
REPEAT 100000
sample 73 pit pit$
sample 20 chi chi$
mean pit$ p
mean chi$ c
subtract p c d
score d z
end
histogram  z
count z <= 0 d5
divide d5 100000 dddd
print dddd
percentile z (2.5 97.5) k

print k
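For readers who don't use Statistics101, here is a rough Python sketch of what this CABBIES program computes. It assumes that SAMPLE draws with replacement from an urn and uses a nearest-rank percentile; both are assumptions, the helper names are hypothetical, and it runs 10,000 rather than 100,000 trials to stay quick. Note that the statistic is a difference of two proportions, not a chi-square value.

```python
import math
import random


def bootstrap_diff_of_means(urn_a, urn_b, trials=10_000, seed=1):
    """Resample each urn with replacement and record mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Assumed to mirror "sample 73 pit pit$": draw len(urn) items with replacement.
        a = rng.choices(urn_a, k=len(urn_a))
        b = rng.choices(urn_b, k=len(urn_b))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    return diffs


def nearest_rank_percentile(values, pct):
    """One common percentile definition; Statistics101 may interpolate differently."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


pit = [1] * 50 + [0] * 23  # urn 50#1 23#0 pit
chi = [1] * 11 + [0] * 9   # urn 11#1 9#0 chi

diffs = bootstrap_diff_of_means(pit, chi)
# Equivalent of "count z <= 0 d5" followed by the DIVIDE step.
share_at_or_below_zero = sum(d <= 0 for d in diffs) / len(diffs)
# Equivalent of "percentile z (2.5 97.5) k".
interval = (nearest_rank_percentile(diffs, 2.5), nearest_rank_percentile(diffs, 97.5))
print(share_at_or_below_zero, interval)
```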
 
I'm trying to resample a larger contingency table (13 rows by 11 columns) in the same manner to do an omnibus (global) chi-square test of association between the two variables. Most cell counts are low or zero, so I'm not sure it will work anyway (the categories can't really be combined). This is why I thought bootstrapping might work better. Here are the data:
     
     a   b   c  d  e  f  g  h  i  j  k
tay  9   3  12  0  0  1  0  0  0  0  0
que 11  14   8  0  0  1  0  0  0  0  0
nix 20  29 105  0  0  0  0  0  0  0  0
sjd  2   0   3  0  0  0  0  0  0  0  0
zac 24  19  19  0  0  0  0  0  0  0  0
ixl  3   3   4  0  0  1  0  0  0  0  0
src 21   3   1  0  0  0  0  0  0  0  0
lam 13   2   0  0  1  0  0  0  0  0  0
tip  6   2   0  1  0  0  0  0  0  0  0
may 90  10   6  0  1  1  1  0  0  0  0
lmn  1   0   4  0  0  3  0  0  0  0  0
can 14  99 187  0  0  2  0  1  0  0  0
tic 28   0   0  0  0  0  0  0  0  0  4
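Whether the usual chi-square approximation is trustworthy for a table like this can be judged from the expected counts, E = (row total x column total) / grand total. A common rule of thumb treats the approximation as doubtful when many expected counts fall below 5. The Python sketch below (function name hypothetical) tabulates that for the table above.

```python
def expected_counts(table):
    """Expected count for each cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


table = [
    [9, 3, 12, 0, 0, 1, 0, 0, 0, 0, 0],     # tay
    [11, 14, 8, 0, 0, 1, 0, 0, 0, 0, 0],    # que
    [20, 29, 105, 0, 0, 0, 0, 0, 0, 0, 0],  # nix
    [2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0],      # sjd
    [24, 19, 19, 0, 0, 0, 0, 0, 0, 0, 0],   # zac
    [3, 3, 4, 0, 0, 1, 0, 0, 0, 0, 0],      # ixl
    [21, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0],     # src
    [13, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0],     # lam
    [6, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0],      # tip
    [90, 10, 6, 0, 1, 1, 1, 0, 0, 0, 0],    # may
    [1, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0],      # lmn
    [14, 99, 187, 0, 0, 2, 0, 1, 0, 0, 0],  # can
    [28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],     # tic
]

expected = [e for row in expected_counts(table) for e in row]
below_five = sum(e < 5 for e in expected)
exactly_zero = sum(e == 0 for e in expected)
# prints: 118 of 143 expected counts are below 5; 26 are exactly 0
print(f"{below_five} of {len(expected)} expected counts are below 5; {exactly_zero} are exactly 0")
```

The 26 exact zeros are the two columns (i and j) whose totals are zero.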

Thanks again for any help, I really appreciate it!!

xibalba

Re: Coding large contingency table

Post  John on Thu Aug 15, 2013 11:15 am

That CABBIE program doesn't really do a Chi-square evaluation. It uses a simpler statistic (difference of means) as an alternative to Chi-square. I don't think it can be applied to your problem.

If you want to try a Chi-square evaluation, open the Subroutine Browser (Window>Show Subroutine Browser) and look at the description of the CHISQUARE_TABLE subroutine. It works for any size table (although I've never tried it with one as big as yours).
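For reference, the Pearson chi-square statistic for an r x c table is the sum over all cells of (observed - expected)^2 / expected, with expected counts computed from the row and column totals. The Python sketch below assumes that this is the statistic CHISQUARE_TABLE reports (the function name is hypothetical). It checks the result on the 2x2 CABBIES table against the closed-form 2x2 shortcut.

```python
def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        # An empty row or column gives expected counts of zero, so the statistic is undefined.
        raise ValueError("every row and column total must be nonzero")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


cabbies = [[50, 11],  # row X: columns A, B
           [29, 9]]   # row Y
print(pearson_chi_square(cabbies))  # about 0.464

# For a 2x2 table [[a, b], [c, d]], the same value is n(ad - bc)^2 / (product of the four margins).
a, b, c, d = 50, 11, 29, 9
n = a + b + c + d
shortcut = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

The same function accepts a table of any size. The degrees of freedom for an r x c table are (r - 1)(c - 1).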

John

Re: Coding large contingency table

Post  xibalba on Thu Aug 15, 2013 1:39 pm

Thank you so much John!  I'll try to give it a whirl and see if it works!

xibalba

Re: Coding large contingency table

Post  xibalba on Tue Aug 20, 2013 1:55 pm

Good afternoon! I attempted to code the data using the CHISQUARE_TABLE and CHISQUARE_TRIALS subroutines. It appears that I am getting close, but I am still not getting the correct chi-square output at the end. Here is my modified code:
--------------------------------------
DATA (9 3 12 0 0 1 0 0 0 0 0) firstRow
DATA (11 14 8 0 0 1 0 0 0 0 0) secondRow
DATA (20 29 105 0 0 0 0 0 0 0 0) thirdRow
DATA (2 0 3 0 0 0 0 0 0 0 0) fourthRow
DATA (24 19 19 0 0 0 0 0 0 0 0) fifthRow
DATA (3 3 4 0 0 1 0 0 0 0 0) sixthRow
DATA (21 3 1 0 0 0 0 0 0 0 0) seventhRow
DATA (13 2 0 0 1 0 0 0 0 0 0) eighthRow
DATA (6 2 0 1 0 0 0 0 0 0 0 ) ninthRow
DATA (90 10 6 0 1 1 1 0 0 0 0) tenthRow
DATA (1 0 4 0 0 3 0 0 0 0 0 ) eleventhRow
DATA (14 99 187 0 0 2 0 1 0 0 0) twelfthRow
DATA (28 0 0 0 0 0 0 0 0 0 4) thirteenthRow
COPY 1000 numberOfTrials

'Compute Observed Chi-Square value:

CHISQUARE_TABLE observedValues colTotals successProbs predictedFrequencies observedChiSquare firstRow secondRow thirdRow fourthRow fifthRow sixthRow seventhRow eighthRow ninthRow tenthRow eleventhRow twelfthRow thirteenthRow

'Print the contingency table and the observed Chi-square value:

PRINT
PRINT "Contingency Table"
PRINT "-----------------"
PRINT_CHISQUARE_TABLE observedValues predictedFrequencies 13
PRINT
PRINT observedChiSquare

'Now use many trials to generate the appropriate Chi-square distribution:

CHISQUARE_TRIALS numberOfTrials colTotals successProbs predictedFrequencies results
HISTOGRAM percent binsize 0.5 results

'Now determine, from the distribution (in the results vector),
'the probability of equaling or exceeding that observed Chi-Square statistic:

COUNT results >= observedChiSquare successCount
probability = successCount / numberOfTrials
PRINT probability
-----------------------------------------
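The thread doesn't spell out exactly how CHISQUARE_TRIALS generates its trial tables. The Python sketch below therefore swaps in one common scheme: shuffle the column labels of the individual observations. This keeps every row and column total fixed while breaking any association, which is the same idea as R's chisq.test with simulate.p.value = TRUE. It uses the 13 x 9 table from later in the thread, with the two all-zero columns already removed. The function names are hypothetical.

```python
import random


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected, expected = row total * col total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(row_totals)
               for j, c in enumerate(col_totals))


def permutation_chi_squares(table, trials=1000, seed=1):
    """Chi-square values for tables built by shuffling column labels (margins stay fixed)."""
    rng = random.Random(seed)
    # Expand the table into one (row, column) label pair per observation.
    pairs = [(i, j) for i, row in enumerate(table)
             for j, count in enumerate(row) for _ in range(count)]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]
    results = []
    for _ in range(trials):
        rng.shuffle(col_labels)  # random re-pairing of rows and columns
        simulated = [[0] * len(table[0]) for _ in table]
        for i, j in zip(row_labels, col_labels):
            simulated[i][j] += 1
        results.append(pearson_chi_square(simulated))
    return results


table = [
    [9, 3, 12, 0, 0, 1, 0, 0, 0],
    [11, 14, 8, 0, 0, 1, 0, 0, 0],
    [20, 29, 105, 0, 0, 0, 0, 0, 0],
    [2, 0, 3, 0, 0, 0, 0, 0, 0],
    [24, 19, 19, 0, 0, 0, 0, 0, 0],
    [3, 3, 4, 0, 0, 1, 0, 0, 0],
    [21, 3, 1, 0, 0, 0, 0, 0, 0],
    [13, 2, 0, 0, 1, 0, 0, 0, 0],
    [6, 2, 0, 1, 0, 0, 0, 0, 0],
    [90, 10, 6, 0, 1, 1, 1, 0, 0],
    [1, 0, 4, 0, 0, 3, 0, 0, 0],
    [14, 99, 187, 0, 0, 2, 0, 1, 0],
    [28, 0, 0, 0, 0, 0, 0, 0, 4],
]

observed = pearson_chi_square(table)
results = permutation_chi_squares(table)
# Share of trials whose statistic equals or exceeds the observed one.
probability = sum(r >= observed for r in results) / len(results)
print(observed, probability)
```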


And here is the output:

Contingency Table
-----------------
row  1 :  (9     3     12   0     0     1     0     0     0     0     0    ) 25   
predicted: (7.63 5.8   11   0.03 0.06 0.28 0.03 0.03 0     0     0.13 )

row  2 :  (11   14   8     0     0     1     0     0     0     0     0    ) 34   
predicted: (10.38 7.89 14.96 0.04 0.09 0.39 0.04 0.04 0     0     0.17 )

row  3 :  (20   29   105   0     0     0     0     0     0     0     0    ) 154  
predicted: (47   35.73 67.78 0.19 0.39 1.75 0.19 0.19 0     0     0.78 )

row  4 :  (2     0     3     0     0     0     0     0     0     0     0    ) 5    
predicted: (1.53 1.16 2.2   0.01 0.01 0.06 0.01 0.01 0     0     0.03 )

row  5 :  (24   19   19   0     0     0     0     0     0     0     0    ) 62   
predicted: (18.92 14.39 27.29 0.08 0.16 0.7   0.08 0.08 0     0     0.31 )

row  6 :  (3     3     4     0     0     1     0     0     0     0     0    ) 11   
predicted: (3.36 2.55 4.84 0.01 0.03 0.12 0.01 0.01 0     0     0.06 )

row  7 :  (21   3     1     0     0     0     0     0     0     0     0    ) 25   
predicted: (7.63 5.8   11   0.03 0.06 0.28 0.03 0.03 0     0     0.13 )

row  8 :  (13   2     0     0     1     0     0     0     0     0     0    ) 16   
predicted: (4.88 3.71 7.04 0.02 0.04 0.18 0.02 0.02 0     0     0.08 )

row  9 :  (6     2     0     1     0     0     0     0     0     0     0    ) 9    
predicted: (2.75 2.09 3.96 0.01 0.02 0.1   0.01 0.01 0     0     0.05 )

row  10 :  (90   10   6     0     1     1     1     0     0     0     0    ) 109  
predicted: (33.26 25.29 47.97 0.14 0.27 1.24 0.14 0.14 0     0     0.55 )

row  11 :  (1     0     4     0     0     3     0     0     0     0     0    ) 8    
predicted: (2.44 1.86 3.52 0.01 0.02 0.09 0.01 0.01 0     0     0.04 )

row  12 :  (14   99   187   0     0     2     0     1     0     0     0    ) 303  
predicted: (92.47 70.31 133.35 0.38 0.76 3.44 0.38 0.38 0     0     1.53 )

row  13 :  (28   0     0     0     0     0     0     0     0     0     4    ) 32   
predicted: (9.77 7.42 14.08 0.04 0.08 0.36 0.04 0.04 0     0     0.16 )

Col. Tot.: (242   184   349   1     2     9     1     1     0     0     4    ) 793  

observedChiSquare: NaN
probability: 0.0

Thus it appears to be getting very close, but I think I still have a mistake somewhere: observedChiSquare comes out as NaN instead of a chi-square value, and because of that I'm not getting a correct histogram. I'm not sure if this is because the table is so large or if I'm just overlooking something. Any thoughts would be greatly appreciated. Thanks so much!

xibalba

Re: Coding large contingency table

Post  John on Tue Aug 20, 2013 6:00 pm

The NaN ("Not a Number") comes from one or more divisions by zero (0/0, since the observed counts in those cells are also zero) that take place during the CHISQUARE command, which is invoked in the CHISQUARE_TABLE subroutine. This happens because the predictedFrequencies vector contains some zero values, and these are used as divisors when computing the chi-square value. So predicted values of zero are not allowed.

How does this happen? Well, the 9th and 10th data items in each row are zero, so the totals for those two columns are zero. That means those categories have an expected frequency of zero and should be removed.
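John's fix can be sketched in Python (function names hypothetical). In IEEE floating-point arithmetic, 0.0/0.0 evaluates to NaN, which matches the output above. Plain Python raises ZeroDivisionError instead, so this sketch removes the all-zero columns before computing the statistic. Under the assumption that CHISQUARE computes the standard Pearson statistic, it should reproduce the observedChiSquare value reported in the final result.

```python
def drop_all_zero_columns(table):
    """Keep only the columns that contain at least one nonzero count."""
    keep = [j for j, column in enumerate(zip(*table)) if any(column)]
    return [[row[j] for j in keep] for row in table]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected, expected = row total * col total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(row_totals)
               for j, c in enumerate(col_totals))


table = [
    [9, 3, 12, 0, 0, 1, 0, 0, 0, 0, 0],
    [11, 14, 8, 0, 0, 1, 0, 0, 0, 0, 0],
    [20, 29, 105, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [24, 19, 19, 0, 0, 0, 0, 0, 0, 0, 0],
    [3, 3, 4, 0, 0, 1, 0, 0, 0, 0, 0],
    [21, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [13, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [6, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [90, 10, 6, 0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0],
    [14, 99, 187, 0, 0, 2, 0, 1, 0, 0, 0],
    [28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
]

reduced = drop_all_zero_columns(table)
print(len(table[0]), "->", len(reduced[0]))  # 11 -> 9
print(pearson_chi_square(reduced))           # about 740.887
```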

Here's the table after removing the 9th and 10th item from each row:

DATA (9 3 12 0 0 1 0 0 0) firstRow
DATA (11 14 8 0 0 1 0 0 0) secondRow
DATA (20 29 105 0 0 0 0 0 0) thirdRow
DATA (2 0 3 0 0 0 0 0 0) fourthRow
DATA (24 19 19 0 0 0 0 0 0) fifthRow
DATA (3 3 4 0 0 1 0 0 0) sixthRow
DATA (21 3 1 0 0 0 0 0 0) seventhRow
DATA (13 2 0 0 1 0 0 0 0) eighthRow
DATA (6 2 0 1 0 0 0 0 0 ) ninthRow
DATA (90 10 6 0 1 1 1 0 0) tenthRow
DATA (1 0 4 0 0 3 0 0 0) eleventhRow
DATA (14 99 187 0 0 2 0 1 0) twelfthRow
DATA (28 0 0 0 0 0 0 0 4) thirteenthRow

Here's the final result:

observedChiSquare: 740.8868194124383
probability: 0.0
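A probability of 0.0 from 1000 trials means that none of the simulated chi-square values reached 740.89. It does not mean the p-value is exactly zero. Two common ways to report such a result are sketched below in Python (function names hypothetical):

- The add-one Monte Carlo estimate, (k + 1)/(n + 1).
- An approximate 95% upper bound on p when k = 0, found by solving (1 - p)^n = 0.05. The "rule of three" approximates this as roughly 3/n.

```python
def add_one_p_value(exceed_count, trials):
    """Monte Carlo p-value estimate that counts the observed table as one of the trials."""
    return (exceed_count + 1) / (trials + 1)


def upper_bound_when_none_exceed(trials, alpha=0.05):
    """Largest p for which seeing 0 exceedances in `trials` still has probability >= alpha."""
    return 1 - alpha ** (1 / trials)


trials = 1000  # COPY 1000 numberOfTrials
print(add_one_p_value(0, trials))            # 1/1001, about 0.000999
print(upper_bound_when_none_exceed(trials))  # about 0.0030, close to 3/1000
```

For scale, the 13 x 9 table has (13 - 1)(9 - 1) = 96 degrees of freedom, so 740.89 is far out in the tail under the standard approximation too. The many small expected counts make that approximation rough, however.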

John

Re: Coding large contingency table

Post  xibalba on Tue Aug 20, 2013 7:06 pm

Oh my!  Thank you so much John!  That explains quite a bit and I really appreciate your help.  I completely "blanked" that I had all those zeros in there.  Best wishes and thanks again for the excellent feedback!

xibalba

Re: Coding large contingency table

Post  John on Tue Aug 20, 2013 7:29 pm

You are most welcome. Glad I could help.

John
