


#271




A few questions on the ACTEX Sample Exam, Task 8
1) Why was VehicleType not binarized along with the other factor variables?
2) When binarizing, when should I use fullRank = TRUE, and when should I use fullRank = FALSE and manually NULL out the base level?
3) In the interpretation explanation, why are we comparing each level of VAgeCat (4, 5, 6) to the levels of "Age Groups 0–3"? Shouldn't the comparison just be to the base Age Group 0? Thanks in advance.
#272




I practiced the June 2019 PA exam with shorter responses again, and I have not seen many comments in your exam solution analysis about how to interpret the stepAIC() output within each stepwise-selected model, regardless of whether the AIC or BIC criterion is used. I want to confirm whether my understanding, summarized from the model output, is correct:

For forward selection, we start with the null model and add one predictor at a time. If the resulting AIC/BIC shown next to "+ predictor_name" increases or is unchanged compared to the baseline row labelled "<none>", then we should not keep the predictor, since adding it does not improve the model. "Traffic Control" is an example in that exam with no change in AIC/BIC after being added to the model. If the resulting AIC/BIC decreases, then we should consider keeping the predictor, since adding it improves the model.

For backward selection, we start with the full model and drop one predictor at a time. If the resulting AIC/BIC shown next to "- predictor_name" increases or is unchanged compared to the "<none>" row, then we should not drop the predictor, since dropping it does not improve the model. If the resulting AIC/BIC decreases, then we should consider dropping the predictor, since the model improves without it. Thanks.

Last edited by windows7forever; Yesterday at 02:22 AM.
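For concreteness, a minimal sketch (on made-up data; stepAIC() is from the MASS package) of the output rows being described:

```r
# Sketch with simulated data: how stepAIC() output rows read.
library(MASS)

set.seed(42)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 2 * df$x1 + rnorm(100)  # x1 is predictive, x2 is noise

# Forward selection: start from the null model, consider additions.
null_model <- lm(y ~ 1, data = df)
fwd <- stepAIC(null_model,
               scope = list(lower = ~ 1, upper = ~ x1 + x2),
               direction = "forward", trace = TRUE)
# In the printed tables, rows labelled "+ x1" and "+ x2" show the AIC the
# model WOULD have after adding that predictor; the "<none>" row is the
# current model. A predictor is added only when its "+" row beats <none>.

# Backward selection: start from the full model, consider deletions.
full_model <- lm(y ~ x1 + x2, data = df)
bwd <- stepAIC(full_model, direction = "backward", trace = TRUE)
# Rows labelled "- x1" and "- x2" show the AIC after dropping that
# predictor; a drop happens only when its "-" row beats <none>.
```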
#273




Quote:
The model you described is really just OLS and uses an identity link, not the log link as you implied. To fit a true GLM, I would use an untransformed target variable modelled with a gamma distribution and a log link. You would then subtract offset_var (as it appears in the dataset, not log(offset_var)) from the model equation and set offset = log(offset_var) (quoting the manual: "to make it on the same scale as the linear predictor"). See pages 226–227 of the manual for a better description and example. To answer your question, I don't believe we should ever subtract log(offset_var) in the model equation. What I'm not sure about is whether we should include offset = log(offset_var) in the case you described: a log-transformed target variable with a normal distribution and identity link. That is, should the scale of the offset variable always match the link function, regardless of the transformation of the target variable? If that's true, then my answer would be that scenario (1) is correct.
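As a minimal sketch of the gamma-with-log-link setup described above (the variable names target, x, and offset_var are hypothetical, not from the exam dataset):

```r
# Sketch: untransformed target, gamma error, log link, with the offset
# supplied on the scale of the linear predictor via log().
set.seed(1)
df <- data.frame(
  x          = rnorm(100),
  offset_var = runif(100, 1, 10)  # exposure-type variable
)
df$target <- rgamma(100, shape = 2, rate = 0.5) * df$offset_var

fit <- glm(target ~ x,
           family = Gamma(link = "log"),
           offset = log(offset_var),  # matches the log link's scale
           data   = df)
summary(fit)
```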
#274




R code error when computing the mean of a predictor by level
I tried to get the mean of the predictor by level, but I always get this error message: "Error: n() should only be called in a data context".
Can anyone help me figure out what the code issue is? Or is there an alternative way to check the mean of the predictor by level?

12/2019 Exam:

library(dplyr)
df %>%
  group_by(marital_status) %>%
  summarise(
    zeros = sum(value_flag == 0),
    ones = sum(value_flag == 1),
    n = n(),
    proportion = mean(value_flag)
  )

6/2019 Exam:

library(dplyr)
for (i in vars) {
  print(i)
  x <- dat %>%
    group_by_(i) %>%
    summarise(
      mean = mean(log(Crash_Score)),
      median = median(log(Crash_Score)),
      n = n()
    )
  print(x)
}
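One common cause of this particular error message is a masking conflict: if another package (often plyr) is attached after dplyr, its summarise() shadows dplyr's version, and n() then runs outside a data context. This may or may not be the issue here, but qualifying the calls sidesteps it. A sketch on made-up data:

```r
# Sketch: avoid summarise()/n() masking by qualifying with dplyr::.
library(dplyr)

set.seed(2)
df <- data.frame(
  marital_status = sample(c("single", "married"), 50, replace = TRUE),
  value_flag     = rbinom(50, 1, 0.4)
)

df %>%
  group_by(marital_status) %>%
  dplyr::summarise(               # explicit namespace defeats masking
    zeros      = sum(value_flag == 0),
    ones       = sum(value_flag == 1),
    n          = dplyr::n(),
    proportion = mean(value_flag)
  )
```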
#275




Quote:
Last edited by ActuaryStudent22; Yesterday at 06:43 PM.. 
#276




@ambroselo, your method for the one-standard-error rule for reducing tree complexity differs from the modules', and I am wondering which is correct. On slide 48 of Module 7, the one-standard-error rule is applied by adding the xerror and xstd columns and selecting the CP corresponding to the smallest value of this sum. However, this differs from the approach you applied in Practice Exam 1. If you were to use the module's approach, computing dt$cptable[, "xerror"] + dt$cptable[, "xstd"], you would get that the number of splits associated with the optimal complexity parameter is 7 (compared to your solution of 1 split). So, what is the difference, and which method should I use?
Last edited by street5990; Today at 10:05 AM.. 
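For reference, the conventional statement of the one-standard-error rule (as described in the rpart documentation) is slightly different from both phrasings above: it selects the smallest tree whose xerror falls within one xstd of the minimum xerror, rather than minimizing xerror + xstd directly. A sketch on made-up data:

```r
# Sketch: minimum-xerror pruning vs. the one-standard-error rule
# applied to an rpart cptable.
library(rpart)

set.seed(1)
df <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
df$y <- df$x1 + 0.5 * df$x2 + rnorm(200)

dt  <- rpart(y ~ x1 + x2, data = df,
             control = rpart.control(cp = 0.001, xval = 10))
tab <- dt$cptable

# Rule A: CP with the minimum cross-validation error.
i_min  <- which.min(tab[, "xerror"])
cp_min <- tab[i_min, "CP"]

# Rule B (one-standard-error rule): the simplest tree whose xerror is
# within one xstd of the minimum xerror.
threshold <- tab[i_min, "xerror"] + tab[i_min, "xstd"]
cp_1se    <- tab[which(tab[, "xerror"] <= threshold)[1], "CP"]

pruned <- prune(dt, cp = cp_1se)
```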
#278




Dear Dr. Ambrose,
I was trying the Hospital Readmissions sample exam and kept running into an error saying my model is rank-deficient. Of course, in later steps after stepAIC(), everything was fine. But I am wondering:
1. Why is my model rank-deficient? Is it because of the interaction variable? (P.S. I used ER and Log HCC Riskscore instead of the Gender and Race given in the solution.)
2. In general, if I face such a situation in the exam, how should I deal with it?
Appreciate your kind help! I attached a screenshot to show you what model I was running. Thanks!
#279




On your sample exam you state the following:
"For values of the cp parameter greater than the value supplied in the function call, the algorithm divides the training data into 10 folds, training the tree on all but one fold, and computing the R-squared on the held-out fold as the cross-validation error."
I am confused because I thought RSS was used as the impurity measure for regression trees. On page 308 of your manual you state "the six prediction errors are squared, summed, and divided by the total sum of errors to get xerror", indicating that RSS is calculated on the held-out observations, not R-squared. Can you please explain where the algorithm uses R-squared versus RSS?
Last edited by mnm4156; Today at 02:01 PM.

