Actuarial Outpost
 
  #41  
05-01-2019, 09:23 AM
Vorian Atreides
Wiki/Note Contributor
CAS
Join Date: Apr 2005
Location: As far as 3 cups of sugar will take you
Studying for ACAS
College: Hard Knocks
Favorite beer: Most German dark lagers
Posts: 66,138

I think the more important question regarding which package to use is "how long will you need to keep the code around?".

If for quite some time (> 1 year without additional follow-up work), then I would opt for the one whose syntax is more likely to read "cleanly" to (future) users, given the other languages they are most likely to be familiar with (e.g., dplyr has a rather unique syntax with the %>% operator).
__________________
I find your lack of faith disturbing

Why should I worry about dying? It's not going to happen in my lifetime!


Freedom of speech is not a license to discourtesy

#BLACKMATTERLIVES
  #42  
05-01-2019, 10:21 AM
Tacoactuary
Member
CAS
Join Date: Nov 2014
Location: Des Moines, IA
College: Vanderbilt, UIUC
Favorite beer: Yazoo Sue
Posts: 1,592

Quote:
Originally Posted by Vorian Atreides
I think the more important question regarding which package to use is "how long will you need to keep the code around?".

If for quite some time (> 1 year without additional follow-up work), then I would opt for the one whose syntax is more likely to read "cleanly" to (future) users, given the other languages they are most likely to be familiar with (e.g., dplyr has a rather unique syntax with the %>% operator).

Are you suggesting that the pipe should be avoided? I hope not.
__________________
ACAS 7 8 9 FCAS
  #43  
05-01-2019, 11:54 AM
examsarehard
Member
CAS
Join Date: May 2011
Posts: 598

The pipe isn't actually unique to dplyr; it comes from the magrittr package. You can use the pipe with functions outside the tidyverse, but most of its readability benefits are realized when functions take the data as their first argument, as the tidyverse functions do.
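
A minimal sketch of what "ordering inputs in a specific way" means in practice, assuming the magrittr package is installed. The losses vector is made up for illustration; mtcars ships with base R.

```r
# Assumption: magrittr is installed. The loss amounts are invented.
library(magrittr)

losses <- c(1200, 450, 9800, 310)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes
# the first argument of the function on the right.
nested <- round(mean(log(losses)), 2)
piped  <- losses %>% log() %>% mean() %>% round(2)
identical(nested, piped)

# lm() takes the formula first and the data second, so the data has to be
# routed explicitly with the "." placeholder. It works, but reads less cleanly.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
coef(fit)
```

Functions that take the data first chain without any placeholder, which is the convention the tidyverse follows.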

Fun fact: the pipe operator didn't exist yet when Hadley wrote ggplot2 (magrittr came along years later). The "ggplot1" package is actually more compatible with the pipe operator than ggplot2 is, but it's too late to change everything now.
  #44  
05-01-2019, 01:33 PM
Vorian Atreides

Quote:
Originally Posted by Tacoactuary
Are you suggesting that the pipe should be avoided? I hope not.

Absolutely not. My comment is about how easy it would be for someone to pick up the code when you can't be sure what their background is.
  #45  
05-27-2019, 03:52 PM
bjc2142
Member
CAS
Join Date: Aug 2014
Location: Boston
Studying for Python
Favorite beer: Asahi Draft
Posts: 1,193

Is tidyverse/dplyr the way to go in R? Both DataCamp and the JHU Coursera course seem to use dplyr in their lectures. I'm currently learning R from an intro-level course that uses data.table, and I'm starting to wonder if I would end up having to relearn the syntax when I move on to intermediate-level courses from other websites.
__________________
FCAS - Hawaii 2019
  #46  
05-27-2019, 08:19 PM
Avi
Wiki Contributor
Site Supporter
CAS AAA
Join Date: Aug 2002
Location: NY
Studying for the rest of my life.
College: Alumnus - Queens College - CUNY
Favorite beer: Stone Ruination IPA
Posts: 14,212
Blog Entries: 3

Quote:
Originally Posted by bjc2142
Is tidyverse/dplyr the way to go in R? Both DataCamp and the JHU Coursera course seem to use dplyr in their lectures. I'm currently learning R from an intro-level course that uses data.table, and I'm starting to wonder if I would end up having to relearn the syntax when I move on to intermediate-level courses from other websites.

It depends on what you want to do. Some of us older curmudgeons, while we appreciate what Hadley et al. have done for the R universe, look askance at the proliferation of "tidyspeak".



The benefit of the tidyverse is a cohesive set of packages which work well together and share the same philosophy and mostly the same syntax.


The detriment, of course, is that there are almost always more precise ways of accomplishing a given task, and those are usually faster and more efficient. In the case of data.table in particular, the only reason I would use dplyr is when I MUST deal with out-of-memory data. Then, if I cannot use SparkR on its own, I could use the dplyr verbs and the sparklyr backend. Other than that, I find data.table to be, for data of appropriate size, faster, more compact, and easier to write than the corresponding dplyr verb chains.
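
To make the "more compact" comparison concrete, here is a small sketch of the same grouped summary written both ways. It assumes dplyr and data.table are both installed; the claims data and column names are invented for illustration, and it says nothing about speed on real data.

```r
# Assumption: dplyr and data.table are installed. The claims data are made up.
library(dplyr)
library(data.table)

claims <- data.frame(
  line = c("auto", "auto", "home", "home", "home"),
  paid = c(1200, 800, 5000, 0, 2500),
  stringsAsFactors = FALSE
)

# dplyr: one verb per step, chained with the pipe.
by_line_dplyr <- claims %>%
  filter(paid > 0) %>%
  group_by(line) %>%
  summarise(n = n(), total_paid = sum(paid))

# data.table: filter (i), compute (j), and group (by) in one DT[i, j, by] call.
claims_dt <- as.data.table(claims)
by_line_dt <- claims_dt[paid > 0, .(n = .N, total_paid = sum(paid)), by = line]

by_line_dplyr
by_line_dt
```

Both produce one row per line of business with a claim count and a paid total; which one reads better is largely a matter of taste and familiarity.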



The other huge detriment is bloat. Hadley's philosophy has for years been based on the Unix philosophy of having simple functions that do one thing well. So there are many, many, many small packages, all relying on each other, to make dplyr or anything in the tidyverse work. You may want one function from one package and end up having to install 15 packages. It creates HUGE package bloat. I'm from an older generation where you write packages with as few dependencies as possible (outside of base and MAYBE the recommended packages).



The other huge issue is just how much of a fan of magrittr-style piping you are. You can use piping with anything, but the tidyverse has wholeheartedly embraced that style. As an older fogey, I've tested both ways and there is some overhead, albeit minimal, with magrittr piping.
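
A rough way to check the overhead yourself, assuming magrittr is installed. This is only a sketch: the repetition count is arbitrary, and the timings will vary by machine and by magrittr version, so no numbers are quoted here.

```r
# Assumption: magrittr is installed. n is an arbitrary repetition count.
library(magrittr)

x <- c(4, 9, 16)
n <- 1e4

# Time the same tiny computation written as nested calls and as a pipe chain.
nested_time <- system.time(for (i in seq_len(n)) sum(sqrt(x)))
piped_time  <- system.time(for (i in seq_len(n)) x %>% sqrt() %>% sum())

# Same answer either way; only the per-call cost can differ.
same_result <- identical(sum(sqrt(x)), x %>% sqrt() %>% sum())
same_result

c(nested = nested_time[["elapsed"]], piped = piped_time[["elapsed"]])
```

Any per-call cost only matters when the piped expression sits inside a tight loop like this one; for a handful of data-munging steps on a whole table it is unlikely to be noticeable.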


One thing I've heard from more than one educator is that the cohesiveness of the tidyverse combined with magrittr piping makes the code easier for students (of both statistics and computing) to understand, and so they have moved in that direction for teaching statistics, as it allows them to focus more on application and meaning than on coding. I'm probably going back for a Masters in Data Science, and the syllabus I'm looking at has one course which is partially just an intro to the tidyverse. I actually noted this to the departmental chair (did I mention I was a curmudgeon?) and he agreed, but gave me a similar answer to what I wrote above.


So, tl;dr - if you know nothing, or very little, there's value to learning tidyverse methods (drinking the Hadley Kool-Aid, I call it). But if you have some R experience, or are interested in having the widest variety of options, don't live only in the hadleyverse.


AND, if you're dealing with tabular data that fits in memory, data.table is far better than dplyr for munging, IMO.
__________________
All scientists defer only to physicists
Physicists defer only to mathematicians
Mathematicians defer only to G-d!

--with apologies to Dr. Leon Lederman
  #47  
05-28-2019, 12:37 AM
bjc2142

Thanks, I'm definitely interested in trying and learning both if needed. For any personal projects and ad-hoc analysis, I guess I can use whatever. I'm wondering if dplyr would be a bad choice for a team project in a typical DS collaborative environment. If I'm using one and the majority is using the other, that would be a pain.
  #48  
05-28-2019, 08:31 AM
Vorian Atreides

Given that R is open source, you may as well get used to the idea that things are going to "change" as people develop improved tools & packages.
  #49  
06-12-2019, 09:16 AM
bjc2142

I wrote a LinkedIn article about my recommended courses for learning DS as an actuary. The selections are purely subjective, as they are mostly based on my own experience and observations. It's for Python only, and I referenced this thread for people interested in going the R route.

Thanks OP! You made me pick Python over R and I'm really enjoying it. I think having some experience with C++ from 10 years ago still makes me more comfortable with Python syntax than R. R just feels very weird to me.

Last edited by bjc2142; 06-12-2019 at 09:26 AM.
  #50  
06-12-2019, 09:30 AM
Vorian Atreides

Thanks for sharing that, bjc. I've left you a comment as well.
All times are GMT -4.

