So, our story so far: we've taught kids [unclear], we've taught adults, we've had meet-ups, we've had sessions, and you can read our reports from last year. We're hoping to release one for this year soon. You can also follow us on Twitter and visit our website to learn more. So how can you support us? We have a link to our donation box and a GitHub Sponsors link, I guess in the description, so you can check that out. Otherwise, if you want to get in touch with us, we are on the Port Harcourt School of AI address at gmail.com.
Excuse me. Yeah, so you can check us out, you can send us an email, and we respond as fast as we can. Otherwise, you can find other ways to support us. Buy books: we are affiliates with Amazon, and we have top best-selling books from the experts who have come to the platform. Please purchase those books so that we can get a commission from Amazon to keep doing what we do. Most of these things are volunteer sponsored.
It will be helpful if you can donate cash, your time, and resources wherever you can, to really help the community grow and help us keep doing more, keep improving, and keep chasing our goal, our mission, our target. So where should we go from here? Of course, we expect that you join an AI learning community in your region, because we advocate for community-driven learning a lot. So if you've not joined any AI learning community yet, find one in your area and join, if you have one.
If you don't have an AI learning community, please create one. Start one, engage people, build a network, and really help people gain access to the AI space through this community-driven learning method. It's absolutely crucial, and thank you if you do just that. Also, explore more on the topic of today's session, so cloud ML and cloud AI; all of these are absolutely crucial. And get started if you haven't started learning ML yet, if you haven't started doing ML.
Yeah, now is perhaps the best time for you to get started. There are a lot of difficulties out there, but with the community and with people willing to help mentor and guide you, the barrier to entry is much lower than it usually is. So ask questions on Twitter and get in touch with experts. It's a friendly community, like I said, and the general AI community members are there. You can also join us: we have lots of programs coming up soon that will engage our members, and we are still planning to make sure these programs are sustainable. So you can join us on our Slack, and you can follow us on Twitter, which will be very useful as well. All right.
I think we're done, and see you in the session, because I think we've explained a lot about this session with experts. So see you in the session. Bye.
Hello and welcome. I'm Lynn Langit, and this talk is Cloud Practices for Machine Learning. I've made the slides publicly available, and I'll give you the link at the end of this talk. The topic is expansive, and during our time together I really want to lay out both a plan and information so that you can go through this plan and have success with cloud for ML.
With machine learning, the way I see it, there are certain percentages, almost like the battery life on your phone. A common situation I see when people are getting started with cloud ML is that they want to skip steps, and in my experience that leads to ineffective results. So we're going to take this very systematically. We're going to start with data and questions, then look at the development environment and cloud environment. Then we're going to look at what I call hello ML, a hello world for machine learning on cloud, then data samples and security, model quality, and scaling and reproducing. In addition to resources, I'm going to talk about some of my experience as a production cloud ML architect; in particular, I've been working in bioinformatics, so genomic sequencing.
Speaking of that, it's important to start with the business problem that you're trying to solve. ML and cloud are both moving really quickly, and there's a lot of really exciting technology that we as technologists can allow to lead our thinking. For example, something that I've been exploring, purely from a learning standpoint rather than from a business standpoint, is a machine learning algorithm that's very advanced. It's designed for implementing, as it says here, hybrid quantum-classical neural nets. It really doesn't get more advanced than that: you have quantum computing and neural nets, and this is open source TensorFlow. This is shown running in a learning environment, and this is actually cloud ML: it's running on Google Colab, which is a Jupyter-style notebook running on Google Cloud infrastructure. What I find is that a lot of people who are trying to work with cloud ML, myself included, sometimes get distracted by some of these new technologies.
When they go to build, they don't focus on the business problem. To deliver value with cloud ML solutions, like any software solutions, it's extremely important that you first focus on solving the business problem and select all the other parts and pieces later. So for this particular case, although you might be thinking about working with tf, or TensorFlow, and the Keras library (this is just a little snippet of the code), and working with quantum and everything else, that's really not the best place to start. The best place to start is: what is the business problem that you're trying to solve? One of the problems that's very timely, and that my company has been involved with, is helping biologists do research to find solutions for COVID treatment and for vaccine research. Again, it's a very compelling problem. The problem comes first; the machine learning comes second. That's really the important thing to think about here.
So when you are thinking about your problem domain, you want to start not with your machine learning algorithm, not with your TensorFlow, or whether you're using regression or something out of scikit-learn or writing your own algorithm. You need to start with the data. Starting with classic machine learning, you're going to have all different types of data. You're probably going to start on your laptop, maybe with data from some experimental research, and you need to think about how frequently that data is going to be updated. Now, in this fast-moving world of cloud ML, in my consultancy for COVID, we're getting data in on basically an almost daily basis. The thing that drives this, or moves us to cloud, is the amount of data.
One of my clients, the Broad Institute of MIT and Harvard, has publicly stated that they are putting 17 terabytes per day into the Google Cloud; this is pre-COVID, for their cancer genomic research. So this is moving our data to the cloud because it no longer fits, not only on a laptop, but also on our university research cluster or any other on-prem resources. But I am getting a little ahead of myself. Before I talk about cloud scale, you have to go to the basics. You have to think about what your data is, what the quality of your data is, and what data should be included, because this is extremely important in any data modeling, but more so in machine learning and even more so in cloud machine learning.
So, just a thought exercise: picture a triangle. Think about something simple for which we probably all would have a similar representation, right? Well, wrong. I was just drawing the other day, and if you think about it, something as commonly understood as a triangle has many, many different forms, and so does data. This is extremely important when you're working with any machine learning model: consider the applicability and the quality of the data.
Now, you want to start with sample data. In some domains this is easy; sometimes you can just find a text file. But in other domains, like the one I'm working in, genomics, it can be really difficult. I actually made a repo up on GitHub, and I have all these linked. Inside of here I have sample data, and any of you could use this. This could be useful when you're just trying out an algorithm on CSV data, so kind of simple data; these are small data sets. But if you're working domain-specifically, as I am with genomic data, you can see that you have all these types of data, and unless you work in genomics you've probably never heard of these file types other than CSV or TSV. The important aspect is that when you are doing any sort of machine learning, and cloud even more so, you want to have appropriate sample data, whether you create it, find it, save it, or get it from one of the cloud vendors. This is the link to that.
Now, a trend in cloud data in general, but particularly applicable to machine learning, is that the cloud vendors are putting some sample data sets and some production-size data sets up on their clouds. This is an important trend for cloud ML. You're going to hear me say over and over in this talk that you want to move away from doing work on your laptop. This is a paradigm shift; this is really very new. Most of the machine learning professionals I work with still do the basic modeling on their laptop.
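As one illustration of checking small sample data before scaling up, here is a minimal sketch in Python. The CSV contents, the column names (`sample_id`, `coverage`, `label`), and the helper `summarize_csv` are all made up for this example; they are not taken from the repo mentioned in the talk.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: a tiny CSV with one missing value.
SAMPLE_CSV = """sample_id,coverage,label
s1,30.5,case
s2,,control
s3,28.1,control
s4,31.0,case
"""


def summarize_csv(text, label_column):
    """Return row count, per-column missing-value counts, and label counts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    missing = {name: 0 for name in reader.fieldnames}
    for row in rows:
        for name, value in row.items():
            # Treat empty or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    labels = Counter(row[label_column] for row in rows)
    return {"rows": len(rows), "missing": missing, "labels": dict(labels)}


summary = summarize_csv(SAMPLE_CSV, "label")
print(summary)
```

A quick summary like this (row count, missing values, label balance) is one simple way to look at data quality on a small sample before any model is trained on the full data set.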
I'm going to really challenge you to move away from that. You might say, well, what do you mean? Well, you first want to start with the data that you're going to do your compute, your machine learning, on. If that data is already in a vendor cloud, then you're going to start by working in that cloud. Now what do I mean by this? Across all the major cloud vendors there are both data sets and, in many cases, data set search. I personally think that Google's is the most useful.
So I tend to use that. So what does that look like? This is out of Google Research, and I'm just going to click the pre-populated search here, coronavirus or COVID-19. You can see that we have over 100 data sets found, and we have all kinds of filters that we can look at. We have usage rights, we can filter to free data sets if we want to, and then we can explore those data sets. The quality of the data is extremely important to the effectiveness of a machine learning model, and you can iterate over that faster in the cloud if the data is already in the cloud. A pattern for cloud ML is to avoid uploading data up into the cloud and downloading it from the cloud onto your laptop. Again, you just want to get away from working on your laptop in general.
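To make the cost of moving data concrete, here is a rough back-of-the-envelope sketch in Python. The 17 terabytes per day figure is the one quoted earlier in this talk; the link speeds and the function name `transfer_hours` are assumptions for illustration, and real throughput is usually lower than the nominal rate.

```python
def transfer_hours(size_terabytes, link_megabits_per_second):
    """Hours needed to move size_terabytes over a link at the nominal rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits.
    """
    bits = size_terabytes * 10**12 * 8
    seconds = bits / (link_megabits_per_second * 10**6)
    return seconds / 3600


DAILY_TB = 17  # figure quoted in the talk

for mbps in (100, 1_000, 10_000):  # assumed link speeds
    hours = transfer_hours(DAILY_TB, mbps)
    print(f"{mbps:>6} Mbit/s: {hours:8.1f} hours per day of data")
```

Under these assumptions, even a 1 Gbit/s link would need about 38 hours to move one day of data, so it could not keep up. That is the arithmetic behind working where the data already lives.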
Now, in addition to this, the vendors have data sets that are permissioned, in my case for public health. So health researchers who have been trained in compliance law in the United States, such as HIPAA, can have access, for example, to genomic data sets. This is a really important pattern when you're working with cloud ML.
So, talking about your dev environment: I've been introducing the idea, but the point is that you're going to be implementing your cloud machine learning model on the cloud, so it's a very important pattern to build your dev machine on the cloud. Of course, that necessitates that you have a very fast internet connection, which I realize is not always possible, so there could be an exception to that. But if you have a fast internet connection, you really want to use the cloud as your laptop, and I consider this a huge time saver. I have answered hundreds of questions from people trying to set up a dev environment, installing libraries, installing GPU drivers, installing all kinds of stuff, and having a lot of difficulty, taking hours or even days to set up a dev environment. And then, if it's a team, when the next dev comes in, it takes that same time over again. I really encourage you to take a look at what's available up on the vendor clouds. When you're setting up your dev environment, you can set up your virtual machine from an image, you can use a Jupyter notebook, or you can use a machine learning Docker container image.
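Whichever of those options you pick, a quick check of what the environment already provides can save setup time. The following is a minimal sketch in Python, not something shown in the talk. The library list and the function name `check_environment` are assumptions, and finding `nvidia-smi` on the PATH is only a rough hint that NVIDIA GPU drivers are installed.

```python
import importlib.util
import platform
import shutil

# Example library names to look for; adjust to your own project.
LIBRARIES = ("numpy", "pandas", "sklearn", "tensorflow")


def check_environment(libraries=LIBRARIES):
    """Report the Python version, which libraries are importable, and
    whether the nvidia-smi tool is on the PATH."""
    return {
        "python": platform.python_version(),
        "libraries": {
            name: importlib.util.find_spec(name) is not None
            for name in libraries
        },
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


report = check_environment()
for key, value in report.items():
    print(key, value)
```

On a prebuilt machine learning image or hosted notebook, most of these checks would be expected to pass without any manual installation, which is the time saving described above.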