Tuesday, August 30, 2016

My attempt at the AWS DevOps Engineer Professional exam sample questions.

This is just a blog post to try out the sample questions of the AWS DevOps Engineer Professional exam.  Keep in mind that these are by no means official answers; they are just my attempt at solving the example questions.  I did a similar attempt for the Solutions Architect Professional exam.

Question 1: automated data backup solution

- Answer C: the ec2-create-snapshot API results in a snapshot of your EBS volume which is stored in S3 (a distributed data store without a single point of failure).  Use the tags to manage metadata and to be able to clean up old snapshots to limit costs.
- Answer A is not valid because ec2-create-volume creates an empty volume, or one initialized from a snapshot.  Even if copying the backup data onto such a volume is what is meant, it would be suboptimal: you get worse retention than with answer C and higher costs (every volume you create takes up its full size, whereas snapshots in S3 are incremental).
- Answer B is not valid because recovering your data from Glacier won't be possible within 1 hour.
- Answer D is not valid because ec2-copy-snapshot can only operate on snapshots, not on EBS volumes.
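
The tag-based cleanup behind answer C could look roughly like this. It is a minimal sketch in plain Python: the `Purpose` tag name is my own invention, and the dicts are only loosely modeled on the metadata the EC2 DescribeSnapshots API returns.

```python
from datetime import datetime, timedelta

def snapshots_to_delete(snapshots, retention_days=30, now=None):
    """Select backup snapshots older than the retention window.

    `snapshots` is a list of dicts shaped loosely like the metadata the
    EC2 DescribeSnapshots API returns: a StartTime plus a Tags list.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        # Only touch snapshots this backup job tagged itself.
        if tags.get("Purpose") != "backup":
            continue
        if snap["StartTime"] < cutoff:
            expired.append(snap["SnapshotId"])
    return expired
```

The tag filter is what keeps the cleanup from deleting snapshots created by other processes, which is why the answer stresses using tags for metadata.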

Question 2: Going from M3 instances to C3 instances when using CloudFormation and AutoScaling Groups

- Answer D is correct.
- Answer A is not sufficient, as the Auto Scaling group won't automatically replace your existing instances.
- Answers B & C are not valid since you cannot update a launch configuration, as stated in the documentation.
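
Because launch configurations are immutable, answer D boils down to changing the LaunchConfiguration resource in the template so that a stack update makes CloudFormation create a new launch configuration and attach it to the Auto Scaling group. A minimal template fragment, sketched as a Python dict (the resource names and AMI id are made up):

```python
template = {
    "Resources": {
        "AppLaunchConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": "ami-12345678",   # placeholder AMI id
                "InstanceType": "m3.large",
            },
        },
        "AppAutoScalingGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
                "MinSize": "2",
                "MaxSize": "4",
            },
        },
    },
}

# Changing the instance type and running a stack update makes CloudFormation
# replace the launch configuration; running instances are not touched until
# they are terminated and relaunched, which is why D adds that step.
template["Resources"]["AppLaunchConfig"]["Properties"]["InstanceType"] = "c3.large"
```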

Question 3: CloudFormation for complex systems

- Answer B is my preferred answer, as multiple separate templates are easier to maintain and allow re-use.
- Answer A is not good because maintaining a single template does not scale (it won't allow re-use).
- Answer C is not good as orchestrating the process from an EC2 instance introduces a single point of failure.  It is also not cost efficient, as it requires additional implementation work and a permanently running instance.
- Answer D is not good because you wanted to version-control your infrastructure, which encompasses networking and thus the VPC.
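
The split in answer B can be wired together by exposing values from one template as Outputs and feeding them into the next as Parameters. A sketch of the idea with two template fragments as Python dicts (all resource and parameter names are my invention):

```python
network_template = {
    "Resources": {
        "Vpc": {"Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"}},
    },
    # Expose the VPC id so other stacks can build on this network.
    "Outputs": {
        "VpcId": {"Value": {"Ref": "Vpc"}},
    },
}

app_template = {
    # The application stack re-uses whichever network stack it is given,
    # which is what makes the separate templates re-usable.
    "Parameters": {
        "VpcId": {"Type": "String"},
    },
    "Resources": {
        "AppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "app tier",
                "VpcId": {"Ref": "VpcId"},
            },
        },
    },
}
```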

Question 4: Automated deployment - reduce launch time

- Answer B is correct as it fulfills the requirements
- Answer A is incorrect because of the timing requirements: given the durations of the operations, it would take more than 3 minutes.
- Answer C is incorrect because of timing (artifacts 4 min + deploy app code 1 min) > 3 min
- Answer D is incorrect because you still need to perform all the steps and thus require more than 3 minutes; the polling is also not needed, as User Data can be used to initialize the EC2 instance.

Question 5: I/O load performance test

- Answer B is correct.  This operation is needed to avoid the first-touch penalty and to make sure you measure the performance of a warm volume (at least if the volume is restored from a snapshot; nowadays you don't need to pre-warm new empty volumes).
- Answer A is incorrect because you would use the block size that is applicable to your application and not change it upon deployment.
- Answer C is incorrect as you just use the volumes to test, so there is no need to back them up; having a backup also does not influence performance.
- Answer D is incorrect as encrypted volumes won't boost performance.
- Answer E is incorrect as creating a snapshot does not read every block in the volume but only touches the blocks that have data (or changed data if the volume itself was created from a snapshot).
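
The pre-warm in answer B just means reading every block of the volume once. A sketch of that loop (on a real instance the path would be the block device of the restored volume, e.g. /dev/xvdf; a regular file works for demonstration):

```python
def initialize_volume(path, block_size=1024 * 1024):
    """Read every block once so each block is pulled down from S3
    before the performance test starts.

    Returns the number of reads performed, just for visibility.
    """
    blocks_read = 0
    with open(path, "rb") as device:
        # Sequentially touch every block; the data itself is discarded.
        while device.read(block_size):
            blocks_read += 1
    return blocks_read
```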

Question 6: social media marketing application

- Answer B is a possible way to do this.  DynamoDB is a persistent store that stores data in multiple AZs.
- Answer A is not correct because Kinesis is for streaming data; it won't allow you to keep historical data, as it can only keep data records for 1-7 days.
- Answer C is not correct because Glacier is for archiving data that you rarely want to retrieve.  Also, if you want to publish your data into Redshift using Data Pipeline, you need to have your data in a source that can be used by Redshift (e.g. regular S3 (not Glacier!), DynamoDB, EC2 (via SSH), EMR (via SSH)).
- Answer D is not correct because Amazon CloudWatch is not for analytics.  Also, CloudWatch only retains data for 14 days.
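
For answer B, the DynamoDB store could be as simple as one item per social-media mention, keyed so per-user history stays queryable. A sketch of shaping such an item in DynamoDB's attribute-type notation (the table layout and attribute names are my own assumptions):

```python
def make_mention_item(user, timestamp, text):
    """Shape a social-media mention as a DynamoDB-style item:
    hash key on the user, range key on the timestamp, so the
    historical data per user can be queried in order."""
    return {
        "user": {"S": user},          # hash key (string)
        "ts": {"N": str(timestamp)},  # range key (number, sent as string)
        "text": {"S": text},
    }
```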

Question 7: bill increase

- Answer C is correct.  The wording is a bit odd, as it looks suboptimal to send an SNS notification to your application, which would then query DynamoDB.  It is good to have the data persistently in DynamoDB, but it would be more efficient to have a Lambda function put the data into DynamoDB and also update the cache with the new entry, so that DynamoDB does not need to be queried at that point.
- Answer A is incorrect because S3 bucket lifecycle rules change the storage class of an object; they do not allow you to push a list of objects to another bucket.
- Answer B would be feasible, but only if you have a very controlled process over the uploads that can be done to your bucket; it is not as practical as C.
- Answer D is incorrect because there is no such thing as SQS lifecycles that move objects into S3
- Answer E is incorrect because ElastiCache is a key-value cache service which does not allow you to push files into Amazon S3
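
The Lambda variant I describe under answer C could look like this sketch, with plain dicts standing in for the DynamoDB table and the ElastiCache cluster. The event shape loosely follows an S3 put-event record; treat the exact field names as assumptions.

```python
table = {}   # stand-in for the DynamoDB table
cache = {}   # stand-in for the ElastiCache cluster

def handle_s3_event(event):
    """Record each uploaded object and refresh the cache in the same
    pass, so readers never have to fall back to querying the table."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        item = {"bucket": bucket, "key": key}
        table[(bucket, key)] = item   # persist the metadata
        cache[(bucket, key)] = item   # keep the cache warm
```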

Question 8: AWS Elastic Beanstalk + continuous deployment (zero downtime)

- Answer B is correct because the swap of DNS names allows you to roll back almost immediately (ignoring clients that don't honor DNS TTLs correctly).
- Answer A is incorrect because, depending on the deploy time, rollback won't be nearly immediate.
- Answer C is incorrect because it doesn't have a rollback scenario.
- Answer D is incorrect because I don't think you can configure a Beanstalk environment to send HTTP 301 response codes.  Moreover, a 301 response code is a permanent redirect, which clients will cache, so after a rollback impacted clients would still end up on the new environment.
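
The swap in answer B is essentially exchanging the CNAMEs of the two Beanstalk environments, which is exactly why rollback is just a second swap. A toy model of that, with a dict standing in for DNS (the hostnames are made up):

```python
dns = {
    "app.example.com": "blue-env.elasticbeanstalk.example",
    "staging.example.com": "green-env.elasticbeanstalk.example",
}

def swap_cnames(dns, a, b):
    """Exchange the targets of two DNS names.

    Running it once cuts traffic over to the new environment;
    running it again rolls back, subject only to DNS TTLs."""
    dns[a], dns[b] = dns[b], dns[a]

swap_cnames(dns, "app.example.com", "staging.example.com")  # cut over
```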

Question 9: Log analysis application

- Answer A seems possible, as CloudWatch Logs can filter your logs and keep track of login actions.
- Answer B is incorrect as it won't be real time
- Answer C is incorrect because it has no auto-scaling
- Answer D is incorrect as it won't be real time
- Answer E is incorrect, as you cannot run a MapReduce job on data that is in your RDS MySQL.  You would need to export the data, which won't be real time either.
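
The metric filter in answer A essentially counts matching log events and feeds the count to a metric. A stand-in for that filtering step in plain Python (the log line format is made up):

```python
def count_login_events(log_lines, pattern="LOGIN"):
    """Count log events matching a filter pattern, the way a
    CloudWatch Logs metric filter would feed a custom metric."""
    return sum(1 for line in log_lines if pattern in line)
```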

Question 10: orders processing

- Answer B is correct because it will automatically replace an instance if it is stopped/terminated or fails (HW failure).
- Answer E is correct because it allows you to detect that the application is no longer running correctly and will stop the instance automatically.
- Answer A is incorrect as a 2nd watchdog instance costs money
- Answer C is incorrect as no need for scaling was expressed and having a minimum of 2 already makes it more expensive
- Answer D is incorrect as you need to pay for the ELB so it is not the most cost-effective way
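
The pairing of B and E can be modeled with a tiny evaluation rule: a custom metric says whether the application is alive, the alarm stops the instance when it isn't, and the Auto Scaling group then brings up a replacement. A pure-Python sketch of the alarm side (the threshold and period count are invented):

```python
def alarm_action(heartbeat_values, threshold=1, periods=3):
    """Return 'stop-instance' when the application heartbeat metric has
    been below the threshold for `periods` consecutive datapoints,
    mimicking a CloudWatch alarm configured with an EC2 stop action."""
    recent = heartbeat_values[-periods:]
    if len(recent) == periods and all(v < threshold for v in recent):
        return "stop-instance"
    return "ok"
```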

2 comments:

  1. Question 7:
    I would go with option B.
    Your choice of option C, in my opinion, costs more, since there is no option to generate an SNS notification when an object is uploaded to S3, and SNS does not have the ability to update DynamoDB data.  You would need a separate instance for that task, or leverage Lambda + CloudTrail + CloudWatch Events, which are not mentioned in the question.

  2. There is actually an option to generate an SNS notification when an object is uploaded to S3 (can you explain why you say there isn't?). If you go to the S3 console, go to your bucket, click 'properties' and then expand 'Events', you can configure these. The console text says: "Event Notifications enable you to send alerts or trigger workflows. Notifications can be sent via Amazon Simple Notification Service (SNS) or Amazon Simple Queue Service (SQS) or to a Lambda function (depending on the bucket location)." For the type of event you could select 'ObjectCreated(ALL)'. So you don't need any separate instance, nor do you need to use Lambda in this case. It would be possible to trigger Lambda; however, you could equally send the SNS notification to your application. One would then need to make sure the application is highly available so that SNS notifications are not missed, or go for SNS -> SQS and poll an SQS queue to have less coupling. Since these are cheap services, I am not too concerned about the price tag, as a separate instance is not needed with this approach. I would need more detailed information on the application to be more certain of my answer, because this is starting to sound like over-engineering ;-). If all the files in the bucket are actually placed by the application itself, then all of this is not needed and B is a valid option. In that case it would be the best candidate, as it follows the KISS principle.