Shreyes Joshi

Aspiring engineer.

Excited to chart the unknown.


About Me

Hey, my name is Shreyes and I am a rising senior majoring in Electrical Engineering at Princeton University, with minors in Computer Science, Statistics/Machine Learning, and Finance. I am primarily interested in building technologies: mathematical models for trading, full-stack software, hardware that combines innovative electronic and mechanical components, and complex system design, all tied together through efficient programmable abstractions.

Whether it be designing hands-on circuits, fine-tuning algorithms, or pursuing entrepreneurship, I am motivated to use available resources to make products with a lasting positive impact on society. Moreover, I am driven to cultivate the most diverse tropical garden in Northern New Jersey! Honestly, I'm serious about that last point, along with several other hobby-related goals, including choreographing hip-hop and bhangra dances, playing soccer, and designing custom wooden boomerangs.

At the end of the day, I am truly passionate about my work and excited to learn new skills. If you have a moment, please check out some of my past work and projects in my portfolio below. Feel free to reach out and connect with me here!

Projects

 
Software/Machine Learning
Electrical Engineering
Hobbies

Software Engineering and Machine Learning Projects

Stock Price Trajectory Classification Independent Research:

Sept. 2015 - June 2016

This research focused on building a stock price forecasting model through machine learning, using only fundamental data that is publicly available to all types of investors. Unlike hedge funds, which have exclusive access to premium and costly training data acquired through financial service providers, the common investor only has access to basic public information, including price, volume, and options data. Using public data as a feature set, this research studies the effectiveness of training five machine learning models on w days of financial history to predict whether a stock’s price will rise or fall k days ahead, within a certain target radius r.
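As a rough illustration of the (w, k, r) setup described above, the sketch below turns a list of daily closing prices into labeled examples. The exact meaning of the radius r is not spelled out in this write-up, so the reading used here (moves smaller than r are skipped) is an assumption, as are the function name `make_examples` and the toy prices.

```python
def make_examples(prices, w, k, r):
    """Build (feature window, label) pairs from daily closing prices.

    Assumed reading of the parameters (not confirmed by the write-up):
      w: number of past days used as the feature window
      k: how many days ahead the prediction targets
      r: minimum relative move counted as a rise or fall;
         smaller moves are skipped as having no clear direction
    """
    examples = []
    # t is the index of the last day inside the feature window.
    for t in range(w - 1, len(prices) - k):
        window = prices[t - w + 1 : t + 1]
        change = (prices[t + k] - prices[t]) / prices[t]
        if abs(change) < r:
            continue  # move falls inside the radius, so no label is assigned
        label = 1 if change > 0 else -1
        examples.append((window, label))
    return examples


# Toy prices, invented purely for illustration.
prices = [100.0, 101.0, 103.0, 102.0, 99.0, 104.0, 108.0]
examples = make_examples(prices, w=3, k=2, r=0.02)
for window, label in examples:
    print(window, label)
```

Each window would then feed a classifier as one training example, with the label in {1, -1} marking the direction of the move.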

The study primarily analyzed three stocks (Apple, Disney, and Nike), chosen for their market scale and their representation of different sectors of the global market. Using numerous experimental methods, analysis, and custom software modules written in Python, the model described above was successfully formulated. Moreover, the experimental parameters (w, k, r) were optimized to maximize the precision score of the Support Vector Machine (SVM) model.

Since the framework of this research was based in Python, the “scikit-learn”, “pandas”, and “numpy” libraries were used in conjunction to implement five different ML algorithms: Logistic Regression, Gaussian Naïve Bayes, K-Nearest Neighbors, Decision Tree, and Support Vector Machine (SVM). Since this research deals with a binary classification problem with classes {1, -1} for upward or downward movement, all five of these algorithms were trained based on the model detailed above. Each model’s performance was then evaluated on a corresponding testing data set, and the results were quantified with a precision score in the range of 0 to 1 that reflected how reliably examples were classified. The study’s raw data was primarily aggregated from a number of Wharton databases, then heavily preprocessed, normalized, and consolidated where specific features needed such wrangling to produce more sensible results.
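To make the scoring concrete, here is a minimal pure-Python sketch of a precision score for the {1, -1} labels, shown next to plain accuracy because the two are easy to conflate. The label lists are invented for illustration; in scikit-learn, `sklearn.metrics.precision_score` computes the same quantity for binary labels.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted-positive examples that are truly positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if p == positive and t == positive)
    pred_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / pred_pos if pred_pos else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all examples whose predicted label matches the truth."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented labels: 1 = upward move, -1 = downward move.
y_true = [1, -1, 1, 1, -1, -1]
y_pred = [1, 1, 1, 1, -1, -1]

prec = precision(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(prec, acc)
```

Precision only looks at the days the model called "up", so it answers how trustworthy a buy signal would be, while accuracy counts every day equally.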

In the end, I demonstrated that an optimized SVM classifier using my model can correctly predict a stock’s directional movement on the initial collection (Apple, Disney, and Nike), as well as on a bundle of 50 stocks, with ~72% accuracy on average. I concluded that there are clear correlations in historical financial data that can be used to predict a stock’s directional movement, but that there is not enough confidence in these results to rely on such a model to manage one’s own investments without significantly richer training data to increase accuracy and reduce noise in the results.

Technologies Used: Python, ML Methods, Classification, Big Data Aggregation, Stock and Financial Theory, Statistics

The figures shown above examine Apple. The topmost figure shows 3D surface plots, parameterized against (w, k), with the z-axis depicting precision scores for the three most effective classifiers. The second figure details Apple’s un-optimized SVM training performance, while the third figure demonstrates the optimized SVM testing performance. Here, accuracies are only modestly higher, and the effect of noise on the results is more prevalent, with some overfitting from using GridSearch in the optimization and from the lack of higher-resolution data.

The top figure shows the images included in the Yale B dataset as a whole. The second and third images compare the performance of classifying aggregated data from four men’s and four women’s faces, with and without sunglasses. The second image shows the training data with testing validation performing well, but in figure three, when the eyes of each person are blocked, the testing performance drops significantly. This makes sense because the eyes carry much of the unique information about one’s face. Lastly, figures four and five show how varying the “C” factor of an SVM can improve the accuracy of the classifier, in this case when moving from small values of C to large ones. This is just one of many examples of SVM optimization incorporated in this project.

Facial Recognition Image Processing:

Feb - May 2016

This project aims to analyze the YALE B facial dataset to understand how well a Support Vector Machine (SVM) can classify facial images. The dataset contains frontal face images of 38 participants, with roughly 60 images per subject, for a total of 2,414 images. Since each image is 192 x 168 pixels, each one is stored as a vector of 32,256 pixel values, with a corresponding label for each of the 2,414 images.

Since each image has a dimension of 32,256, it would be computationally expensive to process these raw images. Thus, principal component analysis (PCA) was chosen as a more efficient method: it projects each raw image onto a smaller representation that keeps only the “d” most important component vectors, preserving the most valuable information in the least space. These “d” components are the directions of largest variance in the image data. Through such projections, the quality of the image is slightly reduced, but the savings in computation and storage significantly outweigh these reductions.
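The projection idea can be sketched in a few lines of pure Python for the simplest case, d = 1, where power iteration finds the single direction of largest variance. This is a toy illustration on invented 2-pixel "images", not the project's code; for 32,256-dimensional data a library routine such as scikit-learn's `PCA` would be the practical choice.

```python
import math


def top_component(rows, iterations=200):
    """Return (mean, unit vector) for the direction of largest variance."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in rows]
    # Sample covariance matrix of the centered data (dim x dim).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Fixed start vector keeps the sketch deterministic; it must not be
    # orthogonal to the top component for power iteration to converge.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Coordinate of one example along the top component (d = 1)."""
    return sum((row[j] - mean[j]) * v[j] for j in range(len(row)))


# Invented data: every "image" lies on the line pixel2 = 2 * pixel1.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, v = top_component(rows)
coords = [project(row, mean, v) for row in rows]
print(v, coords)
```

Here two numbers per example collapse to one with no loss, because all of the variance lies along a single direction; real face images lose a little detail in exchange for a much smaller representation.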

After preprocessing and normalizing the projected images, a variety of SVMs were constructed and tested for their training and testing performance on subsets of the data, while varying certain parameters. For example, SVMs were tested by varying the number of “d” components used for the projection and by altering the type of kernel (linear, rbf, poly, etc.).

More interestingly, however, tests for binary classification of men’s and women’s faces using “d”-dimensional PCA projections were compared, with and without sunglasses, as shown to the left. Also, using PCA to reduce image size was compared against simply subsampling the images. Numerous properties of SVMs were discovered as a result of these tests: certain kernels perform well for different tasks, while tuning other SVM parameters like the gamma and C factors measurably increased accuracies. Further study is being conducted to determine more efficient ways to correctly classify images while further reducing the number of components used in the PCA images.

Technologies Used: Python, Image Processing Theory, Support Vector Machine Optimization, Big Data Processing, Principal Component Analysis

US Patent and Trademark Office Marketing Automation Software

June - Aug. 2014

While interning at the Entrepreneurs Roundtable Accelerator in the summer of 2014 (June to August), I worked primarily with the legal startup Plainlegal to develop marketing automation software that generated over 1 million leads to market their product. Plainlegal offers lawyers an automated method of filing trademark applications with the United States Patent and Trademark Office (USPTO) in the form of compact software, which was to be advertised to all of the trademark lawyers in the United States. To do so, I built a scraping tool in Python that directly accessed the USPTO’s public directory to retrieve background data on all of the trademark lawyers who have ever filed with the USPTO. There were millions of XML entries to parse, and each entry contained multiple aliases of lawyer data.

To make sense of the information, I wrote normalization and text processing methods to refine the data and store it in PostgreSQL/SQLite databases, from which data on all of these lawyers can be queried with ease. This information was pivotal in marketing Plainlegal’s product and led to substantial business and revenue for the company. Once the scraping routine was thoroughly tested and made robust, I deployed the code to Amazon EC2 servers, where the scripts ran continuously to add new lawyer data as the USPTO released it, while also performing a week-long cycle to collect gigabytes of historical data.
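A minimal sketch of the parse, normalize, and store steps is shown below using only the Python standard library. The XML layout, the tag names (`record`, `attorney`, `city`), the sample names, and the normalization rule are all hypothetical stand-ins; the real USPTO schema and the original normalization logic are not described in this write-up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real USPTO XML schema is not shown here.
SAMPLE_XML = """
<records>
  <record><attorney>  SMITH, Jane Q. </attorney><city>Newark</city></record>
  <record><attorney>Jane Q. Smith</attorney><city>Newark</city></record>
  <record><attorney>DOE, JOHN</attorney><city>Trenton</city></record>
</records>
"""


def normalize_name(raw):
    """Collapse 'LAST, First' and 'First Last' aliases into one key."""
    text = " ".join(raw.split())  # trim and squeeze whitespace
    if "," in text:
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    return text.replace(".", "").title()


def load(xml_text, conn):
    """Parse the XML and store one row per normalized lawyer name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lawyers (name TEXT PRIMARY KEY, city TEXT)")
    for record in ET.fromstring(xml_text).iter("record"):
        name = normalize_name(record.findtext("attorney", default=""))
        city = record.findtext("city", default="")
        # INSERT OR IGNORE drops aliases that normalize to an existing name.
        conn.execute("INSERT OR IGNORE INTO lawyers VALUES (?, ?)", (name, city))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(SAMPLE_XML, conn)
rows = conn.execute("SELECT name, city FROM lawyers ORDER BY name").fetchall()
print(rows)
```

Using the normalized name as the primary key is one simple way to merge aliases; a production pipeline would likely need fuzzier matching and a key that tolerates two different lawyers sharing a name.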

Technologies Used: Python, Amazon EC2, PostgreSQL, SQLite, XML Data Scraping and Normalization

Electrical Engineering and Embedded Systems Projects

Car Lab

Feb. 2016 - May 2016

The “Car Lab” is the core Electrical Engineering lab requirement at Princeton University, in which students build a car that achieves two main goals:

1. The car must move autonomously at a minimum speed of 4 ft/s, where the car would be exposed to either flat, uphill or downhill driving conditions (speed control).

2. The car must follow a black tape track without any supplementary external control.

The first challenge involved designing hardware components to transfer adequate power to various parts of the car, as well as additional modules to handle signals from the Hall effect sensor and act on this input through an appropriate PWM (Pulse Width Modulation) response that regulates the motor to maintain the desired equilibrium speed. The exact output of the PWM response was controlled by a finely tuned Proportional, Integral, and Derivative (PID) feedback algorithm. Throughout this process, the tight integration between the software, written in C on a PSoC microcontroller, and the various hardware components produced an intricate system that accomplished the desired goal.
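The speed-control loop can be illustrated with a small simulation. The original firmware was written in C on a PSoC; the Python sketch below only demonstrates the PID idea, and the gains, the 8 ft/s top speed, and the first-order motor model are invented assumptions rather than measured values from the car.

```python
class PID:
    """Textbook PID controller; the gains used below are illustrative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target_speed=4.0, steps=2000, dt=0.01):
    """Drive a toy motor model toward target_speed (ft/s); return final speed."""
    pid = PID(kp=0.5, ki=2.0, kd=0.0)
    speed = 0.0  # stands in for the speed derived from Hall effect pulses
    for _ in range(steps):
        # The PWM duty cycle is clamped to the physically valid range [0, 1].
        duty = min(1.0, max(0.0, pid.update(target_speed, speed, dt)))
        # Assumed first-order motor: full duty tends toward 8 ft/s,
        # with a 0.2 s time constant.
        speed += (8.0 * duty - speed) * dt / 0.2
    return speed


final_speed = simulate()
print(round(final_speed, 3))
```

The integral term is what removes the steady-state error on a slope: a constant extra load shows up as a persistent error, which the integral accumulates until the duty cycle compensates.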

The second challenge was to build a new system on top of the initial speed control framework that would enable the car to follow a black tape track autonomously as well. The black tape track was composed of two intersecting tracks, and the car had to stay on the same track at a constant speed of at least 4 ft/s for two full laps to complete this challenge. The main components of the design solution were a Sync Board to interface with an external black-and-white camera, a comparator chip, and new PSoC software/module design to support the upgraded PID-hardware interfacing. A brief video demo and pictures of the car are shown to the right.

Technologies Used: Circuit Design and Building, Digital Logic, Signal Processing, C Programming on PSoC, PID Control Algorithm Design, System Design

BB-8

June - Aug. 2014

The BB-8 Project was the successor to the Car Lab project shown above, where students were tasked with the full design, build, and testing of a unique electro-mechanical device. My partner and I set out to build a replica of BB-8 from Star Wars, a four-wheeled robot that can balance on top of any ball. The sensor board used in the model, a 9-degrees-of-freedom inertial motion tracker, senses the current balancing state of the robot. It passes that information to the Arduino microcontroller, the brains of the operation, which computes the appropriate parameters to actuate our robot through the motor controller board, which drives four independent gearbox motors.

The full chassis of BB-8 was designed via 3D modeling in Autodesk Inventor, from which each of the individually designed components (nearly 50!) was cut out of acrylic using a laser cutter. Our design is unique because each of the four motors, which turn omni-wheels, has an adjustable angle, which allows the robot to find the correct angle to balance on top of any ball. The main batteries are stored above the primary chassis, with the core electronics housed above them in a tower-like structure.

A two-dimensional PID loop was implemented to control the four motors: as soon as the robot starts falling in any direction off the side of the ball, the sensor board picks up on the change, and the readings are processed in our PID function loop. The output of this function takes the form of Pulse Width Modulation signals, which are sent to the motors to rebalance BB-8 on top of the ball. A few pictures of the build process are shown here, with a balancing video coming soon!
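One way to picture how two PID outputs can drive four motors is a simple mixing step, sketched below in Python. The wheel layout (four omni-wheels spaced 90 degrees apart), the sign convention, and the 8-bit PWM limit are assumptions for illustration; the actual robot's geometry, with its adjustable wheel angles, would change the mixing terms.

```python
import math

# Assumed layout: wheels at 0, 90, 180, and 270 degrees around the chassis.
WHEEL_ANGLES_DEG = (0, 90, 180, 270)


def mix(u_x, u_y, max_pwm=255):
    """Map two axis corrections (one PID output per tilt axis) to four
    signed PWM commands; the sign encodes the motor direction."""
    commands = []
    for angle in WHEEL_ANGLES_DEG:
        theta = math.radians(angle)
        # Each omni-wheel drives along the tangent at its mounting angle.
        drive = -u_x * math.sin(theta) + u_y * math.cos(theta)
        # Round to an integer PWM value and clamp to the allowed range.
        commands.append(max(-max_pwm, min(max_pwm, round(drive))))
    return commands


print(mix(100, 0))
print(mix(0, 300))
```

Opposite wheels receive equal and opposite commands, so a correction along one axis is shared by one wheel pair while the other pair rolls freely on its omni-wheel rollers.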

Technologies Used: Inventor CAD Design, Laser Cutting, Mechanical and Electronic Design, Robotics, C Programming on Arduino Mega, PID Control Algorithm Tuning

ElectroCardioGraph (ECG):

Sept. 2014 - Jan. 2015

As part of a final project in a Princeton circuits course, I designed an ElectroCardioGraph (ECG) circuit, which measures one’s heart rate. This popular and non-invasive device uses electrical probes attached to a human patient to trace the electrical signals generated by the polarization and depolarization of cardiac tissue, which are then translated into a waveform. At a high level, an ECG’s probes capture the input to the general ECG circuit, which consists of an instrumentation amplifier, a variable-gain amplifier, a band-pass filter, and a notch filter. The notch filter produces an output signal that carries the desired heart rate data in terms of valuable parameters such as frequency and voltage amplitude. This output signal was displayed through a custom LED beating heart driven by an Arduino Uno programmed in C.
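As an illustration of how a heart rate can be read from the filtered output, the Python sketch below counts rising threshold crossings in a sampled signal. The sample rate, threshold, and synthetic pulse train are invented for the example; the original project ran in C on the Arduino, and its detection method is not described here.

```python
def beats_per_minute(samples, sample_rate_hz, threshold):
    """Estimate heart rate from rising threshold crossings."""
    beat_times = []
    above = False
    for i, value in enumerate(samples):
        if value >= threshold and not above:
            beat_times.append(i / sample_rate_hz)  # a new beat starts here
        above = value >= threshold
    if len(beat_times) < 2:
        return None  # at least two beats are needed to measure an interval
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic stand-in for the filtered ECG: one narrow pulse every 0.8 s.
rate = 250  # samples per second (assumed)
signal = [1.0 if (i % 200) < 5 else 0.0 for i in range(rate * 8)]
bpm = beats_per_minute(signal, rate, threshold=0.5)
print(bpm)
```

The same crossing event that timestamps a beat could also trigger an LED pulse, which is one plausible way to drive a beating-heart display.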

Technologies Used: Circuit Design, Amplifiers, Filters, Impulse Response Analysis, C, and Arduino Uno

Hobbies

Tropical Gardening in New Jersey:

Feb. 2008 - Present

With a fervent interest in cultivating tropical plants, fruits, and vegetables, I have not let the frigid New Jersey climate stand in the way of my objective. I have experimented with several types of plants, such as bananas, saffron, oranges, and lemons, each growing in one of nearly a hundred pots in my home. Additionally, I have built small greenhouses to raise these tropical plants in the warmth they need. Moreover, I also grow a vast number of general vegetables like peppers, tomatoes, and cucumbers, in a never-ending battle with voracious crop eaters like neighborhood deer and groundhogs. I have conducted different research projects with many species of plants, and aim to one day grow a full-fledged ayurvedic medicinal herb farm in the desert Southwest. Such herbs can help alleviate common sicknesses without the need for prescribed medicines.

Technologies Used: Grafting, Sowing, Farm Layout, Construction, Chemistry, Meteorology

Get In Touch

Feel free to shoot me an email for a copy of my resume here, to provide site suggestions, or to just say hello!

shjoshi@princeton.edu