More cleaning of my code. I've nearly finished all the documentation and added a few more features to make it run smoothly. Now, if you ask for a specific step and then ask for the next step, it gives you the step after the one you just asked for.
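A minimal sketch of that step-tracking behaviour, assuming the skill keeps the number of the last step it read out in per-session state. The `RecipeSession` class and its `get_step`/`next_step` methods are hypothetical names for illustration only. They are not the actual skill code or any Alexa SDK API.

```python
from dataclasses import dataclass


@dataclass
class RecipeSession:
    """Hypothetical per-session state: the recipe's steps and the last step read out."""
    steps: list[str]
    current: int = -1  # 0-based index of the last step given; -1 means none yet

    def get_step(self, number: int) -> str:
        """Jump to a specific step (1-based, as a user would say it)."""
        if not 1 <= number <= len(self.steps):
            return f"This recipe only has {len(self.steps)} steps."
        self.current = number - 1
        return self.steps[self.current]

    def next_step(self) -> str:
        """Continue from wherever the user last was, including after a jump."""
        if self.current + 1 >= len(self.steps):
            return "That was the last step."
        self.current += 1
        return self.steps[self.current]


session = RecipeSession(["Preheat the oven.", "Slice the zucchini.", "Roast for 20 minutes."])
print(session.get_step(2))   # Slice the zucchini.
print(session.next_step())   # Roast for 20 minutes.
```

The key design point is that `get_step` updates the same index that `next_step` reads, so "next" always continues from the most recent request rather than from the start of the recipe.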
I've also made Alexa end the session more often, to stop the problem of it switching between recipes.
I've continued to work on the ingredient recognition, but it's becoming more and more difficult. I'm happy with my method for finding a recipe from my ingredient list, but the list itself is not up to standard. When I originally made it, I went through hundreds of recipes and removed unnecessary words from them, but the list still isn't concise enough. I've tried to find a large list of common ingredients on the internet, but one seems nearly impossible to find. The main issue is that although "zucchini" may appear somewhere in my list, it's probably surrounded by extra words that stop it from matching most recipes. For example, "large zucchini quartered" was on my ingredient list, so if a recipe called for just "zucchini", it would report that you don't need "large zucchini quartered".
I've attached a screenshot of one section of my ingredient list. As you can see, the filler words are numerous and widely varied. I'm not quite sure how best to get rid of them all, other than by brute force. I've done some work on it today, but after 30 minutes of manual deletion, I was about ready to pull my hair out! Any suggestions for this?
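One way to take some of the brute force out of this, sketched under assumptions: instead of deleting filler words line by line, count how often each word appears across the whole list (descriptors like sizes, cuts, and units tend to float to the top), review those in bulk, and strip them automatically. The `FILLER_WORDS` set below is a hypothetical starting point, not the author's list, and the function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical starter set of descriptor words; a real one would be built by
# reviewing the output of most_common_words() on the full ingredient list.
FILLER_WORDS = {
    "large", "small", "medium", "fresh", "chopped", "diced", "sliced",
    "minced", "quartered", "halved", "cup", "cups", "tablespoon",
    "tablespoons", "teaspoon", "teaspoons", "of", "and", "or", "to", "taste",
}


def tokenize(phrase: str) -> list[str]:
    """Lowercase the phrase and split it into alphabetic words (drops numbers and punctuation)."""
    return re.findall(r"[a-z]+", phrase.lower())


def clean_ingredient(phrase: str) -> str:
    """Drop filler words, keeping the remaining words in their original order."""
    return " ".join(w for w in tokenize(phrase) if w not in FILLER_WORDS)


def most_common_words(phrases: list[str], n: int = 20) -> list[tuple[str, int]]:
    """Count words across all phrases so frequent descriptors can be reviewed in bulk."""
    counts = Counter(w for p in phrases for w in tokenize(p))
    return counts.most_common(n)


raw = ["large zucchini quartered", "2 cups diced zucchini", "fresh basil, chopped"]
print([clean_ingredient(p) for p in raw])  # ['zucchini', 'zucchini', 'basil']
print(most_common_words(raw, 3))
```

A word-frequency review like this will still need a human pass (a common word is not always filler), but it turns hundreds of individual deletions into one list to check.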
The obvious choices:
* Crowdsourcing: Either put up a web page or something, or find some cash for Mechanical Turk.
* Clustering / machine learning: Try to build some software that will find common words and clusters and do something sensible. Text classification techniques are out there.
* Natural Language Processing: Try to parse the query and the ingredient lists and extract semantic information that way.
* Lots of false positives: Match each word in a query separately against each word in the target, and return the union of results.
For this project, given the time frame, I'd suggest the last thing.
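A minimal sketch of that last option, assuming recipes are stored as a name mapped to a list of raw ingredient lines. The recipe data and function names are hypothetical. Each word of every ingredient line is indexed separately, and a query returns the union of recipes matching any single query word.

```python
import re
from collections import defaultdict


def words(text: str) -> set[str]:
    """Lowercase and split into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(recipes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each word appearing in any ingredient line to the recipes containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, ingredients in recipes.items():
        for line in ingredients:
            for w in words(line):
                index[w].add(name)
    return index


def match(query: str, index: dict[str, set[str]]) -> set[str]:
    """Return the union of recipes matching any single word of the query."""
    result: set[str] = set()
    for w in words(query):
        result |= index.get(w, set())  # .get avoids adding empty entries to the defaultdict
    return result


recipes = {
    "ratatouille": ["large zucchini quartered", "2 tomatoes"],
    "pesto": ["fresh basil", "pine nuts"],
    "bread": ["flour", "water"],
}
idx = build_index(recipes)
print(match("zucchini", idx))        # {'ratatouille'}
print(sorted(match("fresh zucchini", idx)))  # ['pesto', 'ratatouille']
```

Here "zucchini" now matches "large zucchini quartered" without any manual cleanup, while "fresh" alone pulls in the pesto: exactly the false positives the suggestion warns about. Counting how many query words each recipe matched would be one simple way to rank the results.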