Monday, June 6, 2016

6/6

Today, I continued working on web scraping. I discovered that the GitHub repository I was going to reference was four years old and no longer worked on AllRecipes. I'm still going to keep it as a reference for its organization, however. I also decided to use AllRecipes.com instead of Epicurious, since Epicurious had too many ads, unrelated links, and JavaScript that interfered with navigation.

I've been looking at tutorials on using Jsoup with Java. The most helpful was this one, which let me successfully grab the name and URL of each category on AllRecipes. However, I got stuck when trying to navigate within each category. I posted a comment on his page and will ask my brother tonight for advice on how to proceed.
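One possible way past the "navigate within each category" step, sketched under some assumptions: in Jsoup, moving into a category is just fetching that category's URL as a new `Document` and running a second CSS selector on it. The class name `CategoryCrawler`, the selector `a.category-link`, the user-agent string, and the sample HTML are all hypothetical placeholders, not AllRecipes' actual markup; the real selectors would have to come from inspecting the pages.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryCrawler {

    /** The visible text and absolute URL of one link. */
    public static final class Link {
        public final String name;
        public final String url;

        public Link(String name, String url) {
            this.name = name;
            this.url = url;
        }

        @Override
        public String toString() {
            return name + " -> " + url;
        }
    }

    /** Collects the text and absolute URL of every element matching cssQuery. */
    public static List<Link> extractLinks(Document doc, String cssQuery) {
        List<Link> links = new ArrayList<>();
        for (Element a : doc.select(cssQuery)) {
            // absUrl resolves a relative href against the document's base URI
            String url = a.absUrl("href");
            if (!url.isEmpty()) {
                links.add(new Link(a.text(), url));
            }
        }
        return links;
    }

    /** Downloads and parses one page (requires network access). */
    public static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (recipe-scraper sketch)") // hypothetical UA string
                .timeout(10_000)
                .get();
    }

    /** Level 1: category links on the index page. Level 2: recipe links on each category page. */
    public static Map<String, List<Link>> crawl(String indexUrl, String categorySelector,
                                                String recipeSelector)
            throws IOException, InterruptedException {
        Map<String, List<Link>> recipesByCategory = new LinkedHashMap<>();
        for (Link category : extractLinks(fetch(indexUrl), categorySelector)) {
            // Navigating into a category = fetching its URL as a new Document
            Document categoryPage = fetch(category.url);
            recipesByCategory.put(category.name, extractLinks(categoryPage, recipeSelector));
            Thread.sleep(1000); // pause between requests to avoid hammering the site
        }
        return recipesByCategory;
    }

    public static void main(String[] args) {
        // Offline demo on invented sample HTML; "a.category-link" is a hypothetical selector
        String html = "<ul>"
                + "<li><a class='category-link' href='/recipes/desserts/'>Desserts</a></li>"
                + "<li><a class='category-link' href='/recipes/soups/'>Soups</a></li>"
                + "</ul>";
        Document doc = Jsoup.parse(html, "http://example.com/");
        for (Link link : extractLinks(doc, "a.category-link")) {
            System.out.println(link);
        }
    }
}
```

`main` runs offline; calling `crawl(...)` needs network access and real selectors. Keeping `extractLinks` separate from `fetch` means the parsing logic can be tested against saved HTML without hitting the site each time, and the one-second pause is an arbitrary politeness delay.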

I expect problems to come up this week with formatting, splitting ingredients, getting the proper names, and so on, but after a few days of work, I'm confident things will come together. I'm also starting to think about how I'll show the user which recipes they can choose from. Later in the process, I may integrate the app features more closely so I can list all the available recipes.
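Splitting ingredients is likely to need custom code. As a rough sketch, assuming each ingredient arrives as one line of text such as "2 cups all-purpose flour", a regular expression can peel off a leading quantity (whole number, decimal, fraction, or mixed number like "1 1/2"), and a small lookup table can decide whether the next word is a unit. The class name `IngredientParser` and the unit list are hypothetical; an input like "1 (8 ounce) package cream cheese" would need extra rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IngredientParser {

    /** One ingredient line split into quantity, unit, and name (quantity/unit may be ""). */
    public static final class Ingredient {
        public final String quantity;
        public final String unit;
        public final String name;

        public Ingredient(String quantity, String unit, String name) {
            this.quantity = quantity;
            this.unit = unit;
            this.name = name;
        }

        @Override
        public String toString() {
            return "[" + quantity + "] [" + unit + "] [" + name + "]";
        }
    }

    // Optional leading amount: mixed number ("1 1/2"), fraction ("1/2"), or number ("2", "2.5").
    // Mixed numbers and fractions are tried first so "1/2" is not read as "1".
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+\\s+\\d+/\\d+|\\d+/\\d+|\\d+(?:\\.\\d+)?)?\\s*(.*)$");

    // Hypothetical unit vocabulary; a real list would be grown from the scraped data
    private static final Set<String> UNITS = new HashSet<>(Arrays.asList(
            "cup", "cups", "tablespoon", "tablespoons", "tbsp", "teaspoon", "teaspoons",
            "tsp", "ounce", "ounces", "oz", "pound", "pounds", "lb", "lbs",
            "pinch", "clove", "cloves", "can", "cans", "package", "packages"));

    public static Ingredient parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return new Ingredient("", "", line.trim());
        }
        String quantity = m.group(1) == null ? "" : m.group(1);
        String rest = m.group(2).trim();

        // Treat the first word as a unit only if it is in the known vocabulary
        String[] parts = rest.split("\\s+", 2);
        String first = parts[0].toLowerCase(Locale.ROOT).replaceAll("\\.$", "");
        if (parts.length == 2 && UNITS.contains(first)) {
            return new Ingredient(quantity, parts[0], parts[1]);
        }
        return new Ingredient(quantity, "", rest);
    }

    public static void main(String[] args) {
        String[] lines = {"2 cups all-purpose flour", "1 1/2 teaspoons salt", "3 eggs", "salt to taste"};
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

Keeping the quantity as a string (rather than converting "1 1/2" to 1.5) avoids rounding issues when the recipe is displayed back to the user; conversion can be added later if scaling recipes becomes a feature.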

Once I can scrape the recipes successfully, I expect progress to come quickly.

1 comment:

  1. At one point, weren't you planning to take advantage of the hRecipe (or h-recipe) microformat to help with the scraping? Is that proving to be useful, or is it not implemented consistently enough for you to extract trustworthy data?
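To make the commenter's suggestion concrete: if a recipe page does carry the markup, Jsoup's CSS selectors can pull fields out by class name, using `fn` and `ingredient` inside an `hrecipe` root for the classic microformat, or `p-name` and `p-ingredient` inside an `h-recipe` root for microformats2. This is a sketch assuming such markup is present; whether AllRecipes uses it, and how consistently, is exactly the open question. The class name `MicroformatRecipe` and the sample HTML are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class MicroformatRecipe {

    /** Recipe name from hRecipe "fn" or h-recipe "p-name"; "" if the page has neither. */
    public static String name(Document doc) {
        Element el = doc.select(".hrecipe .fn, .h-recipe .p-name").first();
        return el == null ? "" : el.text();
    }

    /** Ingredient lines from hRecipe "ingredient" or h-recipe "p-ingredient" elements. */
    public static List<String> ingredients(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(".hrecipe .ingredient, .h-recipe .p-ingredient")) {
            result.add(el.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample markup using the classic hRecipe class names
        String html = "<div class='hrecipe'>"
                + "<h1 class='fn'>Pancakes</h1>"
                + "<ul><li class='ingredient'>1 cup flour</li>"
                + "<li class='ingredient'>1 egg</li></ul></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(name(doc));
        System.out.println(ingredients(doc));
    }
}
```

An empty name or ingredient list is a quick signal that a page is not marked up, so counting those cases over a batch of scraped pages is one way to judge whether the microformat is applied consistently enough to trust.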