Thursday, January 29, 2015

My new homesite and blog

Hey all.

I have made my new homesite - http://ankushchadda.in. It is hosted on GitHub Pages.

Next, I have also made a blog at http://blog.ankushchadda.in.

The themes for both are taken from startbootstrap.com. Awesome free templates!!

I will be publishing my future posts there now. I will also move my old posts to the new blog, but that's something I'll do gradually.

Thanks,
Ankush

Monday, January 12, 2015

PHP application with a MySQL database to a Django app with a Postgres database

Recently I had a chance to work on a project where a PHP app had to be converted to a Django one.

The work seemed trivial, but it brought a few difficult technicalities. The app had a MySQL DB with a few tables. Overall, I had to -


  • Convert the tables into Django models
  • Convert the app itself into Django
  • Set up the app on Heroku using Postgres

  1. Converting the models to Django is easy. Django provides an inspectdb management command. It inspects the existing database tables and generates Django models accordingly.
  2. This is a nice feature and gave me a push forward to begin the app development. However, it has some limitations that I would like to point out -
    1. MySQL has a set type, which is roughly similar to choices on Django model fields. I had to map that manually (see the sketch after this list).
    2. The foreign key constraint is assumed to point at id by default. The command could check the actual referenced column and set it in the model declaration using the to_field parameter.
    3. The DB views had to be created manually.
  3. For moving the MySQL data to Heroku, https://devcenter.heroku.com/articles/heroku-mysql is a good resource.
As always, Heroku has to be given S3 access for uploading static and media files, but that's a different matter.
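
To make the first two limitations concrete, here is a minimal sketch of the kind of manual adjustment involved. The Book and Author models, their field names and the SET('new', 'used') values are purely illustrative assumptions, not the real project schema -

from django.db import models

class Author(models.Model):
    # inspectdb would generate fields like this from the existing table;
    # slug stands in for the non-id column the old schema referenced
    slug = models.CharField(max_length=100, unique=True)

class Book(models.Model):
    # MySQL's SET('new', 'used') column, mapped by hand to choices
    CONDITION_CHOICES = (
        ('new', 'New'),
        ('used', 'Used'),
    )
    condition = models.CharField(max_length=10, choices=CONDITION_CHOICES)

    # inspectdb points foreign keys at id; to_field is needed when the old
    # schema referenced a different column (here, the author's slug)
    author = models.ForeignKey(Author, to_field='slug', on_delete=models.CASCADE)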

Thursday, June 26, 2014

Musings on OpenLibrary and my first encounter with Solr 1.4.1

OpenLibrary is an ambitious project which runs at openlibrary.org. As the name says, it's an open(sourced) project. The code lives on GitHub at https://github.com/internetarchive/openlibrary/.

The developers working on the project are extremely talented. The number of technologies and the way they have implemented them is awesome. Users can search for a book (of course), read it online (using the beautiful BookReader client-side app) and also borrow a book (I haven't used this feature).

One of my clients wanted me to integrate the OpenLibrary project into their website. They already had some parts working: BookReader was up, but the feature of searching inside a book wasn't.

OpenLibrary uses Solr as its search engine. It is said to be the most powerful search backend. The previous developer was no longer around, which was a big issue, and moreover there was not much documentation for my task.

I realised from the scripts that Solr 1.4.1 is to be used. After reading more of the BookReader code, I realised that when we search, it makes a call to Solr similar to -

<server>:8984/solr/inside/select?rows=1&wt=json&fl=ia,body_length,page_count&hl=true&hl.fl=body&hl.fragsize=0&hl.maxAnalyzedChars=-1&hl.usePhraseHighlighter=true&hl.simple.pre={{{&hl.simple.post=}}}&q.op=AND&q=ia%3A<id_of_opened_book>

It then makes a similar second call with q set to ia%3A<id_of_opened_book>%20AND%<searched_query>

In this second call, we get the highlighted results, arranged as JSON. The next task is to locate and highlight the queried words on the ebook. For this we have an OCR XML file; in this case we used ABBYY reader. The queried words are located using the OCR XML file and highlighted on the ebook.
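
As a rough sketch of how that second call could be reproduced with Python's requests library - the host, port and parameters are taken from the URL above, while search_inside and book_id are just illustrative names, not part of the OpenLibrary code -

import requests

SOLR_URL = 'http://localhost:8984/solr/inside/select'  # assuming Solr runs locally

def search_inside(book_id, query):
    # Restrict to the opened book's ia id AND the user's query, and ask
    # Solr to return highlighted snippets wrapped in {{{ }}}
    params = {
        'rows': 1,
        'wt': 'json',
        'fl': 'ia,body_length,page_count',
        'hl': 'true',
        'hl.fl': 'body',
        'hl.fragsize': 0,
        'hl.maxAnalyzedChars': -1,
        'hl.usePhraseHighlighter': 'true',
        'hl.simple.pre': '{{{',
        'hl.simple.post': '}}}',
        'q.op': 'AND',
        'q': 'ia:%s AND %s' % (book_id, query),
    }
    response = requests.get(SOLR_URL, params=params)
    data = response.json()
    # The highlighted snippets live under the 'highlighting' key of the response
    return data.get('highlighting', {})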

Now the only thing that remains is to get Solr working for full-text search. For this, OpenLibrary makes a call to a PHP file called abby_to_text.php, which basically reads the OCR file and extracts paragraphs from it. These get saved into Solr.

To save into Solr, we make an XML document containing at least the required fields, as mentioned in schema.xml.
The schema I am using is at - https://github.com/internetarchive/openlibrary/blob/master/conf/solr-biblio/inside/conf/schema.xml.
The required fields are -
 <field name="ia" type="string" required="true" />
   <field name="body" type="textgen" required="true" compressed="true" />
   <field name="body_length" type="int" required="true" />
   <field name="page_count" type="int" indexed="true" required="true" />
Here ia is the book id, and body is the text of the book.
Also, you need to commit the results so that you can see them immediately in the Solr admin.
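
As a rough sketch of that indexing step, assuming Solr on localhost:8984 and the inside core as above (index_book and its arguments are illustrative names, not OpenLibrary code), posting the XML and the commit could look like this -

import requests

UPDATE_URL = 'http://localhost:8984/solr/inside/update'  # assuming the 'inside' core

def index_book(ia, body, page_count):
    # Build the <add><doc> XML with the required fields from schema.xml
    doc = """
    <add>
      <doc>
        <field name="ia">%s</field>
        <field name="body"><![CDATA[%s]]></field>
        <field name="body_length">%d</field>
        <field name="page_count">%d</field>
      </doc>
    </add>
    """ % (ia, body, len(body), page_count)
    requests.post(UPDATE_URL, data=doc.encode('utf-8'),
                  headers={'Content-Type': 'text/xml'})
    # Commit so the document is immediately visible in the Solr admin
    requests.post(UPDATE_URL, data='<commit/>',
                  headers={'Content-Type': 'text/xml'})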

The important thing here is that this schema belongs to the inside core. This was also a new thing for me.
More problems came because of Solr's old version; 1.4.1 is more than 4 years old.

But anyway, it was a good learning experience.




Converting Markdown to ReST format

Generally we write the README on GitHub in Markdown, but when making Read the Docs documentation or a PyPI package, we need rst docs.

For this, Pandoc comes in handy.

Just run the command

pandoc --from=markdown --to=rst --output=install.rst install.md

and the rst is ready. Awesome!!
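
If you'd rather drive the same conversion from Python, the pypandoc wrapper (an extra dependency, not something Pandoc itself ships) can do it too - a small sketch:

import pypandoc  # pip install pypandoc; Pandoc itself must still be installed

# Equivalent of the command above: convert install.md to install.rst
pypandoc.convert_file('install.md', 'rst', outputfile='install.rst')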

Friday, April 25, 2014

Continuous Integration and Deployment using Bamboo and AWS Elastic Beanstalk

Walk-through of setting up Bamboo for CI and CD

Bamboo is a popular Atlassian product. Let's set up Bamboo and go through the steps I took.
  • Install Bamboo on an EC2 instance
    • Configure it to run on port 80 instead of its default.
    • Make sure the system has enough memory; I am using an m1.small instance.
    • Bamboo has a startup script - use that, and make sure the permissions are right. :P
  • For CI - 
    • Check out the code
      • Used a post-push hook to trigger the build plan on Bamboo automatically
    • Install dependencies
      • Remember to clean the cache and remove node_modules before installing
    • Run tests
      • Used the Bamboo Mocha plugin for that; ample documentation is provided
    • That's it!!
  • For CD -
    • Set up the deployment server -
      • We are using AWS Elastic Beanstalk, our app being a Node.js one.
    • The deployment process is tricky. Manually, we have to initialize the repo and feed in a lot of details. To do it automatically -
      • Initialize the repository with the AWSDevTools-RepositorySetup.sh script. It will add git aliases; we will now have the git aws.push command.
      • The deploy script looks for a file named aws_credentials_file in the user's home folder, inside the .elasticbeanstalk dir. So one task is to copy that file into the home folder during each deployment (see the sketch after this list).
    • The rest is simple.
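
For that credentials task, here is a tiny sketch of the copy step. The source path is an assumption - adjust it to wherever your build keeps the file - while the target directory matches the location the deploy script expects, as described above.

import os
import shutil

SOURCE = '/opt/bamboo/secrets/aws_credentials_file'    # hypothetical location
TARGET_DIR = os.path.expanduser('~/.elasticbeanstalk')

# Ensure the directory exists and drop the credentials file in before each deploy
os.makedirs(TARGET_DIR, exist_ok=True)
shutil.copy(SOURCE, os.path.join(TARGET_DIR, 'aws_credentials_file'))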

This blog post also has a lot of important details that helped me - http://blog.pedago.com/2014/02/18/build-and-deploy-with-grunt-bamboo-and-elastic-beanstalk/

The next step is to include code coverage. I will cover it in the next blog post.

Wednesday, January 29, 2014

Python? But why Python?

Well, I have been using Python for the last 3 years. It has been my main development language, along with JavaScript.

Often, someone comes around and asks, why Python? My usual answer comes from my personal programming tasks, where Python comes in very handy compared to other languages I have encountered, like PHP or Java.

Well, here is a post that answers this question exactly. Check it here.

Here are the points -

  • Efficient - has generators (see the small example after this list)
  • Fast
  • Broad use
  • Not just a language, has a number of implementations
  • Easy
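
As a tiny illustration of the generators point -

# Values are produced lazily, one at a time, so the full list of squares
# never has to exist in memory.
def squares(limit):
    for n in range(limit):
        yield n * n

total = sum(squares(10000000))  # constant memory, regardless of limit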

Cheers!!

Yet another post on Redis

While working on a project, we used Redis as a queue via python-rq. Running redis-cli, I used the following commands -


  • keys *
  • type <key name>
  • and then, according to the type (hash, list), I would query the data
Some keys were quite easy to understand -
  • rq:workers
  • rq:queue:failed
  • rq:queue:default
  • and a success one as well
But apart from these, there were several entries named rq:job:<job_id>. After much reading, I found the internal workings described at http://python-rq.org/contrib/.
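
The same inspection can be done programmatically; here is a small sketch with redis-py (the connection details are assumptions) -

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Each rq:job:<job_id> key is a hash holding the job's data, status, result, etc.
for key in r.scan_iter('rq:job:*'):
    print(key, r.hgetall(key))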

It says that whenever a function call gets enqueued -
  • The job's id is pushed onto the queue, in my case the default one
  • A hash object holding the job instance's data is added
So, when dequeue happens - 
  • The job id is popped from the queue
  • The job data is fetched
  • The function is executed and, on success, the result is saved as a hash key
  • On failure, the job is saved in the failed queue with its stack trace
All of this is described on the python-rq site.
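
A minimal enqueue sketch following those steps (Redis on localhost assumed) -

from redis import Redis
from rq import Queue

q = Queue(connection=Redis())      # the 'default' queue
job = q.enqueue(len, 'hello rq')   # any importable function will do
print(job.id)                      # the <job_id> part of the rq:job:<job_id> key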

There were two kinds of errors I saw -
  • RuntimeError: maximum recursion depth exceeded while calling a Python object - this happened in queue.py of the python-rq module. I think it was caused when control crossed the maximum recursion limit because it didn't find the job hashes during dequeue, as discussed above.
  • Socket closed on remote end - the server closes client connections after 300s. In my case I didn't want that, so I let them stay open forever by changing the timeout value to 0 in /etc/redis/redis.conf.
Go Redis!!