Monday, May 2, 2011

OrientDB - Write methods: ODocument and POJO (embedded in Java) - #2

Other episodes: #1

I downloaded a 10 MB file from this site to get some test data for measuring this database's write performance: I take a class, add it to a document and save it.

The file was too small, so I copied and pasted it onto itself several times, ending up with about 720,000 rows (the file is now about 52 MB).
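The copy-and-paste step can also be scripted. A minimal plain-Java sketch, assuming the source file is small enough to read into memory (10 MB is); `FileMultiplier.multiplyFile` is a hypothetical helper, and ISO-8859-1 is used only so that each line's bytes round-trip whatever the file's real encoding:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: writes the lines of `source` into `target` `copies` times,
// turning a small CSV into a larger test file.
class FileMultiplier {

    static void multiplyFile(Path source, Path target, int copies) throws IOException {
        // ISO-8859-1 maps every byte to one char, so no line can fail to decode
        List<String> lines = Files.readAllLines(source, StandardCharsets.ISO_8859_1);
        // start from an empty target file (created or truncated)
        Files.write(target, new byte[0]);
        for (int i = 0; i < copies; i++) {
            // append one full copy of the original lines
            Files.write(target, lines, StandardCharsets.ISO_8859_1, StandardOpenOption.APPEND);
        }
    }
}
```

For example, `multiplyFile(source, target, 5)` writes five back-to-back copies of the source; the copy count is arbitrary.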

ODocument method, with the ODatabaseDocumentTx class

Here is the code:


package orientdbtest;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.server.OServer;
import com.orientechnologies.orient.server.OServerMain;
import java.io.File;

public class Main {

    public static class GeoIp {

        String from;
        String to;
        Long lat;
        Long lng;
        String country;
        String state;

        public void setFrom(String from) {
            this.from = from;
        }

        public void setTo(String to) {
            this.to = to;
        }

        public void setCountry(String country) {
            this.country = country;
        }

        public void setState(String state) {
            this.state = state;
        }

        public String getFrom() {
            return from;
        }

        public String getTo() {
            return to;
        }

        public String getCountry() {
            return country;
        }

        public String getState() {
            return state;
        }

        public Long getLng() {
            return lng;
        }

        public Long getLat() {
            return lat;
        }

        public void setLng(Long lng) {
            this.lng = lng;
        }

        public void setLat(Long lat) {
            this.lat = lat;
        }
    }

    public static void main(String[] args) {
        try {

            // base already ends with "/", so the relative paths below do not start with one
            String base = "/home/marco/Scrivania/orientdb/";

            // start the embedded OrientDB server inside this JVM
            OServer server = OServerMain.create();
            server.startup(new File(base + "file/conf.xml"));

            // create a local (embedded) document database
            ODatabaseDocumentTx db = new ODatabaseDocumentTx("local:" + base + "db").create();

            try {

                // record count of each cluster before the import
                for (String s : db.getClusterNames()) {
                    System.out.println("name: " + s + " - " + db.countClusterElements(s));
                }

                // one transaction wraps all the inserts
                db.begin();

                // BigFile: the line-by-line file iterator described after this listing
                BigFile geoip_list = new BigFile(base + "file/data/GeoIPCountryWhois.csv");

                long time = System.currentTimeMillis();

                long c = 0;
                for (String geoip_row : geoip_list) {
                    // strip the quotes, then split on commas
                    // (a quoted value that itself contains a comma gets split in two here)
                    geoip_row = geoip_row.replaceAll("\"", "");
                    String[] s = geoip_row.split("\\,");
                    GeoIp geoIp = new GeoIp();
                    geoIp.setFrom(s[0]);
                    geoIp.setTo(s[1]);
                    geoIp.setLat(Long.valueOf(s[2]));
                    geoIp.setLng(Long.valueOf(s[3]));
                    geoIp.setCountry(s[4]);
                    geoIp.setState(s[5]);

                    // embed the Java object as the "geoIp" field of a new document
                    ODocument doc = new ODocument(db);
                    doc.field("geoIp", geoIp);
                    doc.save();

                    // progress every 10,000 rows (the first value printed is 20000)
                    if (c > 10000 && c % 10000 == 0)
                        System.out.println(c);
                    c++;
                }

                db.commit();
                System.out.println("tot time: " + (System.currentTimeMillis() - time));
                System.out.println("commit:" + c);

            } catch (Exception e) {
                // getStackTrace().toString() would only print the array reference
                e.printStackTrace();
                db.rollback();
            } finally {

                // record count of each cluster after the import
                for (String s : db.getClusterNames()) {
                    System.out.println("name: " + s + " - " + db.countClusterElements(s));
                }

                db.close();
            }

            server.shutdown();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
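Removing every quote and then splitting on `,` goes wrong when a quoted value itself contains a comma: the value is cut in two and the remaining columns shift. A minimal quote-aware alternative, assuming fields are enclosed in double quotes and never contain escaped quotes; `CsvLine.split` is a hypothetical helper, not an OrientDB API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line on commas that fall outside double quotes
// and drops the quotes themselves. Escaped quotes ("") are not handled.
class CsvLine {

    static String[] split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                // toggle the quoted state; the quote character is not kept
                inQuotes = !inQuotes;
            } else if (ch == ',' && !inQuotes) {
                // separator outside quotes: close the current field
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        // the last field is not followed by a comma
        fields.add(current.toString());
        return fields.toArray(new String[0]);
    }
}
```

Inside the loop, `String[] s = CsvLine.split(geoip_row);` would replace both the `replaceAll` and the `split` lines.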



The BigFile class is here.

I run the whole thing from the console to avoid the IDE using too much RAM, since I read the file with an iterator.
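BigFile itself is only linked. Going by the description (an iterator over the file's lines), a class of that kind could look like the hypothetical sketch below: it implements `Iterable<String>` on top of a `BufferedReader`, so only one line at a time is held in memory. Only the `BigFile(String path)` constructor comes from the listing; the rest is an assumption.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the author's BigFile: iterates lazily over the lines of a file.
class BigFile implements Iterable<String> {

    private final String path;

    BigFile(String path) {
        this.path = path;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader;
        try {
            reader = new BufferedReader(new FileReader(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            // the line next() will return; null once the file is exhausted
            private String nextLine = read();

            private String read() {
                try {
                    String line = reader.readLine();
                    // close the file as soon as the end is reached
                    if (line == null) {
                        reader.close();
                    }
                    return line;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public String next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                String current = nextLine;
                nextLine = read();
                return current;
            }
        };
    }
}
```

It fits the for-each loop of the listing unchanged. A loop that stops early leaves the file open, because the sketch closes it only after the last line.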

java -jar "/home/marco/netbeans-project/OrientDbTest/dist/OrientDbTest.jar" 

2011-04-29 05:29:02:098 INFO [OServer] OrientDB Server v1.0rc1-SNAPSHOT is starting up...
2011-04-29 05:29:07:325 INFO [OServerNetworkListener] Listening binary connections on 0.0.0.0:2424
2011-04-29 05:29:12:331 INFO [OServerNetworkListener] Listening http connections on 0.0.0.0:2480
2011-04-29 05:29:12:331 INFO [OServer] OrientDB Server v1.0rc1-SNAPSHOT is active.
name: internal - 4
name: index - 0
name: default - 0
name: orole - 3
name: ouser - 3
20000
30000
40000
...
...
...
690000
700000
710000
tot time: 39482
commit:719600
name: internal - 4
name: index - 0
name: default - 719600
name: orole - 3
name: ouser - 3



...not bad! 719,600 rows inserted in about 39 seconds (39,482 ms), all embedded in a jar on the client PC.
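To compare the two methods it helps to turn the logged numbers into a rate. A small sketch using only the values printed above (`commit:719600` and `tot time: 39482`, in milliseconds); `Throughput` is a hypothetical helper:

```java
// Hypothetical helper: converts a row count and an elapsed time in ms into rates.
class Throughput {

    static long rowsPerSecond(long rows, long elapsedMillis) {
        // multiply before dividing so integer division keeps the precision
        return rows * 1000 / elapsedMillis;
    }

    static double microsPerRow(long rows, long elapsedMillis) {
        return elapsedMillis * 1000.0 / rows;
    }
}
```

With those figures `rowsPerSecond(719600, 39482)` gives 18226 rows per second, about 55 µs per row. The measured time also covers reading and parsing the CSV and the final commit, not only the inserts.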

If I use the POJO method instead, where OrientDB maps the class and I save the mapped object directly, everything becomes much slower (perhaps because it has to rebuild the whole object at save time).
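A rough illustration of the extra per-object work a POJO mapping can add: on every save, each field is read through reflection and copied into a name/value map, which is roughly what a document is. This is a hypothetical plain-Java sketch of that kind of overhead, not OrientDB's actual mapping code:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not OrientDB code): copies the instance fields declared by
// an object's class into a name -> value map using reflection.
class ReflectiveMapper {

    static Map<String, Object> toMap(Object pojo) throws IllegalAccessException {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            // skip static and compiler-generated fields
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            // private fields must be made accessible before they can be read
            f.setAccessible(true);
            doc.put(f.getName(), f.get(pojo));
        }
        return doc;
    }
}
```

Filling the map by hand with `doc.put("country", geoIp.getCountry())` and so on skips the reflective lookups; the sketch makes no claim about how large that difference is inside OrientDB.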

POJO method, with the ODatabaseObjectTx class


package orientdbtest;

import com.orientechnologies.orient.core.annotation.OVersion;
import com.orientechnologies.orient.core.db.object.ODatabaseObjectTx;
import com.orientechnologies.orient.server.OServer;
import com.orientechnologies.orient.server.OServerMain;
import java.io.File;
import javax.persistence.Id;

public class Main {

    public static class GeoIp {

        // fields OrientDB binds to the record id (RID) and the record version
        @Id
        private Object id;
        @OVersion
        private Object version;

        private String from;
        private String to;
        private Long lat;
        private Long lng;
        private String country;
        private String state;

        public void setFrom(String from) {
            this.from = from;
        }

        public void setTo(String to) {
            this.to = to;
        }

        public void setCountry(String country) {
            this.country = country;
        }

        public void setState(String state) {
            this.state = state;
        }

        public String getFrom() {
            return from;
        }

        public String getTo() {
            return to;
        }

        public String getCountry() {
            return country;
        }

        public String getState() {
            return state;
        }

        public Long getLng() {
            return lng;
        }

        public Long getLat() {
            return lat;
        }

        public void setLng(Long lng) {
            this.lng = lng;
        }

        public void setLat(Long lat) {
            this.lat = lat;
        }
    }

    public static void main(String[] args) {
        try {

            // base already ends with "/", so the relative paths below do not start with one
            String base = "/home/marco/Scrivania/orientdb/";

            OServer server = OServerMain.create();
            server.startup(new File(base + "file/conf.xml"));

            // create() starts a new database at this path
            // (the db folder from the ODocument test was presumably removed first)
            ODatabaseObjectTx db = new ODatabaseObjectTx("local:" + base + "db").create();
            // register GeoIp so the object database can map it to a document class
            db.getEntityManager().registerEntityClass(GeoIp.class);

            try {

                for (String s : db.getClusterNames()) {
                    System.out.println("name: " + s + " - " + db.countClusterElements(s));
                }

                db.begin();

                BigFile geoip_list = new BigFile(base + "file/data/GeoIPCountryWhois.csv");

                long time = System.currentTimeMillis();

                long c = 0;
                for (String geoip_row : geoip_list) {
                    geoip_row = geoip_row.replaceAll("\"", "");
                    String[] s = geoip_row.split("\\,");
                    GeoIp geoIp = new GeoIp();
                    geoIp.setFrom(s[0]);
                    geoIp.setTo(s[1]);
                    geoIp.setLat(Long.valueOf(s[2]));
                    geoIp.setLng(Long.valueOf(s[3]));
                    geoIp.setCountry(s[4]);
                    geoIp.setState(s[5]);

                    // save the POJO directly; OrientDB maps it to a document
                    db.save(geoIp);

                    if (c > 10000 && c % 10000 == 0)
                        System.out.println(c);
                    c++;
                }

                db.commit();
                System.out.println("tot time: " + (System.currentTimeMillis() - time));
                System.out.println("commit:" + c);

            } catch (Exception e) {
                // getStackTrace().toString() would only print the array reference
                e.printStackTrace();
                db.rollback();
            } finally {

                for (String s : db.getClusterNames()) {
                    System.out.println("name: " + s + " - " + db.countClusterElements(s));
                }

                db.close();
            }

            server.shutdown();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}




Here are the speeds (the "speed" column) for each saving method used.



2 comments:

Luca Garulli said...

Hi,
two small tips to go faster:
1) recycle the ODocument: create it only once and call doc.reset() on every pass of the for() loop
2) if you also get rid of the "GeoIp" class and use a document class for this instead, you'll go even faster

Lvc@

Marco Berri said...

Thanks!